Updated: Feb 5
This is the topology that we'll be working with for this blog post:
Nothing too fancy - two edges, two co-located control-plane and border devices and a fusion that sits outside of the fabric for inter-VRF (inter-VN) communication.
A lot of customers have their RPs outside of the fabric and want to use one of two protocols that allow the RP to be discovered without static assignment - Bootstrap Router (BSR) or Auto-RP (Cisco proprietary).
The problem with BSR and an SDA fabric
Off the bat, I can tell you that BSR does not (and will not) work with an SDA fabric today. If you look closely at how we form PIM neighbors for the fabric (we don't, really), the neighbor is static: the code injects it based on the RP and the RPF neighbor for it. We're not really exchanging PIM Hellos in the overlay, so there is no active neighbor discovery. Don't believe me? Let's confirm:
The fusion sees the two borders as active PIM neighbors:
The borders only see the fusion and nothing else:
The edge (showing just Edge1 as an example here), as expected, sees no neighbor:
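For reference, checks like the three above are typically done with the PIM neighbor show command in the overlay VRF. This is a minimal sketch, assuming a VN whose VRF is named CAMPUS (a placeholder, not a name taken from this lab):

```
! Run on the fusion, on each border and on Edge1
! CAMPUS is a hypothetical VRF name; substitute the VRF that backs your VN
show ip pim vrf CAMPUS neighbor
```

On the fusion, the VRF name may differ from the one used inside the fabric, depending on how the handoff is built.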
As I said earlier, a neighbor is injected into the PIM neighbor list based on who the RPF neighbor is for the RP. So, let's test that out - I'll add a static RP address on Edge1 now.
The RPF neighbor for this RP address is:
This neighbor now gets inserted into the PIM neighbor list:
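The static RP test above can be sketched roughly as follows. The VRF name CAMPUS and the RP address 192.0.2.1 are placeholders chosen for illustration and are not the values used in this lab:

```
! Edge1, global configuration mode
! CAMPUS and 192.0.2.1 are hypothetical: the VN's VRF and the external RP address
ip pim vrf CAMPUS rp-address 192.0.2.1
!
! Exec mode: check which neighbor RPF resolves to for the RP,
! then check whether that same neighbor now appears in the PIM neighbor list
show ip rpf vrf CAMPUS 192.0.2.1
show ip pim vrf CAMPUS neighbor
```

In a production fabric the RP would normally be pushed by the controller rather than typed in by hand; the manual command here is only meant to show the cause and effect.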
This logic, however, breaks BSR functionality with the RP outside of the fabric. Why? Well, BSR uses a flood-to-all-PIM-neighbors mechanism (it floods to the multicast destination 224.0.0.13, the ALL-PIM-ROUTERS group) to forward the candidate RP(s) information. Assuming the fusion (in our case) sends this to the borders, which form active PIM neighbors with it, the borders, in turn, cannot send it on to the edges because the edges never show up as PIM neighbors for them.
Here's a quote from the BSR RFC (https://tools.ietf.org/html/rfc5059#section-3.4) to confirm how BSR messages are forwarded:
When a Bootstrap message is forwarded, it is forwarded out of every multicast-capable interface that has PIM neighbors (including the one over which the message was received)
Auto-RP and a catch-22
Auto-RP, however, does work but there's a catch-22 going on here and I want to make sure everyone understands WHY it works - let's pull back the curtain, shall we?
Before we get into the fabric specific details, let's recall how Auto-RP works. Here's my one minute crash course - Auto-RP uses two multicast groups to distribute RP information, 224.0.1.39 and 224.0.1.40, and introduces two roles for its functionality - candidate RPs and Mapping Agents. The candidate RPs announce their RP address to the mapping agent(s) using 224.0.1.39, and the mapping agent elects one of these as the RP address and announces that to the network using 224.0.1.40.
So, if you want to learn of an RP using Auto-RP, you need to be subscribed to 224.0.1.40, which all Cisco multicast-enabled devices are by default. But how do you build a multicast tree for a group that you don't know the RP for? For the Auto-RP groups, the expectation is therefore to fall back to dense mode to flood these messages. There are several ways to do this; the simplest is to enable the 'autorp listener' feature using the CLI 'ip pim autorp listener' or, for a specific VRF, 'ip pim vrf <VRF> autorp listener'.
In our topology, I'm going to configure a loopback on the fusion, enable PIM SM on it and put it in the appropriate VRF.
Next, I'll advertise this loopback as the candidate RP. The same fusion will also be a mapping agent, just to keep things simple.
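A configuration along these lines would cover both steps. Treat it as a sketch: the loopback number, the VRF name CAMPUS, the address 192.0.2.1 and the scope (TTL) of 16 are all placeholders, and 'vrf forwarding' assumes the VRF was created with 'vrf definition' (older configurations use 'ip vrf forwarding' instead):

```
! Fusion, global configuration mode; every name and address here is hypothetical
interface Loopback100
 vrf forwarding CAMPUS
 ip address 192.0.2.1 255.255.255.255
 ip pim sparse-mode
!
! Announce Loopback100 as a candidate RP (announcements go to 224.0.1.39)
ip pim vrf CAMPUS send-rp-announce Loopback100 scope 16
! Act as the mapping agent as well (discovery messages go to 224.0.1.40)
ip pim vrf CAMPUS send-rp-discovery Loopback100 scope 16
```

The loopback also has to be reachable from the fabric, so it needs to be advertised to the borders by whatever routing protocol runs on the handoff.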
An important point to note - our devices are configured to be autorp listeners by default (not sure from what release for the IOS/IOS-XE platforms). You can confirm this with the following:
This is only seen with the 'all' argument against the running-config, which implies it is a default configuration present on the device. This is important because it allows the device to use PIM dense mode for 224.0.1.39 and 224.0.1.40, which lets these packets be flooded on all interfaces (dense-mode forwarding rules apply).
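One way to run that check, shown here as an assumption about the kind of filter used rather than the exact command from this lab:

```
! Default commands are hidden from a plain "show running-config";
! the "all" keyword includes them, and the filter narrows the output to Auto-RP lines
show running-config all | include autorp
```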
The borders get this and are able to derive the RP information from it.
Let's look at the edge now (taking Edge1 as an example).
Edge1 has not received this information. This is because 224.0.1.40 is treated like any other ASM group in the fabric, and therein lies the flaw and the catch-22. If you're using native multicast for the underlay (as an example), this implies that an underlay SSM tree should be built for the packets to be correctly delivered.
How would the underlay SSM tree be built if the RP is not known for the group? Normally, we use the RP information, find the RPF neighbor for the RP and build our underlay SSM tree against that neighbor by sending PIM (S,G) joins to it. As you can see, the mroute table for this group on Edge1 has a null incoming interface and the RP is not known.
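The checks on the borders and on Edge1 can be reproduced with commands like the following. Again, CAMPUS is a placeholder VRF name, not the one used in this lab:

```
! Border or edge, exec mode; CAMPUS is a hypothetical VRF name
! Group-to-RP mappings learned through Auto-RP or configured statically
show ip pim vrf CAMPUS rp mapping
! Multicast state for the Auto-RP discovery group
show ip mroute vrf CAMPUS 224.0.1.40
```

Comparing the two outputs between a border and an edge is what exposes the problem: the border has a mapping for the RP, while the edge has none and its entry for the discovery group has no usable incoming interface.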
Thus, if I manually add the RP on Edge1, I should now see Auto-RP packets as well.
Now, I see the mroute table built correctly as well:
The RPF neighbor points to Border1, and Edge1 (and Edge2) sends an (S,G) join in the underlay towards Border1, building this SSM tree. On Border1, we see the correct state, allowing the Auto-RP messages to flow to the edges.
A packet capture confirms that the Auto-RP messages are now flowing via the overlay, encapsulated with a VXLAN header:
There you have it - Auto-RP in SDA isn't really going to work unless you statically configure the RP as well (which defeats the purpose of Auto-RP, doesn't it?). I do feel that the implementation for BSR and Auto-RP groups should be changed - 224.0.0.13, 224.0.1.39 and 224.0.1.40 should be a "special" case and we should use L2 Flooding to flood these packets in the underlay ASM group, pre-built for L2 Flooding (which will get these messages to all edges).
I hope this was informative and as always, I'd like to thank you for reading, if you've come this far.