Updated: Jul 31, 2021
[this post assumes that the reader has a general understanding of L2/L3 VNIs and asymmetric/symmetric IRB]
Cumulus, by default, uses auto RTs for L2 and L3 VNIs. This makes for a very easy, almost plug-and-play experience when building VXLAN BGP EVPN fabrics. But it doesn't help you understand how route-targets are exported and imported across the fabric, or how to fully control this.
It's always good to learn to drive a stick before moving to an automatic. So, the goal of this blog is to help you understand how L2/L3 VNI RTs are exported/imported and how you can control what goes into your customer VRF. This will also allow you to control whether you want asymmetric or symmetric IRB (see my previous Cumulus blog posts to understand what asymmetric and symmetric IRB are).
For this post, we're going to use the following topology and incrementally add/delete/modify certain aspects of this network.
To begin with, this is a pure L2 VNI setup with two servers (Server1 and Server2) deployed in VLAN 10, mapped to VNI 10010. The network is a BGP unnumbered core, with loopbacks of each leaf acting as the VXLAN tunnel IPs.
Let's review the BGP configuration:
Typical BGP configuration - we're advertising all VNIs into EVPN and the BGP L2VPN EVPN peering is activated against both Spine1 and Spine2. By default, Cumulus Linux (FRR, really) uses a model of ASN:VNI to derive the VNI RTs.
In our case, this will be 64521:10010 for VNI 10010. We can confirm using the following:
PC1's MAC address is advertised as a type-2 route using this export RT:
This is correctly imported on Leaf2. Cumulus does not show the route imported into the local RD in the BGP table; however, bgpd informs zebra, and zebra has installed it in the MAC address table.
The MAC address table also shows this entry, against a remote VTEP of 188.8.131.52, which is Leaf1.
Let's add a manual export RT for VNI 10010, on Leaf1:
This is correctly added to the prefix that is being advertised via BGP EVPN:
Leaf2 is still importing this though - why, and how?
This is the first important thing to remember with auto-derived RTs on Cumulus Linux - there is an implicit *:VNI import when using auto-RTs. This is necessary because, when you follow an eBGP peering model, the AS numbers will naturally be different, so an import RT built from your own ASN (ASN:VNI) would never match the RT exported by a remote leaf.
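To make the difference concrete, here is a minimal Python sketch of the matching logic described above. It is only a model of the behaviour, not FRR's code: the function names are made up for illustration, Leaf2's ASN (64522) and the manual RT 1:10010 are hypothetical values, and only Leaf1's ASN (64521) and VNI 10010 come from this post.

```python
def auto_export_rt(asn: int, vni: int) -> str:
    """Auto-derived export RT in ASN:VNI form."""
    return f"{asn}:{vni}"


def wildcard_import_matches(route_rts, local_vni: int) -> bool:
    """Implicit *:VNI import: ignore the ASN part, compare only the VNI part."""
    return any(rt.rpartition(":")[2] == str(local_vni) for rt in route_rts)


def exact_import_matches(route_rts, local_asn: int, local_vni: int) -> bool:
    """What a strict ASN:VNI import built from the local ASN would do."""
    return auto_export_rt(local_asn, local_vni) in route_rts


leaf1_rts = [auto_export_rt(64521, 10010)]  # Leaf1's ASN and VNI from this post
LEAF2_ASN = 64522                           # hypothetical, not from the post

print(exact_import_matches(leaf1_rts, LEAF2_ASN, 10010))  # False
print(wildcard_import_matches(leaf1_rts, 10010))          # True
print(wildcard_import_matches(["1:10010"], 10010))        # True (hypothetical manual RT)
```

The strict match fails purely because the two leafs sit in different ASNs, which is the problem the wildcard import solves.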
Let's add a manual, incorrect RT now on Leaf2:
We no longer see the entry in the MAC address table, even though BGP EVPN has received it:
Once we add the correct RT to be imported, we see it in the MAC address table again:
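The behaviour in the last few steps can be summarised as: a manually configured import RT replaces the implicit *:VNI import. A rough Python sketch of that decision follows. The function name and all RT values are hypothetical, chosen only to illustrate the three cases (no manual import, wrong manual import, correct manual import).

```python
def should_import(route_rts, local_vni, manual_import_rts=None):
    """Decide whether a received EVPN route is imported into a VNI.

    With manual import RTs configured, only exact matches count.
    Without them, fall back to the implicit *:VNI behaviour.
    """
    if manual_import_rts:
        return any(rt in manual_import_rts for rt in route_rts)
    suffix = f":{local_vni}"
    return any(rt.endswith(suffix) for rt in route_rts)


advertised = ["1:10010"]  # hypothetical RT carried by Leaf1's type-2 route

print(should_import(advertised, 10010))                  # True: implicit *:VNI
print(should_import(advertised, 10010, {"1:99999"}))     # False: wrong manual RT
print(should_import(advertised, 10010, {"1:10010"}))     # True: correct manual RT
```

In the failing case the route is still present in the BGP EVPN table; it just never makes it into the VNI, which is why the MAC address table entry disappears.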
Remember, this import RT also controls what is inserted into the EVPN ARP cache. Assuming corresponding SVIs were deployed as well, you should see the EVPN ARP cache populated with this entry if the correct import RTs are configured (via the type-2 MAC plus IP route).
Let's now convert our network to a symmetric IRB topology. Server2 is moved to VLAN 20, with an IP address of 184.108.40.206. VLANs 10 and 20 are present on both Leaf1 and Leaf2, which act as anycast gateways for the respective subnets.
Both Leaf1 and Leaf2 have VNIs created for VLANs 10 and 20. Example below from Leaf1:
An L3VNI (VNI 10040) is created for symmetric routing and mapped to VLAN 40. Both servers are moved into a new VRF, called VRF1. The L3VNI is mapped to this VRF as well.
The same import RT logic applies to L3VNIs as well - if there's no manual import configured for the L3VNI, then the default *:VNI import is applied. It is crucial to understand how the import/export RTs for the L3VNI are controlled - this is done under the VRF-specific address-family in BGP.
For example, before setting any manual import/export RTs, the L3VNI RTs are auto-derived:
We'll now configure this manually instead, as an example, on Leaf1. This goes under the BGP configuration itself:
Notice how these are specific to the VRF. Now, Leaf1 should be adding an RT of 1:10040 to the prefixes. The final BGP configuration in this case:
Looking at the BGP EVPN table, we can see the RTs are correctly added:
There are two distinct RTs added here - one for the corresponding L2VNI and another for the L3VNI.
It is important to understand the impact of importing each RT - on Leaf2, importing the RT for the L3VNI is what imports the /32 route (from the type-2 MAC plus IP route) into the VRF routing table, while importing the L2VNI will pull the MAC address into the MAC address table (this is done using the type-2 MAC only route) and create an entry in the EVPN ARP cache (using the type-2 MAC plus IP route).
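As a way to keep the three outcomes straight, here is a small Python model of the mapping just described. It is a sketch under assumptions: the class, function and table names are invented for illustration, the MAC and IP values are placeholders, and the L2VNI RT 64521:10010 is assumed (only the L3VNI RT 1:10040 is taken from the configuration above).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Type2Route:
    mac: str
    ip: Optional[str]   # None for a MAC-only route
    rts: frozenset      # route-targets attached to the route


def tables_updated(route, l2_import, l3_import):
    """Return which tables the receiving leaf populates from this route."""
    updated = set()
    if route.rts & l2_import:
        # L2VNI import: MAC-only -> MAC table, MAC+IP -> EVPN ARP cache
        updated.add("mac_table" if route.ip is None else "evpn_arp_cache")
    if route.ip is not None and route.rts & l3_import:
        # L3VNI import: MAC+IP -> /32 host route in the VRF
        updated.add("vrf_host_route")
    return updated


L2_RT, L3_RT = "64521:10010", "1:10040"   # L2 RT value is an assumption
mac_only = Type2Route("00:00:00:00:00:01", None, frozenset({L2_RT}))
mac_ip = Type2Route("00:00:00:00:00:01", "10.1.10.1", frozenset({L2_RT, L3_RT}))

print(tables_updated(mac_only, {L2_RT}, {L3_RT}))          # {'mac_table'}
print(sorted(tables_updated(mac_ip, {L2_RT}, {L3_RT})))    # ['evpn_arp_cache', 'vrf_host_route']
print(tables_updated(mac_ip, {L2_RT}, {"1:99999"}))        # {'evpn_arp_cache'}
```

The last call is the interesting one: with the L3VNI RT not imported, the leaf still learns the MAC-to-IP binding but has no /32 route in the VRF.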
Let's confirm on Leaf2:
On Leaf2, as you can see, we have enough information to route both asymmetrically and symmetrically. Remember, the lookup is always a longest prefix match - which means the /32 route is hit and the packet is routed symmetrically.
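The longest prefix match point is easy to see with a toy routing table. The sketch below uses Python's standard ipaddress module; the prefixes, the host address and the next-hop labels are hypothetical placeholders, not the addresses used in this lab.

```python
import ipaddress


def longest_prefix_match(dest, routes):
    """Return the (prefix, next_hop) pair with the longest matching prefix, or None."""
    addr = ipaddress.ip_address(dest)
    candidates = [(net, nh) for net, nh in routes.items() if addr in net]
    if not candidates:
        return None
    return max(candidates, key=lambda item: item[0].prefixlen)


# Hypothetical view of VRF1 on a leaf: connected subnet plus an EVPN host route
vrf1 = {
    ipaddress.ip_network("10.1.10.0/24"): "connected (local SVI)",
    ipaddress.ip_network("10.1.10.1/32"): "remote VTEP via L3VNI 10040",
}

print(longest_prefix_match("10.1.10.1", vrf1)[1])  # remote VTEP via L3VNI 10040

# Without the /32, the same lookup falls back to the connected subnet
del vrf1[ipaddress.ip_network("10.1.10.1/32")]
print(longest_prefix_match("10.1.10.1", vrf1)[1])  # connected (local SVI)
```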
Knowing what we know of RT import/export, we can fully control how we want our traffic to flow (asymmetric or symmetric).
On Leaf1, let's import a different RT under the VRF address-family.
This causes the type-2 MAC plus IP route to not be imported as a /32 route in the VRF table. Now, the longest prefix match is the subnet route itself:
This means that, from Leaf1's perspective, the destination is directly connected and it can ARP for it. Using the EVPN ARP cache, Leaf1 already knows PC2's MAC address, so there's no need to ARP for it again. Thus, traffic from PC1 to PC2 flows asymmetrically. A packet capture confirms that the VNI added to the VXLAN header is 10020.
The return path is symmetric because Leaf2 still has that /32 entry imported into the VRF table. A packet capture confirms that the return packet has the L3VNI (10040) added to the VXLAN header.
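Putting both directions together, the VNI seen in each packet capture follows from whether the ingress leaf holds a /32 host route for the destination. A simplified Python sketch of that choice is below. The function name and the host IP addresses are hypothetical; the VNIs (10010, 10020, 10040) are the ones used in this post.

```python
L3VNI = 10040


def vxlan_vni_for(dest_ip, host_routes, dest_l2vni, l3vni=L3VNI):
    """Pick the VNI the ingress leaf puts in the VXLAN header.

    Host route present  -> routed symmetrically, encapsulated with the L3VNI.
    Host route missing  -> routed into the destination subnet locally
                           (asymmetric), encapsulated with that subnet's L2VNI.
    """
    return l3vni if dest_ip in host_routes else dest_l2vni


PC1_IP, PC2_IP = "10.1.10.1", "10.1.20.1"   # hypothetical addresses

leaf1_host_routes = set()        # Leaf1 no longer imports the L3VNI RT
leaf2_host_routes = {PC1_IP}     # Leaf2 still has PC1's /32 in VRF1

print(vxlan_vni_for(PC2_IP, leaf1_host_routes, dest_l2vni=10020))  # 10020
print(vxlan_vni_for(PC1_IP, leaf2_host_routes, dest_l2vni=10010))  # 10040
```

The asymmetry between the two directions comes only from the import RT change on Leaf1; nothing was modified on Leaf2.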
I hope this was informative, and I'll see you in the next one.