Cumulus Part IX - Understanding VXLAN EVPN Route-Target control
In this post, we look at how route-target extended communities can be used to control VXLAN BGP EVPN routes in Cumulus Linux.
Introduction
This post assumes that the reader has a general understanding of L2/L3 VNIs and asymmetric/symmetric IRB.
Cumulus, by default, uses auto RTs for L2 and L3 VNIs. This makes for a very easy, almost plug-and-play experience when building VXLAN BGP EVPN fabrics. But it doesn’t help you understand how route-targets are exported and imported, or how to fully control them.
It’s always good to learn to drive a stick before moving to an automatic. So, the goal of this post is to help you understand how L2/L3 VNI RTs are exported and imported, and how you can control what goes into your customer VRF. This also lets you control whether traffic is routed with asymmetric or symmetric IRB (see my previous Cumulus blog posts for an explanation of asymmetric and symmetric IRB).
Topology
For this post, we’re going to use the following topology and incrementally add/delete/modify certain aspects of this network.
RTs in an L2VNI environment
To begin with, this is a pure L2 VNI setup with two servers (Server1 and Server2) deployed in VLAN 10, mapped to VNI 10010. The network is a BGP unnumbered core, with loopbacks of each leaf acting as the VXLAN tunnel IPs.
Let’s review the BGP configuration:
router bgp 64521
 bgp router-id 1.1.1.1
 neighbor swp1 interface remote-as external
 neighbor swp2 interface remote-as external
 address-family ipv4 unicast
  network 1.1.1.1/32
 address-family l2vpn evpn
  neighbor swp1 activate
  neighbor swp2 activate
  advertise-all-vni
Typical BGP configuration - we’re advertising all VNIs into EVPN and the BGP L2VPN EVPN peering is activated against both Spine1 and Spine2. By default, Cumulus Linux (FRR, really) uses a model of ASN:VNI to derive the VNI RTs.
In our case, this will be 64521:10010 for VNI 10010. We can confirm using the following:
cumulus@Leaf1:mgmt:~$ net show bgp l2vpn evpn vni 10010
VNI: 10010 (known to the kernel)
Type: L2
Tenant-Vrf: default
RD: 1.1.1.1:2
Originator IP: 1.1.1.1
Mcast group: 0.0.0.0
Advertise-gw-macip : Disabled
Advertise-svi-macip : Disabled
Import Route Target:
64521:10010
Export Route Target:
64521:10010
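The ASN:VNI derivation described above can be sketched in a few lines of Python. This is only an illustrative model, not FRR's actual code: the function name `auto_route_target` is made up for this post, and it ignores details such as how FRR handles 4-byte ASNs.

```python
def auto_route_target(asn: int, vni: int) -> str:
    """Return the auto-derived route-target in ASN:VNI form.

    Simplified model (assumption): the local ASN and the VNI are
    joined as-is, which matches the output shown for Leaf1.
    """
    return f"{asn}:{vni}"


# Leaf1 runs AS 64521 and advertises VNI 10010.
print(auto_route_target(64521, 10010))  # 64521:10010
```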
PC1’s MAC address is advertised as a type-2 route using this export RT:
cumulus@Leaf1:mgmt:~$ net show bgp l2vpn evpn route rd 1.1.1.1:2 mac 00:50:79:66:68:06
BGP routing table entry for 1.1.1.1:2:[2]:[00:50:79:66:68:06]/352
Paths: (1 available, best #1)
Advertised to non peer-group peers:
Spine1(swp1) Spine2(swp2)
Route [2]:[0]:[48]:[00:50:79:66:68:06] VNI 10010
Local
1.1.1.1 from 0.0.0.0 (1.1.1.1)
Origin IGP, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (First path received)
Extended Community: ET:8 RT:64521:10010
Last update: Sat Jul 24 16:30:57 2021
This is correctly imported on Leaf2. Cumulus does not show the route imported under the local RD in the BGP table; however, bgpd informs zebra, and zebra installs it in the MAC address table.
cumulus@Leaf2:mgmt:~$ net show bgp l2vpn evpn route rd 1.1.1.1:2 mac 00:50:79:66:68:06
BGP routing table entry for 1.1.1.1:2:[2]:[00:50:79:66:68:06]/352
Paths: (2 available, best #1)
Advertised to non peer-group peers:
Spine1(swp1) Spine2(swp2)
Route [2]:[0]:[48]:[00:50:79:66:68:06] VNI 10010
65550 64521
1.1.1.1 from Spine1(swp1) (11.11.11.11)
Origin IGP, valid, external, bestpath-from-AS 65550, best (Router ID)
Extended Community: RT:64521:10010 ET:8
Last update: Sat Jul 24 16:30:59 2021
Route [2]:[0]:[48]:[00:50:79:66:68:06] VNI 10010
65550 64521
1.1.1.1 from Spine2(swp2) (22.22.22.22)
Origin IGP, valid, external
Extended Community: RT:64521:10010 ET:8
Last update: Sat Jul 24 16:30:59 2021
The MAC address table also shows this entry, against a remote VTEP of 1.1.1.1, which is Leaf1.
cumulus@Leaf2:mgmt:~$ net show bridge macs 00:50:79:66:68:06
VLAN      Master  Interface  MAC                TunnelDest  State  Flags               LastSeen
--------  ------  ---------  -----------------  ----------  -----  ------------------  --------
10        bridge  vni10      00:50:79:66:68:06                     extern_learn        00:00:09
untagged          vni10      00:50:79:66:68:06  1.1.1.1            self, extern_learn  00:00:09
Adding manual RTs
Let’s add a manual export RT for VNI 10010, on Leaf1:
cumulus@Leaf1:mgmt:~$ net add bgp l2vpn evpn vni 10010 route-target export 1:10010
cumulus@Leaf1:mgmt:~$ net commit
This is correctly added to the prefix that is being advertised via BGP EVPN:
cumulus@Leaf1:mgmt:~$ net show bgp l2vpn evpn route type 2
BGP table version is 9, local router ID is 1.1.1.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [1]:[ESI]:[EthTag]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]
Network Next Hop Metric LocPrf Weight Path
Extended Community
Route Distinguisher: 1.1.1.1:2
*> [2]:[0]:[48]:[00:50:79:66:68:06]
1.1.1.1 32768 i
ET:8 RT:1:10010
Displayed 1 prefixes (1 paths) (of requested type)
Leaf2 is still importing this though - why, and how?
// BGP EVPN table
cumulus@Leaf2:mgmt:~$ net show bgp l2vpn evpn route type 2
BGP table version is 9, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [1]:[ESI]:[EthTag]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]
Network Next Hop Metric LocPrf Weight Path
Extended Community
Route Distinguisher: 1.1.1.1:2
*> [2]:[0]:[48]:[00:50:79:66:68:06]
1.1.1.1 0 65550 64521 i
RT:1:10010 ET:8
* [2]:[0]:[48]:[00:50:79:66:68:06]
1.1.1.1 0 65550 64521 i
RT:1:10010 ET:8
Displayed 1 prefixes (2 paths) (of requested type)
// MAC address table
cumulus@Leaf2:mgmt:~$ net show bridge macs 00:50:79:66:68:06
VLAN      Master  Interface  MAC                TunnelDest  State  Flags               LastSeen
--------  ------  ---------  -----------------  ----------  -----  ------------------  --------
10        bridge  vni10      00:50:79:66:68:06                     extern_learn        00:03:14
untagged          vni10      00:50:79:66:68:06  1.1.1.1            self, extern_learn  00:03:14
This is the first important thing to remember about auto-derived RTs on Cumulus Linux - there is an implicit *:VNI import when using auto RTs. This is necessary because, in an eBGP peering model, every leaf has a different AS number, so an import RT derived as ASN:VNI from the local ASN would never match the RT exported by a remote leaf.
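To make the import decision concrete, here is a minimal Python sketch of the behavior observed in this post. It is a model of what we see in the outputs, not FRR's implementation: the function name `is_imported` is hypothetical, and it assumes that configuring any manual import RT replaces the implicit *:VNI match with an exact match.

```python
from typing import Iterable, Optional


def is_imported(route_rts: Iterable[str], vni: int,
                manual_imports: Optional[Iterable[str]] = None) -> bool:
    """Decide whether a received EVPN route is imported into a VNI."""
    manual = set(manual_imports or [])
    if manual:
        # Manual import RTs configured: only an exact RT match imports.
        return any(rt in manual for rt in route_rts)
    # Auto RTs: implicit *:VNI import, so only the VNI part must match.
    return any(rt.rpartition(":")[2] == str(vni) for rt in route_rts)


received = ["1:10010"]  # RT exported by Leaf1 after the manual export

print(is_imported(received, 10010))               # True  (implicit *:10010)
print(is_imported(received, 10010, ["1:10"]))     # False (wrong manual import)
print(is_imported(received, 10010, ["1:10010"]))  # True  (matching manual import)
```

The three calls correspond to the three states Leaf2 goes through in this section: auto RTs, an incorrect manual import, and the corrected manual import.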
Let’s add a manual, incorrect RT now on Leaf2:
cumulus@Leaf2:mgmt:~$ net add bgp l2vpn evpn vni 10010 route-target import 1:10
cumulus@Leaf2:mgmt:~$ net commit
We no longer see the entry in the MAC address table, even though BGP EVPN has received it:
cumulus@Leaf2:mgmt:~$ net show bridge macs 00:50:79:66:68:06
VLAN Master Interface MAC TunnelDest State Flags LastSeen
---- ------ --------- --- ---------- ----- ----- --------
* no output *
cumulus@Leaf2:mgmt:~$ net show bgp l2vpn evpn route type 2
BGP table version is 11, local router ID is 2.2.2.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [1]:[ESI]:[EthTag]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]
Network Next Hop Metric LocPrf Weight Path
Extended Community
Route Distinguisher: 1.1.1.1:2
*> [2]:[0]:[48]:[00:50:79:66:68:06]
1.1.1.1 0 65550 64521 i
RT:1:10010 ET:8
* [2]:[0]:[48]:[00:50:79:66:68:06]
1.1.1.1 0 65550 64521 i
RT:1:10010 ET:8
Displayed 1 prefixes (2 paths) (of requested type)
Once we add the correct import RT, we see the entry in the MAC address table again:
cumulus@Leaf2:mgmt:~$ net add bgp l2vpn evpn vni 10010 route-target import 1:10010
cumulus@Leaf2:mgmt:~$ net commit
cumulus@Leaf2:mgmt:~$ net show bridge macs 00:50:79:66:68:06
VLAN      Master  Interface  MAC                TunnelDest  State  Flags               LastSeen
--------  ------  ---------  -----------------  ----------  -----  ------------------  --------
10        bridge  vni10      00:50:79:66:68:06                     extern_learn        00:00:17
untagged          vni10      00:50:79:66:68:06  1.1.1.1            self, extern_learn  00:00:17
Remember, this import RT also controls what is inserted into the EVPN ARP cache. Assuming corresponding SVIs were deployed as well, you should see the EVPN ARP cache populated with this entry if the correct import RTs are configured (via the type-2 MAC plus IP route).
cumulus@Leaf2:mgmt:~$ net show evpn arp-cache vni 10010
Number of ARPs (local and remote) known for this VNI: 2
Flags: I=local-inactive, P=peer-active, X=peer-proxy
Neighbor Type Flags State MAC Remote ES/VTEP Seq #'s
10.10.10.101 remote active 00:50:79:66:68:06 1.1.1.1 0/0
10.10.10.102 local active 00:50:79:66:68:07 0/0
RTs with symmetric IRB
Let’s now convert our network to a symmetric IRB topology. Server2 is moved to VLAN 20, with an IP address of 20.20.20.102. VLANs 10 and 20 are present on both Leaf1 and Leaf2, acting as anycast gateways for their respective subnets.
Both Leaf1 and Leaf2 have VNIs created for VLANs 10 and 20. Example below from Leaf1:
interface vni10
  bridge-access 10
  mstpctl-bpduguard yes
  mstpctl-portbpdufilter yes
  vxlan-id 10010
  vxlan-local-tunnelip 1.1.1.1
interface vni20
  bridge-access 20
  mstpctl-bpduguard yes
  mstpctl-portbpdufilter yes
  vxlan-id 10020
  vxlan-local-tunnelip 1.1.1.1
An L3VNI (VNI 10040) is created for symmetric routing and mapped to VLAN 40. Each of the servers is moved into a new VRF, called VRF1. The L3VNI is mapped to this VRF as well.
interface vlan10
  address 10.10.10.1/24
  hwaddress 00:10:00:10:00:10
  vlan-id 10
  vlan-raw-device bridge
  vrf VRF1
interface vlan20
  address 20.20.20.1/24
  hwaddress 00:20:00:20:00:20
  vlan-id 20
  vlan-raw-device bridge
  vrf VRF1
interface vlan40
  vlan-id 40
  vlan-raw-device bridge
  vrf VRF1
The same import RT logic applies to L3VNIs as well - if no manual import is configured for the L3VNI, the default *:VNI import is applied. It is crucial to understand where the import/export RTs for the L3VNI are controlled - this is done under the VRF-specific address-family in BGP.
For example, before any manual import/export RTs are set, the L3VNI uses auto-derived RTs:
cumulus@Leaf1:mgmt:~$ net show bgp l2vpn evpn vni 10040
VNI: 10040 (known to the kernel)
Type: L3
Tenant VRF: VRF1
RD: 20.20.20.1:3
Originator IP: 1.1.1.1
Advertise-gw-macip : n/a
Advertise-svi-macip : n/a
Advertise-pip: Yes
System-IP: 1.1.1.1
System-MAC: 50:00:00:03:00:03
Router-MAC: 50:00:00:03:00:03
Import Route Target:
64521:10040
Export Route Target:
64521:10040
We’ll now configure this manually instead, as an example, on Leaf1. This goes under the BGP configuration itself:
net add bgp vrf VRF1 autonomous-system 64521
net add bgp vrf VRF1 l2vpn evpn route-target import 2:10040
net add bgp vrf VRF1 l2vpn evpn route-target export 1:10040
Notice how these are specific to the VRF. Leaf1 should now be adding an RT of 1:10040 to the prefixes. The final BGP configuration in this case:
router bgp 64521
 bgp router-id 1.1.1.1
 neighbor swp1 interface remote-as external
 neighbor swp2 interface remote-as external
 address-family ipv4 unicast
  network 1.1.1.1/32
 address-family l2vpn evpn
  neighbor swp1 activate
  neighbor swp2 activate
  advertise-all-vni
  vni 10020
   route-target import 2:10
   route-target export 2:10
  vni 10010
   route-target import 1:10
   route-target export 1:10
router bgp 64521 vrf VRF1
 address-family l2vpn evpn
  route-target export 1:10040
  route-target import 2:10040
Looking at the BGP EVPN table, we can see the RTs are correctly added:
cumulus@Leaf1:mgmt:~$ net show bgp l2vpn evpn route rd 1.1.1.1:3 type 2
EVPN type-1 prefix: [1]:[ESI]:[EthTag]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]
BGP routing table entry for 1.1.1.1:3:[2]:[00:50:79:66:68:06]/352
Paths: (1 available, best #1)
Advertised to non peer-group peers:
Spine1(swp1) Spine2(swp2)
Route [2]:[0]:[48]:[00:50:79:66:68:06] VNI 10010/10040
Local
1.1.1.1 from 0.0.0.0 (1.1.1.1)
Origin IGP, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (First path received)
Extended Community: ET:8 RT:1:10 RT:1:10040 Rmac:50:00:00:03:00:03
Last update: Fri Jul 30 09:43:56 2021
BGP routing table entry for 1.1.1.1:3:[2]:[00:50:79:66:68:06]:[10.10.10.101]/352
Paths: (1 available, best #1)
Advertised to non peer-group peers:
Spine1(swp1) Spine2(swp2)
Route [2]:[0]:[48]:[00:50:79:66:68:06]:[32]:[10.10.10.101] VNI 10010/10040
Local
1.1.1.1 from 0.0.0.0 (1.1.1.1)
Origin IGP, weight 32768, valid, sourced, local, bestpath-from-AS Local, best (First path received)
Extended Community: ET:8 RT:1:10 RT:1:10040 Rmac:50:00:00:03:00:03
Last update: Fri Jul 30 09:43:56 2021
Displayed 2 prefixes (2 paths) with this RD (of requested type)
There are two distinct RTs added here - one for the corresponding L2VNI and another for the L3VNI.
It is important to understand the impact of importing each RT on Leaf2. Importing the RT for the L3VNI is what installs the /32 host route (from the type-2 MAC plus IP route) into the VRF routing table. Importing the RT for the L2VNI pulls the MAC address into the MAC address table (using the type-2 MAC only route) and creates an entry in the EVPN ARP cache (using the type-2 MAC plus IP route).
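The split between the two RTs can be modeled with a small Python sketch. This follows the description above rather than FRR's internals: the `Type2Route` class, the `actions_on_import` function, and the import RT sets passed to it for Leaf2 are all assumptions made for illustration (Leaf2's configuration is not shown in this post).

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Type2Route:
    mac: str
    ip: Optional[str]  # None for a MAC-only route
    rts: Set[str]      # route-targets carried by the route


def actions_on_import(route: Type2Route, l2vni_import: Set[str],
                      l3vni_import: Set[str]) -> List[str]:
    """List what a leaf does with a type-2 route, per the text above."""
    actions = []
    if route.rts & l2vni_import:
        if route.ip is None:
            actions.append("install MAC in MAC address table")
        else:
            actions.append("add entry to EVPN ARP cache")
    if route.ip is not None and route.rts & l3vni_import:
        actions.append(f"install {route.ip}/32 in VRF routing table")
    return actions


rts = {"1:10", "1:10040"}  # RTs Leaf1 attaches: L2VNI and L3VNI
mac_only = Type2Route("00:50:79:66:68:06", None, rts)
mac_ip = Type2Route("00:50:79:66:68:06", "10.10.10.101", rts)

# Assumed Leaf2 imports: 1:10 for the L2VNI, 1:10040 for the L3VNI.
print(actions_on_import(mac_only, {"1:10"}, {"1:10040"}))
print(actions_on_import(mac_ip, {"1:10"}, {"1:10040"}))
```

Changing the L3VNI import set to something that does not match (for example `{"2:10041"}`) leaves only the ARP cache entry for the MAC plus IP route, which is the lever used later in this post to steer traffic.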
Let’s confirm on Leaf2:
// MAC address table
cumulus@Leaf2:mgmt:~$ net show bridge macs 00:50:79:66:68:06
VLAN      Master  Interface  MAC                TunnelDest  State  Flags               LastSeen
--------  ------  ---------  -----------------  ----------  -----  ------------------  --------
10        bridge  vni10      00:50:79:66:68:06                     extern_learn        00:00:43
untagged          vni10      00:50:79:66:68:06  1.1.1.1            self, extern_learn  00:00:43
// EVPN ARP cache
cumulus@Leaf2:mgmt:~$ net show evpn arp-cache vni 10010
Number of ARPs (local and remote) known for this VNI: 1
Flags: I=local-inactive, P=peer-active, X=peer-proxy
Neighbor Type Flags State MAC Remote ES/VTEP Seq #'s
10.10.10.101 remote active 00:50:79:66:68:06 1.1.1.1 0/0
// VRF1 route table
cumulus@Leaf2:mgmt:~$ net show route vrf VRF1 ipv4
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
VRF VRF1:
K>* 0.0.0.0/0 [255/8192] unreachable (ICMP unreachable), 01:46:41
C>* 10.10.10.0/24 is directly connected, vlan10, 01:46:41
B>* 10.10.10.101/32 [20/0] via 1.1.1.1, vlan40 onlink, weight 1, 00:21:36
C>* 20.20.20.0/24 is directly connected, vlan20, 01:46:41
On Leaf2, as you can see, we have enough information to route both asymmetrically and symmetrically. Remember, the lookup is always a longest prefix match - which means the /32 route is hit and the packet is routed symmetrically.
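The longest prefix match behavior can be illustrated with Python's standard `ipaddress` module. This is a sketch of the lookup rule only, using the routes from Leaf2's VRF1 table above; the `longest_prefix_match` helper and the dictionary layout are made up for this example and do not reflect how the kernel or zebra store routes.

```python
import ipaddress
from typing import Dict, Optional, Tuple


def longest_prefix_match(routes: Dict[str, str],
                         destination: str) -> Optional[Tuple[str, str]]:
    """Return (prefix, next hop) of the most specific matching route."""
    dest = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in routes.items():
        network = ipaddress.ip_network(prefix)
        if dest in network and (best is None
                                or network.prefixlen > best[0].prefixlen):
            best = (network, next_hop)
    return None if best is None else (str(best[0]), best[1])


# Routes copied from Leaf2's VRF1 table.
leaf2_vrf1 = {
    "0.0.0.0/0": "unreachable",
    "10.10.10.0/24": "directly connected, vlan10",
    "10.10.10.101/32": "via 1.1.1.1, vlan40",
    "20.20.20.0/24": "directly connected, vlan20",
}

# With the /32 present, the host route wins: symmetric routing via the L3VNI.
print(longest_prefix_match(leaf2_vrf1, "10.10.10.101"))
# ('10.10.10.101/32', 'via 1.1.1.1, vlan40')

# Without the /32, the connected subnet wins: asymmetric routing.
no_host_route = {p: nh for p, nh in leaf2_vrf1.items()
                 if p != "10.10.10.101/32"}
print(longest_prefix_match(no_host_route, "10.10.10.101"))
# ('10.10.10.0/24', 'directly connected, vlan10')
```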
Controlling RTs
Knowing what we know of RT import/export, we can fully control how we want our traffic to flow (asymmetric or symmetric).
On Leaf1, let’s import a different RT under the VRF address-family.
router bgp 64521
 bgp router-id 1.1.1.1
 neighbor swp1 interface remote-as external
 neighbor swp2 interface remote-as external
 address-family ipv4 unicast
  network 1.1.1.1/32
 address-family l2vpn evpn
  neighbor swp1 activate
  neighbor swp2 activate
  advertise-all-vni
  vni 10020
   route-target import 2:10
   route-target export 2:10
  vni 10010
   route-target import 1:10
   route-target export 1:10
router bgp 64521 vrf VRF1
 address-family l2vpn evpn
  route-target export 1:10040
  route-target import 2:10041
This prevents the type-2 MAC plus IP route from being imported as a /32 route into the VRF table. Now, the longest prefix match is the subnet route itself:
cumulus@Leaf1:mgmt:~$ net show route vrf VRF1 ipv4
Codes: K - kernel route, C - connected, S - static, R - RIP,
O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
F - PBR, f - OpenFabric,
> - selected route, * - FIB route, q - queued, r - rejected, b - backup
t - trapped, o - offload failure
VRF VRF1:
K>* 0.0.0.0/0 [255/8192] unreachable (ICMP unreachable), 07:27:56
C>* 10.10.10.0/24 is directly connected, vlan10, 07:27:56
C>* 20.20.20.0/24 is directly connected, vlan20, 07:27:56
This means that, from Leaf1’s perspective, the destination is directly connected and it can ARP for it. Using the EVPN ARP cache, Leaf1 already knows PC2’s MAC address, so there’s no need to ARP for it again. Thus, traffic from PC1 to PC2 flows asymmetrically. A packet capture confirms that the VNI added to the VXLAN header is 10020.
The return path is symmetric because Leaf2 still has that /32 entry imported into the VRF table. A packet capture confirms that the return packet has the L3VNI (10040) added to the VXLAN header.
I hope this was informative, and I’ll see you in the next one.