Cumulus Basics Part VIII - VXLAN symmetric routing and multi-tenancy

Updated: Jun 14, 2019

Now that we've configured and verified a working asymmetric VXLAN routing solution in the previous post, let's take a look at the greener side of the grass (well, it depends on where you stand) - symmetric IRB.

This post is going to introduce VRFs into the picture that pave the way for multi-tenancy in VXLAN solutions. We continue to use the same topology as our previous post:

Any configuration we added for asymmetric routing has been removed. We simply have our L2VNIs created on LEAF1 and LEAF3 and the BGP peerings are up.

Before we begin with the actual configuration, let's understand the logic behind symmetric IRB. This functions as a bridge-route-route-bridge model where the packet is bridged to your source VTEP from the host and is then routed into a new type of VNI called the L3VNI. This takes the packet through your fabric core to the destination VTEP, where it is routed from the L3VNI into the local L2VNI and then bridged across to the destination host.
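The bridge-route-route-bridge sequence can be sketched as a small Python model. This is only an illustration of the order of operations: the function name `symmetric_irb_path` is made up for this sketch, and the two L2VNI values in the usage lines are hypothetical (only 10040 appears in this post).

```python
def symmetric_irb_path(src_l2vni: int, l3vni: int, dst_l2vni: int) -> list[tuple[str, str, int]]:
    """Return the (device, action, vni) steps of the symmetric IRB model."""
    return [
        ("ingress VTEP", "bridge", src_l2vni),  # host frame arrives in its local L2VNI
        ("ingress VTEP", "route", l3vni),       # routed from the source SVI into the L3VNI
        ("egress VTEP", "route", dst_l2vni),    # routed from the L3VNI into the local L2VNI
        ("egress VTEP", "bridge", dst_l2vni),   # bridged to the destination host
    ]


# 10010 and 10030 are hypothetical L2VNIs; 10040 is the L3VNI used in this post.
for device, action, vni in symmetric_irb_path(10010, 10040, 10030):
    print(f"{device}: {action} (VNI {vni})")
```

Note that only the L3VNI travels across the fabric; the ingress VTEP does not need the destination L2VNI at all.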

In addition to the L3VNIs, we also introduce VRFs here. VRFs allow for multi-tenancy. Imagine this - instead of having a dedicated core infrastructure for every customer, you could have this core infrastructure common to multiple customers with the kind of segregation that VRFs can provide. The L3VNIs are tied to the customer VRFs directly and in that sense, the L3VNIs should be unique per VRF.
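To make the "unique per VRF" point concrete, here is a minimal Python sketch that checks a VRF-to-L3VNI plan for clashes. The helper `find_shared_l3vnis` and the second tenant are hypothetical; this is a planning aid, not something Cumulus provides.

```python
from collections import defaultdict


def find_shared_l3vnis(vrf_to_l3vni: dict[str, int]) -> dict[int, list[str]]:
    """Return every L3VNI that is mapped to more than one VRF."""
    by_vni = defaultdict(list)
    for vrf, vni in vrf_to_l3vni.items():
        by_vni[vni].append(vrf)
    return {vni: sorted(vrfs) for vni, vrfs in by_vni.items() if len(vrfs) > 1}


# TENANT1 and 10040 come from this post; TENANT2 and 10050 are hypothetical.
print(find_shared_l3vnis({"TENANT1": 10040, "TENANT2": 10050}))  # a clean plan
print(find_shared_l3vnis({"TENANT1": 10040, "TENANT2": 10040}))  # two tenants sharing one L3VNI
```

If two VRFs shared an L3VNI, the destination VTEP would have no way to tell which tenant's routing table a decapsulated packet belongs to.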

The configuration for symmetric IRB is not very complicated. Let's take LEAF1 as an example and start to configure this:

Now that we have some of these pieces configured, we need to start making sense of this and putting it all together.

Naturally, to segregate your customers, they need to be put in their respective tenant VRFs. This means that the customer VLAN (particularly, the first L3 hop, which is the corresponding SVI in this case) needs to be in the tenant VRF. Additionally, you add the corresponding VLAN for the L3VNI in the same tenant VRF.

Confirm that the customer facing SVI and the SVI corresponding to the L3VNI are up and in the correct VRF:

Both the L2VNI and the L3VNI are assigned RD/RT values. You can confirm this using 'net show bgp evpn vni':

Similar configuration is done on LEAF3:

Let's take a quick peek at the BGP EVPN table and see what is in there:

It appears that PC1's MAC and IP addresses have already been learnt and installed in the BGP EVPN table. However, there is no information about PC3 in here (most likely because PC3 has not sent any traffic so far). So, let's take this opportunity to generate some traffic from PC3 and understand how the control plane is built with L3VNIs.

We will enable some debugs to understand what is going on. As explained in the previous post, you must enable logging of syslogs at the debugging level and then enable the debugs. The following debugs were enabled:

The same debugs were enabled on LEAF1 as well so as to capture simultaneous debugging information from both.

We generate some traffic from PC3 now (by pinging its default gateway, which is SVI30 on LEAF3). Remember, all debugs are redirected to /var/log/frr/frr.log. Snippets of relevant debug logs are below:

As you can see, this gets learnt in the BGP EVPN table on LEAF1 and pushed to the RIB/FIB as well (for simplicity's sake, I have shut down all links to SPINE2):

The BGP update itself looks like this:

The control-plane exchange can be visualized as follows:


The ICMP request from PC1 hits LEAF1. Since the destination MAC address is owned by LEAF1, it strips off the Layer 2 header and does a lookup against the destination IP address in the IP header.

This gets encapsulated with the appropriate headers and the outer destination IP address is set to that of the destination VTEP (LEAF3). The VNI inside the VXLAN header is set to 10040, which is the VNI that VLAN 40 is associated with.

The data-plane packet, post encapsulation:

As you can see, the source and destination IP addresses in the outer header belong to the source VTEP and the destination VTEP respectively. A UDP header follows, wherein the destination port signifies the following header as VXLAN. Inside the VXLAN header, the VNI is set to 10040, which is the L3VNI shared between VTEPs, uniquely identifying the VRF.
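For readers who want to see where the VNI sits on the wire, here is a minimal Python sketch that packs and parses the 8-byte VXLAN header as laid out in RFC 7348. The function names are made up for this example. It builds only the VXLAN header itself, not the outer Ethernet/IP/UDP headers.

```python
import struct

VXLAN_UDP_PORT = 4789         # IANA-assigned UDP destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08   # "I" flag: the VNI field carries a valid value


def build_vxlan_header(vni: int) -> bytes:
    """Pack flags (8 bits), reserved (24 bits), VNI (24 bits), reserved (8 bits)."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN header."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI-valid flag is not set")
    return vni_word >> 8


header = build_vxlan_header(10040)  # 10040 is the L3VNI from this post
print(header.hex())
print(parse_vni(header))
```

The VNI is a 24-bit field, which is why VXLAN can address roughly 16 million segments compared to the 4094 usable VLAN IDs.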

This encapsulated packet is sent towards SPINE1 or SPINE2.

Assuming it hits SPINE1, a traditional route lookup is done against the destination IP address in the outer header (which is LEAF3's VTEP address in this case). This address is advertised in the underlay and SPINE1 knows that the next-hop for it is LEAF3.

SPINE1 now forwards this to LEAF3 (remember, the packet is still encapsulated and is simply being routed in the underlay till it reaches the destination VTEP):

The packet reaches LEAF3 now. The destination MAC address in the outer Ethernet header is owned by LEAF3, so this header gets stripped. The destination IP address in the outer IP header is also owned by LEAF3, so this header gets stripped as well.

LEAF3 parses the UDP header and understands that a VXLAN header follows. It then parses the VXLAN header - the most relevant information here is the embedded VNI.

Why is this VNI (the L3VNI) so important? LEAF3 (the destination VTEP) uses this VNI to determine which VRF table to consult for the inner destination IP address lookup. This is how the separation of customers is achieved end to end.
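The VNI-to-VRF step can be sketched in Python as a two-stage lookup: first pick the VRF from the VNI, then do a longest-prefix match inside that VRF only. This is a conceptual model, not how the kernel or FRR implements it. The prefixes and next-hop descriptions are hypothetical placeholders from the documentation address ranges, and the `lookup` function is invented for this example.

```python
import ipaddress

# L3VNI -> VRF mapping. 10040 and TENANT1 are taken from this post.
L3VNI_TO_VRF = {10040: "TENANT1"}

# Per-VRF routing tables. All prefixes and next hops here are hypothetical.
VRF_ROUTES = {
    "TENANT1": {
        ipaddress.ip_network("192.0.2.0/24"): "directly connected via local SVI",
        ipaddress.ip_network("198.51.100.0/24"): "remote VTEP via L3VNI",
    },
}


def lookup(vni: int, inner_dst: str) -> tuple[str, str]:
    """Select the VRF from the VNI, then longest-prefix match within that VRF."""
    vrf = L3VNI_TO_VRF[vni]  # a KeyError means the VNI is not a known L3VNI
    dst = ipaddress.ip_address(inner_dst)
    matches = [net for net in VRF_ROUTES[vrf] if dst in net]
    if not matches:
        raise LookupError(f"no route to {inner_dst} in VRF {vrf}")
    best = max(matches, key=lambda net: net.prefixlen)
    return vrf, VRF_ROUTES[vrf][best]


print(lookup(10040, "192.0.2.10"))
```

Because the routing table is chosen before the IP lookup happens, two tenants can use overlapping address space without their traffic ever being matched against each other's routes.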

LEAF3 can now look at the VRF table for TENANT1 to determine how to get to PC3.

Remember, the EVPN arp-cache table should also be populated with information about PC3.

The packet is now forwarded to PC3 and it can respond back. The same process occurs in the reverse direction as well.

For any queries, concerns or conversations, please email