Fair warning - this is going to be a long, long post. Get yourself some coffee because you're going to be here for a while!
We're going to continue working with the following topology for this post, with a legacy network added to the existing infrastructure:
This is a fairly common scenario that you might run into with customers migrating to SD-Access. The premise is this - say you have subnet X that you're going to be migrating into SD-Access but it cannot be done in one shot or one window. The requirement is to have this subnet co-exist both inside the fabric and outside it, within a legacy portion of the network.
From the perspective of our topology, we have the 220.127.116.11/24 subnet that needs to extend between the legacy network and the SD-Access fabric. For these kinds of situations, a feature called L2 handoff was introduced. This post will walk you through the provisioning of L2 handoff via DNAC and help you understand what kind of configuration is pushed to make it work. The biggest question that I always try to answer will also be addressed - HOW does it really work?
The L2 handoff configuration flow is quite similar to L3 handoff. Under 'Provision', go to the 'Fabric' tab. Under 'Fabric Infrastructure', you should see your fabric:
At this point in time, only one border can be used to do L2 handoff. In my case, I'd like to use Border1 for this purpose, so let's click on Border1. Once you do this, you should see the various details about Border1:
Within the "Fabric" tab here, click "Configure":
In this page, you should see two major options - Layer 3 Handoff and Layer 2 Handoff. For the purposes of this post, we are interested in L2 handoff so that's where our focus will be. As you can see, all VNs that have an IP pool assigned to them will be listed here.
In our case, we are interested in extending an IP pool within the Guest_VN, so that's what we are going to select. Click on the name of the VN itself here. Once you do this, the following page will be displayed:
This is where some input needs to be given - you need to provide the external link that connects to the legacy network (this is a drop down menu and lists all the interfaces on the box - you simply need to choose the relevant interface). Additionally, you need to provide the external VLAN number - this is the legacy VLAN number that is in use in the legacy network for this subnet.
In my case, the interface that connects to the legacy switch is Gi1/0/12 and the legacy VLAN is 666. Let's input this into the GUI:
Click on 'Save' now and you should go one page back. The VN that you chose to do L2 handoff for should now have its checkbox checked. After this, click on 'Add' in the bottom right corner and DNAC should start pushing the relevant configuration to Border1.
So, what we've seen so far is the "magic" portion of it. The point and click. But what we really want to understand is what happens behind the scenes, don't we?
Here's what is pushed on Border1 in terms of L2 handoff configuration:
Okay, we're getting closer to understanding what's really happening behind the scenes but we aren't there yet. What is the significance of all of this configuration? Let's look at some of the more important pieces of configuration.
The SVI creation on the border, along with the LISP configuration
This is done in conjunction with removing the SVI on the legacy switch (presumably, this legacy switch was acting like the core of your network and was the first L3 hop and the default gateway for the hosts). The intention is to have Border1 as the default gateway for the legacy hosts now.
Along with this, the LISP configuration allows for the IP addresses of the legacy hosts to be picked up natively by LISP (via the LISP dynamic EID configuration - if you need to understand how dynamic EIDs work, please look at https://www.theasciiconstruct.com/post/cisco-sda-part-iv-lisp-mobility-dynamic-eids). This allows for the LISP database to be populated, from where the prefixes are pushed into RIB/FIB as directly connected routes.
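To make the dynamic EID idea a bit more concrete, here's a minimal sketch in Python of what "picking up" a host means. This is a toy model and not how IOS-XE implements it: the `learn_host` function, the dictionary used as a LISP database and the 192.0.2.0/24 documentation prefix are all assumptions made up for illustration.

```python
import ipaddress

# Hypothetical dynamic-EID prefix (documentation range, not the subnet from this post).
DYNAMIC_EID_PREFIX = ipaddress.ip_network("192.0.2.0/24")


def learn_host(lisp_database, host_ip, local_rloc):
    """Record a detected host as a /32 EID if it falls inside the dynamic-EID prefix."""
    addr = ipaddress.ip_address(host_ip)
    if addr not in DYNAMIC_EID_PREFIX:
        return None  # outside the dynamic-EID prefix, so this toy model ignores it
    eid = ipaddress.ip_network(f"{addr}/32")
    lisp_database[eid] = local_rloc  # the local device is the RLOC for this host
    return eid


lisp_db = {}
learn_host(lisp_db, "192.0.2.30", "Border1")
```

The point to take away is that the configured prefix is a /24, but what lands in the database is a host-specific /32 with the local device as its RLOC.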
Let's confirm all of this for Host3, which is a host in the legacy network.
As you can see, the LISP database has picked up this prefix as a /32 entry and the RLOC is Border1 itself. This is also marked as reachable, which implies it can now be registered to the control-plane (again, which is also Border1). Confirm that we see an entry in the site table of Border1:
The site table for this VRF has this entry. Perfect! The last thing to confirm is the RIB/FIB, as this is what will eventually be used in the forwarding plane:
This looks good too. There's a direct adjacency off of VLAN666 and the next hop is the host IP address itself. Border1 can simply ARP for the host directly and once resolved, it can forward the traffic to it.
As you can see, the host is reachable via the L2 handoff link.
Now that we've established how a legacy host should be learnt via L2 handoff, let's take a look at the fabric hosts. In our topology, we have two fabric hosts - Host1 with an IP address of 18.104.22.168 and Host2 with an IP address of 22.214.171.124. We're not going to go into too much detail of how LISP registers these; what I want to focus on is what's really happening on Border1 with these hosts.
Once LISP has registered these /32 IP addresses of these hosts to the map-server (Border1/Border2), you should see them in the site table:
Good - we certainly see those hosts in there. Now, from the site table, a particular LISP configuration pulls these host entries into the RIB, with an AD of 250 and installs them against Null0.
Look at the RIB now and confirm that these entries are present against Null0:
Perfect! So, now, we have our fabric hosts in the RIB as well. Why are these prefixes against Null0 though? We'll understand this once we take a look at the packet walks.
Packet walk for North to South traffic
Okay, this is going to be two-fold. We're going to take a look at a host beyond the Fusion trying to reach a host within the fabric, as well as a host in the legacy network. Let's use the DNAC as the source host in this case.
Case #1: DNAC to fabric hosts (Host1/Host2) reachability
From DNAC, we're going to do a simple ping to 126.96.36.199 and 188.8.131.52. The packet gets routed through the infrastructure that is above the borders and it eventually reaches Border1.
On Border1, a forwarding lookup is done and it matches the Null0 entry for these prefixes.
This is why Null0 is important - one of the rules for triggering the LISP process is that the prefix should point to Null0. So, this implies that when this entry is hit, the LISP process is now invoked.
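As a rough mental model of that decision, here's a small Python sketch of a longest-prefix-match lookup where a Null0 next hop means "hand this to LISP" rather than "drop". The `RIB` list, the `forwarding_action` function and the 192.0.2.x addresses are hypothetical and only mirror the behaviour described in this post; real CEF is obviously a lot more involved.

```python
import ipaddress

# Toy RIB: a fabric host imported against Null0 and a legacy host
# that is directly connected over the handoff VLAN. Addresses are hypothetical.
RIB = [
    {"prefix": ipaddress.ip_network("192.0.2.10/32"), "next_hop": "Null0"},
    {"prefix": ipaddress.ip_network("192.0.2.30/32"), "next_hop": "Vlan666"},
]


def forwarding_action(rib, dst_ip):
    """Longest-prefix match, then decide what to do based on the next hop."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [route for route in rib if addr in route["prefix"]]
    if not matches:
        return "drop"
    best = max(matches, key=lambda route: route["prefix"].prefixlen)
    if best["next_hop"] == "Null0":
        return "punt-to-lisp"  # in this model, Null0 is the trigger for LISP resolution
    return f"forward via {best['next_hop']}"
```

With this toy table, a lookup for the fabric host returns `punt-to-lisp`, while the legacy host is forwarded natively out of the VLAN.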
LISP now tries to locate this EID. It goes into its map-cache to determine if there is an entry for this or not. Remember Border1's LISP configuration? It had a dynamic EID created for 184.108.40.206/24 - this also creates a corresponding entry in the map-cache table with an action of 'send-map-request'.
Border1 can now query the map-resolver (it'll just query itself since it is a map-resolver also) and determine where this EID is located.
We can see that we should hit the following entry in the site table:
Once this map-request/map-reply process is over, the map-cache should be appropriately built with this entry and CEF should be overwritten as well, with Edge1 as the next hop for this host.
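The resolution steps above can be sketched in a few lines of Python. Treat this as an illustration under assumptions: the `resolve` function, the dictionaries standing in for the map-cache and site table, and the 192.0.2.x addresses are all made up, and the real map-request/map-reply exchange is a protocol conversation, not a dictionary lookup.

```python
import ipaddress

SEND_MAP_REQUEST = "send-map-request"


def resolve(map_cache, site_table, dst_ip):
    """Return the RLOC for dst_ip, querying the site table if the cache says so."""
    addr = ipaddress.ip_address(dst_ip)
    matches = [prefix for prefix in map_cache if addr in prefix]
    if not matches:
        return None
    best = max(matches, key=lambda prefix: prefix.prefixlen)
    if map_cache[best] != SEND_MAP_REQUEST:
        return map_cache[best]  # already resolved to an RLOC
    # The covering entry only says "go ask", so look up the host in the site table.
    host = ipaddress.ip_network(f"{addr}/32")
    rloc = site_table.get(host)
    if rloc is not None:
        map_cache[host] = rloc  # a more specific entry now wins future lookups
    return rloc


map_cache = {ipaddress.ip_network("192.0.2.0/24"): SEND_MAP_REQUEST}
site_table = {ipaddress.ip_network("192.0.2.10/32"): "Edge1"}
resolve(map_cache, site_table, "192.0.2.10")
```

After the first call, the /32 sits in the cache alongside the /24, so the next packet to the same host resolves without another query.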
Our final confirmation is the CEF entry:
The packet is now encapsulated and sent to Edge1:
Inline, this packet looks like this:
Case #2: DNAC to legacy host (Host3) reachability
Let's now take a look at how a packet reaches the legacy host (Host3). This is very straightforward. Again, the packet gets routed towards the borders and reaches Border1.
Border1 does a lookup in its forwarding table:
This is a directly connected host so the ARP/CAM table will lead the packet out of the L2 handoff link. The following EPC capture on the legacy switch proves that the packet is a native IP packet, with only an 802.1Q header with a VLAN ID of 666:
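If you want to check for that tag yourself in raw frame bytes, here's a small, hedged Python sketch that pulls the VLAN ID out of an 802.1Q header. The `parse_dot1q` helper and the hand-built frame are hypothetical; the layout it relies on (TPID 0x8100 right after the two MAC addresses, with the VLAN ID in the low 12 bits of the TCI) is standard 802.1Q.

```python
import struct

DOT1Q_TPID = 0x8100  # EtherType value that marks an 802.1Q tag


def parse_dot1q(frame: bytes):
    """Return (vlan_id, inner_ethertype) for a tagged frame, or None if untagged."""
    if len(frame) < 18:
        return None
    # Bytes 0-11 are the destination and source MACs; the tag follows them.
    tpid, tci, inner_type = struct.unpack("!HHH", frame[12:18])
    if tpid != DOT1Q_TPID:
        return None
    return tci & 0x0FFF, inner_type  # low 12 bits of the TCI are the VLAN ID


# Hand-built header: zeroed MACs, a tag for VLAN 666, inner EtherType IPv4 (0x0800).
frame = bytes(12) + struct.pack("!HHH", DOT1Q_TPID, 666, 0x0800)
print(parse_dot1q(frame))  # (666, 2048)
```

An inner EtherType of 0x0800 straight after the tag is what "native IP packet" means here - there is no VXLAN encapsulation on the handoff link.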
This can be visualized like so:
Packet walk for East to West traffic
For this, we're going to take a look at a packet walk between Host1 (a fabric host) and Host3 (a legacy host). Since Host1 and Host3 are in the same subnet, when Host1 pings Host3, it will ARP for Host3 directly.
ARP is a funny little thing within SDA so I'm not going to go into too much detail. Essentially, when Edge1 gets the ARP for Host3, it will query the map-server for Host3's IP address and learn its MAC address. It then queries for that MAC address, and the map-resolver returns Border1 as the RLOC in this case.
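That two-step lookup can be pictured with a tiny Python sketch. Everything in it is an assumption for illustration only: the `resolve_arp_target` function, the two dictionaries standing in for the control-plane's tables, and the documentation-range IP and MAC addresses.

```python
def resolve_arp_target(ip_to_mac, mac_to_rloc, target_ip):
    """Two lookups: target IP -> MAC, then MAC -> RLOC to send the ARP towards."""
    mac = ip_to_mac.get(target_ip)
    if mac is None:
        return None  # the control-plane does not know this IP
    rloc = mac_to_rloc.get(mac)
    if rloc is None:
        return None  # MAC known, but no locator registered for it
    return mac, rloc


# Hypothetical registrations for a legacy host sitting behind the border.
ip_to_mac = {"192.0.2.30": "00:00:5e:00:53:03"}
mac_to_rloc = {"00:00:5e:00:53:03": "Border1"}
resolve_arp_target(ip_to_mac, mac_to_rloc, "192.0.2.30")
```

The result of the second lookup is what lets the edge send the ARP as a unicast towards one RLOC instead of flooding it across the fabric.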
The ARP is now encapsulated and sent to Border1. Visually, this should look like:
This is a unicast encapsulation; the destination IP address in the outer header is unicast - let's confirm that via a packet capture. The following packet capture is taken inbound on Border1 as the packet comes from Edge1:
Remember, the inner ARP is still a broadcast. The packet is decapsulated by Border1 and this inner ARP is flooded in VLAN 666 (which maps to the VNID we see in the VXLAN header - 8196).
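For reference, here's a hedged Python sketch of how the VNID can be read out of the 8-byte VXLAN header and mapped back to a VLAN. The `parse_vni` helper and the hand-built header are hypothetical, and the VNID-to-VLAN dictionary only restates the 8196-to-666 mapping from this post. The sketch skips bytes 1 to 3 of the header entirely, which is where SDA's group-policy fields would live.

```python
import struct

VXLAN_UDP_PORT = 4789  # IANA-assigned destination port for VXLAN
VNI_VALID_FLAG = 0x08  # "I" flag: the VNI field carries a valid value

# Mapping taken from this post: L2 VNID 8196 corresponds to VLAN 666 on Border1.
VNID_TO_VLAN = {8196: 666}


def parse_vni(vxlan_header: bytes) -> int:
    """Extract the 24-bit VNI from an 8-byte VXLAN header."""
    if len(vxlan_header) < 8:
        raise ValueError("VXLAN header is 8 bytes long")
    # 1 byte of flags, 3 bytes skipped, then 24-bit VNI + 1 reserved byte.
    flags, vni_and_reserved = struct.unpack("!B3xI", vxlan_header[:8])
    if not flags & VNI_VALID_FLAG:
        raise ValueError("VNI-valid flag is not set")
    return vni_and_reserved >> 8


# Hand-built header carrying VNID 8196.
header = struct.pack("!B3xI", VNI_VALID_FLAG, 8196 << 8)
vlan = VNID_TO_VLAN[parse_vni(header)]
print(vlan)  # 666
```

This is the lookup the border effectively performs after decapsulation: the VNID tells it which VLAN the inner frame belongs to, and the broadcast is then flooded there.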
This flooded ARP goes out the L2 handoff link and reaches Host3 via regular broadcast flood:
A similar process happens in reverse - Host3 replies to the ARP, which eventually gets encapsulated by Border1 and sent to Edge1, where it is decapsulated and sent to Host1.
Host1 now has Host3's IP address resolved to its MAC address so it can generate an ICMP echo and send it to Host3. This gets encapsulated and sent to Border1, since it is the RLOC for the legacy host.
On Border1, the forwarding table shows that this host is directly connected:
The packet is decap'd and sent out natively towards Host3. A similar process happens in the reverse direction - the native packet reaches Border1. Because the destination MAC address is that of Border1 itself, a routing lookup is done which results in either the Null0 entry for 220.127.116.11 or a LISP resolved entry:
Visually, the entire forwarding path is like so in the direction of Host1->Host3:
And like so in the direction of Host3->Host1: