EIGRP FRR

Theory

EIGRP Loop-Free Alternate (LFA) IP Fast Reroute (IP FRR) reduces the time it takes to switch from a successor route to a feasible successor route compared with regular convergence mechanisms. In a nutshell, DUAL precomputes backup routes and installs them in the Routing Information Base (RIB). When a prefix’s primary path fails, traffic quickly shifts to the feasible successor or, in this case, the loop-free alternate path.

Under normal conditions, path convergence may take hundreds of milliseconds. For example, if a link goes down, only the directly connected devices are immediately aware of it. The rest of the network is not yet aware, so packets heading toward the failed link are blackholed. Repair paths carry traffic from the time the link fails until the control plane reconverges and a new successor is installed.

LFA Computation

EIGRP computes prefix-based LFAs. This means that backup paths are computed per prefix (i.e., individually) rather than per link, where every prefix that uses the same link shares the same backup information. Per-link computation scales poorly because all traffic from the failed link is redirected to a single LFA, which can cause congestion. Per-prefix computation, on the other hand, can provide load sharing and next-hop protection.

LFA Tie-Breakers

If there are several LFAs to choose from, EIGRP uses four attributes as tie-breakers:

  • Interface-Disjoint: LFAs that use the same outgoing interface as the successor (primary) route are eliminated. We do not want the LFA to depend on the same interface.

  • Linecard-Disjoint: LFAs that share the same line card as the successor route are eliminated.

  • Lowest-Repair-Path-Metric: LFAs whose metric is higher than the other LFA candidates are eliminated.

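  • Shared Risk Link Group (SRLG): LFAs that belong to the same SRLG as the successor route are eliminated. SRLG is a tricky concept. According to Cisco’s documentation, an SRLG refers to links in a network that share a common fiber (or another common physical attribute), so they are likely to fail together.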
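
To make the elimination logic above concrete, here is a minimal Python sketch of tie-breaking among candidate LFAs. It is not how IOS implements it: the `Path` fields, the example interfaces, line cards, and SRLG IDs are all hypothetical, the rules run in the order listed above (on a real router the priority is configurable), and I'm assuming a rule is skipped if it would eliminate every remaining candidate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    # All fields are illustrative; they are not IOS data structures.
    next_hop: str
    interface: str
    linecard: int
    srlg: int
    metric: int


def interface_disjoint(primary, paths):
    return [p for p in paths if p.interface != primary.interface]


def linecard_disjoint(primary, paths):
    return [p for p in paths if p.linecard != primary.linecard]


def lowest_repair_path_metric(primary, paths):
    best = min(p.metric for p in paths)
    return [p for p in paths if p.metric == best]


def srlg_disjoint(primary, paths):
    return [p for p in paths if p.srlg != primary.srlg]


# Assumed order: the order of the list above.
TIE_BREAKERS = (
    interface_disjoint,
    linecard_disjoint,
    lowest_repair_path_metric,
    srlg_disjoint,
)


def select_lfa(primary, candidates):
    """Narrow the candidate LFAs rule by rule until one is left."""
    remaining = list(candidates)
    for rule in TIE_BREAKERS:
        if len(remaining) <= 1:
            break
        survivors = rule(primary, remaining)
        # Assumption: a rule that would remove every candidate is skipped.
        if survivors:
            remaining = survivors
    return remaining[0] if remaining else None


# Hypothetical primary path and three hypothetical candidates.
primary = Path("nh-primary", "Gi0/1", linecard=0, srlg=10, metric=131072)
candidates = [
    Path("nh-a", "Gi0/1", linecard=0, srlg=10, metric=140000),
    Path("nh-b", "Gi0/0", linecard=0, srlg=20, metric=154112),
    Path("nh-c", "Gi1/0", linecard=1, srlg=10, metric=160000),
]
chosen = select_lfa(primary, candidates)
print(chosen.next_hop)
```

In this made-up example, `nh-a` is dropped because it shares the primary's interface, `nh-b` because it shares the primary's line card, and `nh-c` is left as the repair path even though its metric is the highest.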

Topology

EIGRP FRR Topology

This is a pretty simple topology. I want to test convergence times with and without EIGRP FRR. So that R4 installs only one successor and does not load-balance, I lowered the bandwidth on R4’s Gi0/0 interface to 100,000 Kbps (100 Mbps).

R4#sh ip route eigrp | i 1.1.1.1
D        1.1.1.1 [90/131072] via 192.168.24.2, 00:01:53, GigabitEthernet0/1
R4#sh ip eigrp top 1.1.1.1/32
EIGRP-IPv4 Topology Entry for AS(100)/ID(4.4.4.4) for 1.1.1.1/32
  State is Passive, Query origin flag is 1, 1 Successor(s), FD is 131072
  Descriptor Blocks:
  192.168.24.2 (GigabitEthernet0/1), from 192.168.24.2, Send flag is 0x0
      Composite metric is (131072/130816), route is Internal
      Vector metric:
        Minimum bandwidth is 1000000 Kbit
        Total delay is 5020 microseconds
        Reliability is 255/255
        Load is 1/255
        Minimum MTU is 1500
        Hop count is 2
        Originating router is 192.168.13.1
  192.168.34.3 (GigabitEthernet0/0), from 192.168.34.3, Send flag is 0x0
      Composite metric is (154112/130816), route is Internal
      Vector metric:
        Minimum bandwidth is 100000 Kbit
        Total delay is 5020 microseconds
        Reliability is 255/255
        Load is 1/255
        Minimum MTU is 1500
        Hop count is 2
        Originating router is 192.168.13.1
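
The two composite metrics in the output above can be reproduced by hand. The sketch below assumes classic (32-bit) EIGRP metrics with the default K values (K1 = K3 = 1, everything else 0), which the post doesn't state but which matches the numbers shown. The function name is my own.

```python
def classic_metric(min_bandwidth_kbit, total_delay_usec):
    """Classic EIGRP composite metric, assuming default K values."""
    bandwidth_term = 10**7 // min_bandwidth_kbit  # scaled inverse bandwidth
    delay_term = total_delay_usec // 10           # delay in tens of microseconds
    return 256 * (bandwidth_term + delay_term)


via_r2 = classic_metric(1_000_000, 5020)  # path via 192.168.24.2 (Gi0/1)
via_r3 = classic_metric(100_000, 5020)    # path via 192.168.34.3 (Gi0/0)
print(via_r2, via_r3)  # 131072 154112
```

Lowering the bandwidth to 100,000 Kbit raises the bandwidth term from 10 to 100, which is why the path via 192.168.34.3 ends up with the higher metric and is no longer an equal-cost successor.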

Let’s see what regular convergence time looks like when we shut down R2’s Gi0/2 link pointing downstream to R4. R4 won’t know the link is down until the hold-time expires (15 seconds, by default).

In my debug eigrp fsm output, DUAL knows to remove routes received from R2 (192.168.24.2):

*Jun 21 00:52:54.600: DUAL: AS(100) Removing dest 192.168.34.0/24, nexthop 192.168.24.2

If I drill down into R1’s Lo0 (1.1.1.1/32), I can see DUAL trying to find a feasible successor.

*Jun 21 00:52:54.602: EIGRP-IPv4(100): Find FS for dest 1.1.1.1/32. FD is 131072, RD is 131072 on tid 0
*Jun 21 00:52:54.602: EIGRP-IPv4(100):  192.168.24.2 metric 72057594037927935/72057594037927935
*Jun 21 00:52:54.602: EIGRP-IPv4(100):  192.168.34.3 metric 154112/130816 found Dmin is 154112
*Jun 21 00:52:54.602: DUAL: AS(100) Removing dest 1.1.1.1/32, nexthop 192.168.24.2
*Jun 21 00:52:54.602: DUAL: AS(100) RT installed 1.1.1.1/32 via 192.168.34.3
*Jun 21 00:52:54.602: DUAL: AS(100) Send update about 1.1.1.1/32. Reason: metric chg on tid 0
*Jun 21 00:52:54.602: DUAL: AS(100) Send update about 1.1.1.1/32. Reason: new if on tid 0

DUAL keeps the 1.1.1.1/32 entry from 192.168.24.2 with a metric of infinity (72057594037927935), so it’s poisoned. This would be sent to downstream neighbors to have them trigger convergence. Judging by the timestamps, DUAL ran very fast, almost instantly.
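
The "Find FS" step in the debug output boils down to the feasibility condition: a neighbor qualifies if its reported distance (RD) is strictly lower than the current feasible distance (FD). Here is a rough Python sketch using the values from the debug output. The function and variable names are mine, and returning `None` is just a stand-in for the case where no feasible successor exists and the route would go Active.

```python
INFINITY = 2**56 - 1  # 72057594037927935, the unreachable metric in the debug


def find_feasible_successor(fd, paths):
    """Return (next_hop, metric, rd) of the best feasible path, or None.

    paths maps next hop -> (metric, reported distance).
    """
    feasible = [
        (next_hop, metric, rd)
        for next_hop, (metric, rd) in paths.items()
        if metric != INFINITY and rd < fd  # feasibility condition
    ]
    if not feasible:
        return None  # stand-in for the route going Active
    return min(feasible, key=lambda path: path[1])  # lowest metric (Dmin)


paths = {
    "192.168.24.2": (INFINITY, INFINITY),  # poisoned after the failure
    "192.168.34.3": (154112, 130816),
}
print(find_feasible_successor(131072, paths))
```

The path via 192.168.34.3 passes because its RD of 130816 is below the FD of 131072, so DUAL can install it right away without sending queries.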

In the middle of this lab, I realized the regular IOSv image didn’t support FRR so I had to switch over to a CSR1000v.

Unfortunately, the issue here is that since CML is a lab environment without physical delays, congestion, packet loss, etc., it’s really hard to differentiate between the convergence times of DUAL under normal conditions versus FRR. We’re also using a very small topology.

Nonetheless, let’s show what FRR does behind the scenes with the debug eigrp frr and debug eigrp fsm commands.

*Jun 21 01:14:38.640: DUAL: Destination 1.1.1.1/32 for tid 0
*Jun 21 01:14:38.640: EIGRP-IPv4(100): Find FS for dest 1.1.1.1/32. FD is 329646080, RD is 329646080 on tid 0
*Jun 21 01:14:38.640: EIGRP-IPv4(100):  192.168.24.2 metric 72057594037927935/72057594037927935
*Jun 21 01:14:38.640: EIGRP-IPv4(100):  192.168.34.3 metric 394526720/328990720 found Dmin is 394526720
*Jun 21 01:14:38.641: DUAL: AS(100) Removing dest 1.1.1.1/32, nexthop 192.168.24.2
*Jun 21 01:14:38.641: DUAL: AS(100) RT installed 1.1.1.1/32 via 192.168.34.3
*Jun 21 01:14:38.641: DUAL: AS(100) Send update about 1.1.1.1/32. Reason: metric chg on tid 0
*Jun 21 01:14:38.641: DUAL: AS(100) Send update about 1.1.1.1/32. Reason: new if on tid 0
*Jun 21 01:14:38.641:  EIGRP-FRR: 1.1.1.1/32 is having 1 available paths and 1 successors
*Jun 21 01:14:38.642: DUAL: linkdown: finish
*Jun 21 01:14:38.651: %BFD-6-BFD_SESS_DESTROYED: BFD-SYSLOG: bfd_session_destroyed,  handle:1 neigh proc:FRR, handle:1 active
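
To put a number on how fast DUAL ran in the debug output above, here is a small Python sketch that diffs the timestamps of two log lines. The helper name and regex are my own, and it assumes both lines fall on the same day (no midnight rollover).

```python
import re
from datetime import datetime, timedelta

# Matches the HH:MM:SS.mmm part of an IOS debug timestamp.
TIMESTAMP = re.compile(r"\d{2}:\d{2}:\d{2}\.\d{3}")


def elapsed_ms(first_line, last_line):
    """Whole milliseconds between the timestamps of two debug lines."""
    def parse(line):
        return datetime.strptime(TIMESTAMP.search(line).group(0), "%H:%M:%S.%f")

    return (parse(last_line) - parse(first_line)) // timedelta(milliseconds=1)


start = "*Jun 21 01:14:38.640: DUAL: Destination 1.1.1.1/32 for tid 0"
done = "*Jun 21 01:14:38.642: DUAL: linkdown: finish"
print(elapsed_ms(start, done))  # 2
```

So from the first DUAL line to "linkdown: finish" is about 2 ms in this lab, which is why the difference between regular convergence and FRR is so hard to see here.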

In a production environment, Cisco claims that FRR can provide sub-50ms reconvergence times. The other features of FRR are pretty difficult to simulate in CML. However, I hope you can see the power of what FRR can do, even though its benefits may not be obvious in a virtualized environment.

This post is licensed under CC BY 4.0 by the author.