Junos CSPF for multiple dynamic paths

A quick one today, but a few people have recently asked me how the Junos CSPF process manages to always ensure the primary and secondary paths are kept diverse *automagically*. RSVP path calculation with CSPF is an interesting topic, and if you look through the archives, you’ll find I’ve touched on this as I’ve discussed SRLGs, RSVP auto-bandwidth, and other such topics.

Let's assume for the purposes of this post that we have an LSP that looks like this:

[edit protocols mpls]
user@router# show 
label-switched-path test-LSP-to-PE5 {
    to 10.0.0.5;
    primary primary-path-to-5;
    secondary secondary-path-to-5;
}
path primary-path-to-5;
path secondary-path-to-5;

This LSP is pretty simple. There are no constraints on either the primary or secondary path, and no constraints on the LSP as a whole.

Upon configuration of an LSP such as this, Junos will attempt to stand up the primary path. This path will follow the best IGP metric subject to any constraints (available bandwidth, admin-groups, SRLGs, etc). Only once the primary has stood up will the calculation for the secondary begin.

As Junos calculates the secondary path, it will first take a copy of the topology of the RSVP network and inflate the IGP metric of every link used by the primary path by 8,000,000. This ensures that, wherever possible, a diverse path is used – while still allowing the primary and secondary to share links if that is our only option.
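
The behaviour described above can be sketched in a few lines. This is a toy model of the behaviour, not Junos internals – the topology, function names, and the way the penalty is applied are all illustrative:

```python
# Sketch: CSPF secondary-path diversity via metric inflation.
# Compute the primary as the shortest path, then re-run shortest-path
# with the primary's links penalised so the secondary avoids them.
import heapq

def shortest_path(links, src, dst):
    """Dijkstra over links: a dict of {(a, b): metric}, treated as directed."""
    graph = {}
    for (a, b), metric in links.items():
        graph.setdefault(a, []).append((b, metric))
    queue, seen = [(0, src, [src])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == dst:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, metric in graph.get(node, []):
            heapq.heappush(queue, (cost + metric, nxt, path + [nxt]))
    return None

def diverse_secondary(links, src, dst, penalty=8_000_000):
    _, primary = shortest_path(links, src, dst)
    primary_links = set(zip(primary, primary[1:]))
    # Inflate (not exclude) the primary's links: a shared link is still
    # usable if no diverse path exists at all.
    penalised = {l: m + (penalty if l in primary_links else 0)
                 for l, m in links.items()}
    _, secondary = shortest_path(penalised, src, dst)
    return primary, secondary

# A-B directly (metric 10), or the long way round via C (10 + 10).
links = {('A', 'B'): 10, ('A', 'C'): 10, ('C', 'B'): 10}
primary, secondary = diverse_secondary(links, 'A', 'B')
print(primary)    # ['A', 'B']
print(secondary)  # ['A', 'C', 'B']
```

Even though the direct link is cheaper, the secondary takes the long way round, because the direct link now costs 8,000,010 for its calculation.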

When the primary path re-optimizes onto a new path, this process is repeated, ensuring that at all times we try to keep the primary and secondary as diverse as possible.

It’s worth noting that if you turn off CSPF for the LSP, the above functionality will not be implemented, and both paths will simply follow the best IGP path.

Hope this helps!

Shared Fate

Been doing more playing in the lab recently as part of the wider expansions I'm doing in my 9-to-5 (or, to be more accurate, today on a Sunday, while I'm in the office catching up on work!). One of the great things about CSPF is the ability to specify two paths in an LSP with no constraints whatsoever and have it automagically route the secondary in a fairly diverse manner from the primary – ensuring we have good resiliency without the need for manual intervention. The way this is done is that when the CSPF algorithm is run for a secondary path, it artificially inflates (for the purposes of that specific secondary path calculation only) the IGP metric of each link the primary path has used by 8,000,000. This creates a strong preference against links the primary path is already routed over, so that your primary and secondary paths won't fail simultaneously and force the LSP to re-signal from scratch.

However – more and more I'm making use of both passive CWDM (which we can now get out of China for cheap) and multiple bearers on the same fibres via DWDM systems – some breaking out at all nodes and some only at core nodes. The challenge here is that while I myself possess a fairly intimate knowledge of how this all hangs together physically, the CSPF process does not – and it could quite easily (based on IGP metrics) route a primary and secondary path down different bearers on the same fibre strands.

In 2011, Juniper added SRLGs (Shared Risk Link Groups) as a feature in release 11.4, based on RFCs 4203 & 5307. What this allows us to do is quite clever. We can define a set of "risk groups" – each of which is essentially a group of interfaces that share a common component of some sort. Once this is defined, we can either have CSPF artificially increase the metric (only for the purposes of the secondary path calculation of an LSP whose primary path has already used the risk group) of any link in that risk group, or completely exclude such links from the secondary path calculation.
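
The two modes can be sketched as follows. This is a toy model, not Junos internals – the function and link names are made up, and the real srlg-cost semantics may differ in detail (e.g. when a link belongs to several shared risk groups):

```python
# Sketch: how srlg-cost vs exclude-srlg change the topology that the
# secondary path's CSPF run sees, given the primary path's links.
def srlg_adjusted_links(links, srlg_of, primary_links,
                        srlg_cost=None, exclude=False):
    """links: {name: metric}; srlg_of: {name: set of SRLG names}."""
    primary_srlgs = set()
    for link in primary_links:
        primary_srlgs |= srlg_of.get(link, set())
    adjusted = {}
    for link, metric in links.items():
        if link not in primary_links and srlg_of.get(link, set()) & primary_srlgs:
            if exclude:
                continue             # exclude-srlg: prune the link entirely
            if srlg_cost:
                metric += srlg_cost  # srlg-cost: keep it, but make it expensive
        adjusted[link] = metric
    return adjusted

# Two bearers on the same cross-town fibre, plus a longer protected route.
links = {'direct-a': 10, 'direct-b': 10, 'via-top': 30}
srlg_of = {'direct-a': {'xtown-fibre'}, 'direct-b': {'xtown-fibre'}}

# srlg-cost 1000: the sibling bearer is still usable, just expensive.
print(srlg_adjusted_links(links, srlg_of, {'direct-a'}, srlg_cost=1000))
# exclude-srlg: the sibling bearer vanishes from the secondary's topology.
print(srlg_adjusted_links(links, srlg_of, {'direct-a'}, exclude=True))
```

With the penalty, the 30-metric route via the top of the square beats the 1010-metric sibling bearer; with exclusion, the sibling bearer is never a candidate at all.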

This is fairly easy to configure. Firstly, you want to define the risk groups on all the boxes in your RSVP core: in a very similar manner to MPLS admin-groups, you define the risk-group names and assign them group numbers (note that you can define up to 4294967295 risk groups, so you are unlikely to run out of numbers!);

set routing-options srlg wellington-cross-town-primary-fibre-path srlg-value 530

Now, we need to tell the routers with interfaces on links that share a common point of failure which risk group(s) they are in. Note the optional pluralization here – an interface can be a member of multiple risk groups. In fact, in a situation where we had two bearers on the same fibre but only one breaking out at a POP, we made the bypass bearer a member of multiple groups;

set protocols mpls interface xe-0/0/0.100 srlg wellington-cross-town-primary-fibre-path

At this point we can see in a show mpls interface the SRLGs an interface is a part of. It's worth noting at this point also that in my lab I had a square of 4 routers. The LSP here is between the bottom two routers in the square, but I duplicated the bottom side of the square to give two very desirable paths that I could put into an SRLG. Each link had an IGP metric of 10. Here's the show mpls interface;

user@router# run show mpls interface detail 
Interface: xe-0/0/0.100
  State: Up
  Administrative group: 
  SRLG: wellington-cross-town-primary-fibre-path
  Maximum labels: 3
  Static protection revert time: 5 seconds
  Always mark connection protection tlv: Disabled
  Switch away lsps : Disabled

We can also see on any LSP which SRLGs it is transiting on each path;

user@router# run show mpls lsp name A-to-B ingress detail
Ingress LSP: 7 sessions

1.1.1.1
  From: 2.2.2.2, State: Up, ActiveRoute: 0, LSPname: A-to-B
  ActivePath: A-to-B-P (primary)
  FastReroute desired
  LSPtype: Static Configured, Penultimate hop popping
  LoadBalance: Random
  Autobandwidth 
  AdjustTimer: 300 secs 
  Max AvgBW util: 0bps, Bandwidth Adjustment in 262 second(s).
  Overflow limit: 0, Overflow sample count: 0
  Underflow limit: 0, Underflow sample count: 0, Underflow Max AvgBW: 0bps
  Encoding type: Packet, Switching type: Packet, GPID: IPv4
 *Primary   A-to-B-P       State: Up
    Priorities: 5 5
    OptimizeTimer: 60
    SmartOptimizeTimer: 180
    SRLG: wellington-cross-town-primary-fibre-path
    Reoptimization in 43 second(s).
    Computed ERO (S [L] denotes strict [loose] hops): (CSPF metric: 10)
 192.168.0.66 S 
    Received RRO (ProtectionFlag 1=Available 2=InUse 4=B/W 8=Node 10=SoftPreempt 20=Node-ID):
          192.168.0.66
  Standby   A-to-B-S       State: Up
    Priorities: 5 5
    OptimizeTimer: 60
    SmartOptimizeTimer: 180
    SRLG: wellington-cross-town-primary-fibre-path
    Reoptimization in 51 second(s).
    Computed ERO (S [L] denotes strict [loose] hops): (CSPF metric: 11)
 192.168.0.8 S 
    Received RRO (ProtectionFlag 1=Available 2=InUse 4=B/W 8=Node 10=SoftPreempt 20=Node-ID):
          192.168.0.8
Total 1 displayed, Up 1, Down 0

In the output above you can see that even though we have SRLGs configured, they're not doing anything yet – the primary and secondary paths still transit the same SRLG! To tell CSPF to do something with this new data, we have two options. The first is to define an increase in IGP cost to be added, in each LSP's secondary path calculation, to any link in a risk group the primary path has already used;

set routing-options srlg wellington-cross-town-primary-fibre-path srlg-cost 1000

Or, to be more sure that the paths will never share a risk group, you can simply tell the router to exclude links that share risk with the primary path entirely from the secondary path calculation;

set protocols mpls exclude-srlg

Once this is done, it's one of those things that just works. Here's the output for either the srlg-cost or exclude-srlg option (both give the same result in my scenario due to the costs used);

user@router# run show mpls lsp name A-to-B ingress detail logical-system SNAP-DUD-POP-RTR01    
Ingress LSP: 7 sessions

1.1.1.1
  From: 2.2.2.2, State: Up, ActiveRoute: 0, LSPname: A-to-B
  ActivePath: A-to-B-P (primary)
  FastReroute desired
  LSPtype: Static Configured, Penultimate hop popping
  LoadBalance: Random
  Autobandwidth 
  AdjustTimer: 300 secs 
  Max AvgBW util: 0bps, Bandwidth Adjustment in 270 second(s).
  Overflow limit: 0, Overflow sample count: 0
  Underflow limit: 0, Underflow sample count: 0, Underflow Max AvgBW: 0bps
  Encoding type: Packet, Switching type: Packet, GPID: IPv4
 *Primary   A-to-B-P       State: Up
    Priorities: 5 5
    OptimizeTimer: 60
    SmartOptimizeTimer: 180
    SRLG: wellington-cross-town-primary-fibre-path #### Note that the SRLG is only seen in the primary path
    Reoptimization in 30 second(s).
    Computed ERO (S [L] denotes strict [loose] hops): (CSPF metric: 10)
 192.168.0.66 S #### Note that this is still taking a 1 hop path (the shortest path)
    Received RRO (ProtectionFlag 1=Available 2=InUse 4=B/W 8=Node 10=SoftPreempt 20=Node-ID):
          192.168.0.66
  Standby   A-to-B-S       State: Up
    Priorities: 5 5
    OptimizeTimer: 60
    SmartOptimizeTimer: 180
    Reoptimization in 56 second(s).
    Computed ERO (S [L] denotes strict [loose] hops): (CSPF metric: 30)
 192.168.0.5 S 192.168.0.1 S 192.168.0.2 S 
    Received RRO (ProtectionFlag 1=Available 2=InUse 4=B/W 8=Node 10=SoftPreempt 20=Node-ID):
          192.168.0.5 192.168.0.1 192.168.0.2 #### Note that this is taking a longer path to avoid sharing the SRLG
Total 1 displayed, Up 1, Down 0

Hope this is of use!

RSVP capacity management madness

Recently I've been doing a bunch of work around scaling our core network – in particular, ensuring that CSPF has the right information to make intelligent decisions based on the amount of capacity available on different segments of the network, and to avoid putting primary and secondary paths on bearers that share common elements such as fibre paths.

This is all pretty simple with CSPF (Constrained Shortest Path First). Possessing a general aversion to aggregate ethernet interfaces, and needing to balance across multiple geographically dispersed bearers, I've found this approach has provided some really good results.

It's important to remember that RSVP was designed as a capacity reservation protocol long before it was used for signalling LSPs, and therefore has some inherently good features for shuffling LSPs around as links get full.

Assuming that the IGP metrics on any given network point to the most preferred path between any two points, we can assume that CSPF will just do its thing and by default automagically push the primary LSP path onto the best path (where possible), then avoid the primary path when building secondary paths. What I've been looking at is a way to ensure that when we hit a threshold – say 70% of the capacity of any link on a 15-minute average – we shuffle off some traffic to decrease the load on that particular bearer. It may be worth noting here that a lot of these configuration tasks may be easier to apply in configuration groups with wildcards for interfaces or LSPs, so that you save yourself from re-typing a lot of configuration into every LSP in your network!

Let's assume for the purposes of this post that we're starting with an LSP with two paths (remember the magic of CSPF: it will automatically try to route the secondary path in a diverse manner from the primary, and FRR detours will also be diverse where possible), and fast-reroute turned on. We'll also assume that we're optimizing every 5 minutes;

set protocols mpls label-switched-path A-to-B to x.x.x.x
set protocols mpls label-switched-path A-to-B fast-reroute
set protocols mpls label-switched-path A-to-B primary A-to-B-primary
set protocols mpls label-switched-path A-to-B secondary A-to-B-alternate standby
set protocols mpls path A-to-B-primary
set protocols mpls path A-to-B-alternate
set protocols mpls optimize-timer 300

The first thing to do here is to tell RSVP what the bandwidth of each link is;

set protocols rsvp interface X bandwidth 10g

Following this, we need to tell Junos to start gathering statistics on the utilisation of each link. This is pretty simple; all we need to decide on is the number and size of the statistics files, plus how often to gather utilisation stats per LSP;

set protocols mpls statistics file mpls_auto-statistics size 5m files 3
set protocols mpls statistics interval 300
set protocols mpls statistics auto-bandwidth

Now we need to tell the LSP to use those stats to reserve the capacity it is using (in this example I'm doing it every 900 seconds). Note that it's recommended to gather statistics (the config above) 3x faster than you update the LSP (the config below);

set protocols mpls label-switched-path A-to-B auto-bandwidth adjust-interval 900
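
The sampling-versus-adjustment cadence can be illustrated with a small sketch (purely illustrative – Junos's real adjustment logic has more knobs, such as adjustment thresholds, than shown here):

```python
# Sketch: auto-bandwidth re-signals the LSP's reservation at each
# adjust-interval to the highest average throughput sampled since the
# previous adjustment. Sampling 3x faster gives 3 samples per window.
def adjustments(samples_bps, sample_interval=300, adjust_interval=900):
    """samples_bps: per-interval average throughput samples.
    Returns (time_seconds, new_reservation_bps) pairs."""
    window, out = [], []
    for i, bps in enumerate(samples_bps, start=1):
        window.append(bps)
        if (i * sample_interval) % adjust_interval == 0:
            out.append((i * sample_interval, max(window)))
            window = []  # start a fresh window for the next adjustment
    return out

# Six 300-second samples -> two 900-second adjustments.
print(adjustments([100e6, 300e6, 250e6, 600e6, 550e6, 400e6]))
# [(900, 300000000.0), (1800, 600000000.0)]
```

The point of the 3x rule is visible here: with only one sample per adjust-interval, a short spike between samples would never be reflected in the reservation.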

At this point it should actually start working. A link will subscribe out capacity up to 100%, then become unavailable for additional LSPs to reserve capacity on. If the LSPs on the link creep up to over 100%, one will be booted off the link – however, it's actually very unlikely that you're going to go over 100% utilisation and boot an LSP off a link. Additionally, I want to ensure that I'm only running my links up to 60-70% (based on a 15-minute average) before I free up capacity, ensuring I have a good amount of burstability on my core bearers. While I could specify all links as having a bandwidth far lower than they actually do, there's a far better way: you can set a link to act like it only has xx% of the capacity it actually has;

set protocols rsvp interface X subscription 65

This is likely to produce some better results.
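
As a worked example of what the subscription knob does (a sketch with made-up numbers – the admission check here is a simplification of RSVP's actual behaviour):

```python
# Sketch: with "subscription 65", RSVP admits a new reservation only while
# total reserved bandwidth stays under 65% of the link's real bandwidth.
def can_admit(link_bw_bps, subscription_pct, reserved_bps, request_bps):
    limit = link_bw_bps * subscription_pct / 100
    return reserved_bps + request_bps <= limit

ten_gig = 10e9  # a 10G link with a 6.5G effective reservation ceiling
print(can_admit(ten_gig, 65, 6.0e9, 400e6))  # True: 6.4G <= 6.5G
print(can_admit(ten_gig, 65, 6.3e9, 400e6))  # False: 6.7G > 6.5G
```

So the link "fills up" for CSPF purposes at 6.5G, leaving the remaining 3.5G as burst headroom.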

The next thing we want to do is ensure that we signal our new path before tearing down the old path, to prevent a brief loss of service. The "adaptive" command does exactly this;

set protocols mpls label-switched-path A-to-B adaptive

It's worth noting that the "adaptive" command also changes the reservation style to SE (a shared reservation between all paths/detours of the LSP) rather than FF (one reservation per member path/detour of the LSP – so potentially a bit of doubling up!).

Finally, despite having primary and secondary paths, plus fast-reroute detours on both, I only actually want to reserve capacity on the primary paths at this point. While I can't move the auto-bandwidth configuration into the more specific hierarchy, what I can do is leave it where it is and explicitly force the secondary and fast-reroute paths not to reserve any bandwidth;

set protocols mpls label-switched-path A-to-B secondary A-to-B-alternate bandwidth 0
set protocols mpls label-switched-path A-to-B fast-reroute bandwidth 0

At this point I have quite a cool setup: the primary path will only ever stand up where it has capacity, but in a disaster situation the secondary and fast-reroute paths will stand up anywhere and let CoS take care of the rest. Because by default Junos tries to re-signal the primary path every 30 seconds, I'm not going to be on a sub-optimal path for any great length of time (and I could tune this a bit later if I find I need to), and I've got my CSPF process passing intelligently selected routes (that have sufficient capacity available) to RSVP for reservation based on the available capacity on each link.

It's also worth noting that in the failure scenario you are likely to have a fair bit of capacity available, depending on your subscription ratios, before you will need to lean on CoS.

I’ll post soon on some related fiddling with shared-fate which I’ve been looking at (as a part of the wider expansions I’m doing to multiple bearers) to ensure that I’m not doing silly things like signalling my primary and secondary paths on the same fibre (while keeping all the signalling automated!).

UPDATE – a few people have been asking me if there is a way to force an immediate auto-bandwidth adjustment (i.e. if you get a big traffic spike). In Junos you can do this with the following command (? your way through it to specify an LSP name, or just run it to adjust every LSP that this PE is the ingress LSR for);

request mpls lsp adjust-autobandwidth

Note that this will update the RSVP reservation based on the latest statistics run but won’t gather fresh statistics (so it’s worth gathering statistics pretty frequently).

Additionally, it's worth noting that you can change the path selection method of the LSP so that it chooses links based on different criteria, with the least-fill or most-fill options as per the example below. The default behaviour is to select at random once the other criteria documented at http://www.juniper.net/techpubs/en_US/junos10.4/topics/concept/mpls-cspf-path-selection-method.html have been met.

set protocols mpls label-switched-path A-to-B (least/most)-fill

Note that this is based on the percentage fill of the link, not the absolute number of megs/gigs available.
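
A quick sketch of why that distinction matters (illustrative only – the link names and figures are made up):

```python
# Sketch: least-fill compares percentage utilisation, not absolute headroom,
# so a 1G link at 40% beats a 10G link at 50% even though the 10G link
# has far more spare capacity in absolute terms.
def least_fill(candidates):
    """candidates: {link: (capacity_bps, reserved_bps)} -> least-filled link."""
    return min(candidates, key=lambda l: candidates[l][1] / candidates[l][0])

links = {'ten-gig': (10e9, 5e9),    # 50% full, 5G free
         'one-gig': (1e9, 0.4e9)}   # 40% full, 0.6G free
print(least_fill(links))  # 'one-gig'
```

If you want tie-breaking on absolute headroom, you'd need a different key; Junos's least-fill, as documented, works on the percentage.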

Additionally, if you have not played with RSVP-TE optimization before, have a look at http://blog.hoff.geek.nz/2013/11/02/rsvp-optimization-on-junos/.