Subpolicies

A useful feature I’ve been playing with is the ability to use “sub” policies within JUNOS policies for the purposes of matching. This is quite useful for me in some of the work I’ve been doing, as I’ve managed to shrink ~300 lines of policy to ~120 lines by implementing this as much as possible.

Essentially, the way it works is that you can use a policy as a from match criterion in another policy, i.e.

[edit policy-options policy-statement peerXXX-bgp-export]
user@router# show 
term default {
    from policy subpolicy_match_default;
    then reject;
}

[edit policy-options policy-statement subpolicy_match_default]
user@router# show 
term v4-default {
    from {
        family inet;
        route-filter 0.0.0.0/0 exact;
    }
    then accept;
}
term v6-default {
    from {
        family inet6;
        route-filter 0::/0 exact;
    }
    then accept;
}
then reject;

Everything that subpolicy_match_default accepts is a MATCH for the purposes of term default in peerXXX-bgp-export, and everything it rejects is NO MATCH. So in this case both the v4 and v6 default routes are accepted by the subpolicy, which means they match term default in the main policy and are therefore rejected for export to peerXXX. All other routes are rejected by the subpolicy, do not match the term, and so fall through to the rest of the main policy.

A fairly trivial feature, probably, but hellishly useful when configuring lots of similar policies: you get to refer to what is effectively a set of repeatedly used subroutines within your policy.
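As a sketch of the reuse this enables (the peer names here are made up for illustration), the same subpolicy can be referenced from several peer export policies, with each term deciding what to do with the matched routes;

set policy-options policy-statement peerAAA-bgp-export term default from policy subpolicy_match_default
set policy-options policy-statement peerAAA-bgp-export term default then reject
set policy-options policy-statement peerBBB-bgp-export term default from policy subpolicy_match_default
set policy-options policy-statement peerBBB-bgp-export term default then accept

Any change to what counts as "a default route" then only needs to be made in one place.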

Shared Fate

Been doing more playing in the lab recently as part of the wider expansions I'm doing in my 9-to-5 (or, to be more accurate today, on a Sunday while I'm in the office catching up on work!). One of the great things about CSPF is the ability to specify two paths in an LSP with no constraints whatsoever and have it automagically route the secondary in a fairly diverse manner from the primary, ensuring good resiliency without the need for manual intervention. The way this works is that when the CSPF algorithm runs for a secondary path, it artificially inflates (for the purposes of that secondary path's CSPF calculation only) the IGP metric of each link the primary path has used by 8,000,000. This creates a strong preference not to use links the primary path is already routed over, so that your primary and secondary paths will not fail simultaneously and force the LSP to re-signal from scratch.

However, more and more I'm making use of both passive CWDM (which we can now get out of China for cheap) and multiple bearers on the same fibres via DWDM systems – some breaking out at all nodes and some only at core nodes. The challenge here is that while I possess a fairly intimate knowledge of how this all hangs together physically, the CSPF process does not – and it would not be unlikely for it (based on IGP metrics) to route a primary and secondary path down different bearers on the same fibre strands.

In 2011, Juniper added SRLGs (Shared Risk Link Groups) as a feature in release 11.4, based on RFCs 4203 and 5307. What this allows us to do is quite clever. We can define a set of "risk groups", each of which is essentially a group of interfaces that share a common component of some sort. Once a group is defined, we can either have CSPF artificially increase the metric of its links when calculating a secondary path (only for an LSP whose primary path has already used that risk group), or completely exclude those links from the secondary path calculation when the primary has already used a link in that risk group.

This is fairly easy to configure. First, you want to define the risk group on all the boxes in your RSVP core: in a very similar manner to MPLS admin-groups, you define the risk-group names and assign them group numbers (note that you can define up to 4,294,967,295 risk groups, so you are unlikely to run out of numbers!);

set routing-options srlg wellington-cross-town-primary-fibre-path srlg-value 530

Now, we need to tell the routers with interfaces on links that share a common point of failure which risk-group(s) they are in. Note the optional pluralization here – an interface can be a member of multiple risk groups – and in fact in a situation where we had two bearers on the same fibre but with only one breaking out at a POP we made the bypass bearer a member of multiple groups;

set protocols mpls interface xe-0/0/0.100 srlg wellington-cross-town-primary-fibre-path
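Since an interface can be a member of multiple groups, the bypass-bearer case mentioned above would look something like the below (the interface and second group name here are made up for illustration);

set protocols mpls interface xe-0/0/1.200 srlg wellington-cross-town-primary-fibre-path
set protocols mpls interface xe-0/0/1.200 srlg wellington-cross-town-secondary-fibre-path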

At this point we can see in a show mpls interface the SRLGs an interface is a part of. It's worth noting that in my lab I had a square of 4 routers. The LSP here is between the bottom two routers in the square, but I duplicated the bottom side of the square to give two very desirable paths that I could put into an SRLG. Each link had an IGP metric of 10. Here's the show mpls interface;

user@router# run show mpls interface detail 
Interface: xe-0/0/0.100
  State: Up
  Administrative group: 
  SRLG: wellington-cross-town-primary-fibre-path
  Maximum labels: 3
  Static protection revert time: 5 seconds
  Always mark connection protection tlv: Disabled
  Switch away lsps : Disabled

We can also see on any LSP which SRLGs it is transiting on each path;

user@router# run show mpls lsp name A-to-B ingress detail
Ingress LSP: 7 sessions

1.1.1.1
  From: 2.2.2.2, State: Up, ActiveRoute: 0, LSPname: A-to-B
  ActivePath: A-to-B-P (primary)
  FastReroute desired
  LSPtype: Static Configured, Penultimate hop popping
  LoadBalance: Random
  Autobandwidth 
  AdjustTimer: 300 secs 
  Max AvgBW util: 0bps, Bandwidth Adjustment in 262 second(s).
  Overflow limit: 0, Overflow sample count: 0
  Underflow limit: 0, Underflow sample count: 0, Underflow Max AvgBW: 0bps
  Encoding type: Packet, Switching type: Packet, GPID: IPv4
 *Primary   A-to-B-P       State: Up
    Priorities: 5 5
    OptimizeTimer: 60
    SmartOptimizeTimer: 180
    SRLG: wellington-cross-town-primary-fibre-path
    Reoptimization in 43 second(s).
    Computed ERO (S [L] denotes strict [loose] hops): (CSPF metric: 10)
 192.168.0.66 S 
    Received RRO (ProtectionFlag 1=Available 2=InUse 4=B/W 8=Node 10=SoftPreempt 20=Node-ID):
          192.168.0.66
  Standby   A-to-B-S       State: Up
    Priorities: 5 5
    OptimizeTimer: 60
    SmartOptimizeTimer: 180
    SRLG: wellington-cross-town-primary-fibre-path
    Reoptimization in 51 second(s).
    Computed ERO (S [L] denotes strict [loose] hops): (CSPF metric: 11)
 192.168.0.8 S 
    Received RRO (ProtectionFlag 1=Available 2=InUse 4=B/W 8=Node 10=SoftPreempt 20=Node-ID):
          192.168.0.8
Total 1 displayed, Up 1, Down 0

In the output above you can see that even though we have SRLGs configured, they’re not doing anything – the primary and secondary paths still transit the same SRLG! To tell CSPF to do something with this new data – we have two options. The first is to define an increase in the IGP cost to be added to the secondary path calculation on a per-LSP basis if you’ve already put the primary over a link in that group;

set routing-options srlg wellington-cross-town-primary-fibre-path srlg-cost 1000

Or, to be more sure that the paths will never share a risk group, you can just tell the router to exclude the paths that share risk with the primary path entirely from the secondary path calculation;

set protocols mpls exclude-srlg

Once this is done, it's one of those things that just works. Here's the output with either the srlg-cost or exclude-srlg option (both give the same result in my scenario due to the costings used);

user@router# run show mpls lsp name A-to-B ingress detail logical-system SNAP-DUD-POP-RTR01    
Ingress LSP: 7 sessions

1.1.1.1
  From: 2.2.2.2, State: Up, ActiveRoute: 0, LSPname: A-to-B
  ActivePath: A-to-B-P (primary)
  FastReroute desired
  LSPtype: Static Configured, Penultimate hop popping
  LoadBalance: Random
  Autobandwidth 
  AdjustTimer: 300 secs 
  Max AvgBW util: 0bps, Bandwidth Adjustment in 270 second(s).
  Overflow limit: 0, Overflow sample count: 0
  Underflow limit: 0, Underflow sample count: 0, Underflow Max AvgBW: 0bps
  Encoding type: Packet, Switching type: Packet, GPID: IPv4
 *Primary   A-to-B-P       State: Up
    Priorities: 5 5
    OptimizeTimer: 60
    SmartOptimizeTimer: 180
    SRLG: wellington-cross-town-primary-fibre-path #### Note that the SRLG is only seen in the primary path
    Reoptimization in 30 second(s).
    Computed ERO (S [L] denotes strict [loose] hops): (CSPF metric: 10)
 192.168.0.66 S #### Note that this is still taking a 1 hop path (the shortest path)
    Received RRO (ProtectionFlag 1=Available 2=InUse 4=B/W 8=Node 10=SoftPreempt 20=Node-ID):
          192.168.0.66
  Standby   A-to-B-S       State: Up
    Priorities: 5 5
    OptimizeTimer: 60
    SmartOptimizeTimer: 180
    Reoptimization in 56 second(s).
    Computed ERO (S [L] denotes strict [loose] hops): (CSPF metric: 30)
 192.168.0.5 S 192.168.0.1 S 192.168.0.2 S 
    Received RRO (ProtectionFlag 1=Available 2=InUse 4=B/W 8=Node 10=SoftPreempt 20=Node-ID):
          192.168.0.5 192.168.0.1 192.168.0.2 #### Note that this is taking a longer path to avoid sharing the SRLG
Total 1 displayed, Up 1, Down 0

Hope this is of use!

RSVP capacity management madness

Recently I’ve been doing a bunch of work around scaling our core network – in particular, ensuring that CSPF has the right information to make intelligent decisions based on the amount of capacity available on different segments of our network, and avoiding putting primary and secondary paths on bearers that share common elements such as fibre paths.

This is all pretty simple with CSPF (Constrained Shortest Path First). Given a general aversion to aggregated Ethernet interfaces and a need to balance traffic around multiple geographically dispersed bearers, this approach has produced some really good results.

It’s important to remember that RSVP was designed as a capacity reservation protocol long before it was used for signalling LSPs, and it therefore has some inherently good features for shuffling LSPs around as links get full.

Assuming that the IGP metrics on any given network point to the most preferred path between any two points, CSPF will just do its thing and by default automagically push the primary LSP path onto the best path (where possible), then avoid the primary path when building secondary paths. What I’ve been looking at is a way to ensure that when we hit a threshold – say 70% of the capacity of any link on a 15-minute average – we shuffle off some traffic to decrease the load on that particular bearer. It may be worth noting here that a lot of these configuration tasks may be easier to apply in configuration groups, with wildcards for interfaces or LSPs, so that you save yourself re-typing a lot of configuration into every LSP on your network!
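As a sketch of the configuration-group approach (the group names here are made up), wildcards let you apply common settings to every RSVP interface or LSP in one hit;

set groups rsvp-defaults protocols rsvp interface <*> subscription 65
set groups lsp-defaults protocols mpls label-switched-path <*> fast-reroute
set groups lsp-defaults protocols mpls label-switched-path <*> adaptive
set apply-groups [ rsvp-defaults lsp-defaults ]

Anything set explicitly on an interface or LSP still overrides the inherited group values.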

Let’s assume for the purposes of this post that we’re starting with a LSP with two paths (remember the magic of CSPF in that it will automatically try to route the secondary path in a diverse manner to the primary path, and FRR detours will also be diverse where possible), and fast-reroute turned on. We’ll also assume that we’re optimizing every 5mins;

set protocols mpls label-switched-path A-to-B to x.x.x.x
set protocols mpls label-switched-path A-to-B fast-reroute
set protocols mpls label-switched-path A-to-B primary A-to-B-primary
set protocols mpls label-switched-path A-to-B secondary A-to-B-alternate standby
set protocols mpls path A-to-B-primary
set protocols mpls path A-to-B-alternate
set protocols mpls optimize-timer 300

The first thing to do here is to tell RSVP what the bandwidth of each link is;

set protocols rsvp interface X bandwidth 10g

Following this, we need to tell the Juniper to start gathering statistics as to the utilisation of each link. This is pretty simple, all we need to decide on is number of files / size of files – plus how often to gather stats on utilisation per LSP;

set protocols mpls statistics file mpls_auto-statistics
set protocols mpls statistics file size 5m
set protocols mpls statistics file files 3
set protocols mpls statistics interval 300
set protocols mpls statistics auto-bandwidth

Now we need to tell the LSP to use the stats to reserve out the capacity it is using (in this example I’m doing it every 900 seconds). Note that it’s recommended to gather statistics (above config) 3x faster than you update the LSP (below config);

set protocols mpls label-switched-path A-to-B auto-bandwidth adjust-interval 900

At this point it should actually start working. A link will subscribe out capacity up to 100%, then become unavailable for additional LSPs to reserve capacity on. If the LSPs on the link creep up to over 100%, one will be booted off the link – however, it’s actually very unlikely that you’re going to go over 100% utilisation and boot an LSP off a link. Additionally, I want to ensure that I’m only running my links up to 60-70% (based on a 15-minute average) before I free up capacity, ensuring I have a good amount of burstability on my core bearers. While I could specify all links as having a bandwidth far lower than they actually do, there’s a far better way: you can set a link to act like it only has xx% of the capacity it actually has;

set protocols rsvp interface X subscription 65

This is likely to produce some better results.

The next thing we want to do is ensure that we signal out our new path before tearing down our old path to prevent a brief loss of service. The “adaptive” command does exactly this;

set protocols mpls label-switched-path A-to-B adaptive

It’s worth noting that the “adaptive” command also changes the reservation style to SE (a shared reservation between all paths/detours of the LSP) rather than FF (one reservation per member path/detour of the LSP – so potentially a bit of doubling up!).

Finally, despite having primary and secondary paths, plus fast reroute detours on both – I only actually want to reserve capacity at this point on the primary paths. While I can’t move the autobandwidth command into the more specific hierarchy, what I can do is leave it where it is and explicitly force the secondary and fast-reroute paths not to reserve any bandwidth;

set protocols mpls label-switched-path A-to-B secondary A-to-B-alternate bandwidth 0
set protocols mpls label-switched-path A-to-B fast-reroute bandwidth 0

At this point I have quite a cool setup – the primary path will only ever stand up where it has capacity, but in a disaster situation the secondary and fast-reroute paths will stand up anywhere and let CoS take care of the rest. Because JUNOS by default tries to re-signal the primary path every 30 seconds, I’m not going to be on a sub-optimal path for any great length of time (and I could tune this later if I find I need to). The net result is a CSPF process passing intelligently selected routes (with sufficient capacity available) to RSVP for reservation, based on the available capacity on each link.

It’s also worth noting that in a failure scenario you are likely to have a fair bit of capacity headroom available, depending on your subscription ratios, before you need to rely on CoS.

I’ll post soon on some related fiddling with shared-fate which I’ve been looking at (as a part of the wider expansions I’m doing to multiple bearers) to ensure that I’m not doing silly things like signalling my primary and secondary paths on the same fibre (while keeping all the signalling automated!).

UPDATE – a few people have been asking me if there is a way to force an immediate auto-bandwidth adjustment (i.e. if you get a big traffic spike). In JUNOS you can do this with the following command (? your way through it to specify an LSP name, or just run it to reoptimize every LSP for which this PE is the ingress LSR);

request mpls lsp adjust-autobandwidth

Note that this will update the RSVP reservation based on the latest statistics run but won’t gather fresh statistics (so it’s worth gathering statistics pretty frequently).

Additionally, it’s worth noting that you can change the path selection method of the LSP so that CSPF chooses links based on different criteria, using the least-fill or most-fill options as per the example below. The default behaviour is to select randomly once the other criteria documented at http://www.juniper.net/techpubs/en_US/junos10.4/topics/concept/mpls-cspf-path-selection-method.html are met.

set protocols mpls label-switched-path A-to-B (least/most)-fill

Note that this is based on the percentage fill of the link, not the amount of megs/gigs available.

Additionally, if you have not played with RSVP-TE optimization before, have a look at http://blog.hoff.geek.nz/2013/11/02/rsvp-optimization-on-junos/.

BGP “allow”

A useful feature which I’ve struck (and am in fact currently waiting for a midnight change window to implement on one of our route reflectors) is the BGP “allow” feature in JUNOS. This allows you to specify a netblock from which RPD will accept incoming BGP connections (in a passive mode) and stand up neighbor relationships with any connecting devices – without having to specifically configure neighbors for each device.

The configuration for this is quite simple – in place of a neighbor statement we do this;

set protocols bgp group ABC allow 1.1.1.0/24
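In context, a route-reflector group using this might look something like the below sketch (the group name, addresses and cluster ID here are illustrative);

set protocols bgp group rr-clients type internal
set protocols bgp group rr-clients local-address 1.1.1.1
set protocols bgp group rr-clients cluster 1.1.1.1
set protocols bgp group rr-clients allow 1.1.1.0/24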

The one caveat I’ve struck so far with this is that for some reason JUNOS will not allow you to do MD5 auth on any BGP group with this feature enabled;

user@router# set protocols bgp group ABC authentication-key beer   

{master}[edit]
user@router# commit check 
re0: 
[edit protocols bgp group ABC]
  'allow'
    May not be configured with authentication-key
error: configuration check-out failed: (statements constraint check failed)

{master}[edit]

I find this irritating, as I would consider it ever so slightly less secure to dynamically allow connections on, say, your route reflectors from the rest of your network without authentication. Having said that, you should only be using this feature for IP ranges over which you have strong control (i.e. where you know nobody can spoof a source address from inside the range), so this should not be an issue. But it is an irritating downside to an otherwise awesome feature.

I’ve so far struck a couple of interesting uses for this. The first is to use it on route-reflectors – when you’re deploying 10 MPLS PEs in a month it gets a little tiring constantly logging onto your route-reflectors and configuring up more and more BGP sessions to new devices. Call me lazy – but I see this as a quite cool way to automate this problem away.

The other use was potentially slightly more interesting. A friend was configuring a CDN-type network with a bunch of content servers, and was after a way for VMs to be dynamically built and fired up on a particular vlan, then start advertising anycast routes via a BGP session with no configuration on the routers (thus making it easier for him to automate). He successfully used this feature to achieve this.

The final thing I would note is that when using this feature, the Juniper isn’t going to behave exactly as it would with explicitly configured BGP neighbors as far as SNMP traps go – it doesn’t know about the peers until they initiate a connection – so if you are relying on traps to monitor sessions this may be something to watch out for.

JNCIE-SP in review

Over the last 6 months, I’ve done around 900 hours of study towards my JNCIE-SP exam (perhaps a bit too much, however coming from a Cisco background and only having touched my first Juniper router 10 months prior to the exam I felt this was prudent). I did the exam on the 26th of September (a few weeks ago at the time of writing), and now that I’ve had a chance to get a bit of rest afterwards thought I’d do a post on the experience.

One of the first things I’d say is that there are really two elements to it. The first is an absolute understanding of the technologies within the syllabus (see http://www.juniper.net/us/en/training/certification/resources_jnciesp.html). Secondly, to realistically attempt this you’re going to need a large amount of operational experience on the Juniper platform plus some solid debugging skills. The ability to quickly isolate a problem that’s either been pre-introduced in the exam (or that you create by doing something stupid) is crucial under the time pressure.

As for study, one of the interesting things about the Juniper certification program (unlike the Cisco certs) is that to do a JNCIE you have to do all the levels of certification before it (so JNCIA, JNCIS-SP, JNCIP-SP and JNCIE-SP). It’s worth noting that a Juniper JNCIS is roughly equivalent to a CCNP, and a JNCIP is roughly equivalent to a CCIE written. What’s good about this is that the Juniper resources for each of these certifications are, as a general rule, pretty focused on standards rather than the bullshit “vendor way” (while obviously having a slight slant towards some of the Juniperisms – i.e. BGP VPLS rather than LDP VPLS – there was very little proprietary rubbish).

Wanting to ensure I didn’t unnecessarily throw away money (and having a solid understanding of most of the underlying protocols), I ordered the books from all the courses for the certifications and spent a fair amount of time studying these as I went. This proved helpful: by the time I got my JNCIP (and began the 900 hours of study towards my JNCIE) I already had a very detailed knowledge of the Juniper implementation of these protocols and standards.

For the time I was spending studying specifically towards the JNCIE I did a combination of things. Firstly, I purchased 390 VM credits on Junosphere (1 credit = 1 virtual router for 24 hours), enough to spend 39 weekend days playing around with different implementations of various scenarios. I actually ended up using only 300 of these credits, but this was one of the most invaluable resources I had. Junosphere takes a bit of getting used to, and the interface can be a bit clunky at times, but after a while you get pretty quick at manually writing your own topology definitions and converting any bootcamp or lab environment documented in a book into a Junosphere topology so that you can fire it up yourself. I also purchased the Proteus and InetZero study guides for the exam. This was interesting: I found the Proteus guide was very much focused on the technologies but its lab scenarios were a bit rubbish, while the InetZero book was purely a lab book with some good scenarios. The InetZero labs were significantly more time consuming than the actual lab exam, which stood me in good stead to finish the exam in good time.

I read a wide range of books and articles, including;

  • The official Juniper JNCIP-SP material
  • Most of the Juniper “Day One” and “This Week” guides. Juniper have also released some vDayOne guides that come with a pre-built Junosphere component they walk you through which is very cool.
  • The InetZero JNCIE-SP lab guide (as mentioned above)
  • The Proteus JNCIE-SP study guide (again, mentioned above)
  • MPLS and VPN Architectures Vol II by Ivan Pepelnjak (while this was Cisco based, it was still incredibly useful)
  • O’Reilly’s Juniper MX Series
  • JNCIE-M study guide by Harry Reynolds
  • JNCIP-M study guide by Harry Reynolds
  • O’Reilly’s Junos Cookbook
  • MPLS Enabled Applications 3rd edition
  • A bunch of Juniper configuration guides
  • A bunch of old NOG presentations
  • RFC2328 – OSPFv2
  • RFC1195 – Integrated ISIS
  • RFC1771 – BGP4
  • RFC1965 – BGP confederations
  • RFC1997 – BGP communities
  • RFC4360 – BGP extended communities
  • RFC4105 – MPLS InterArea TE
  • RFC4364 – BGP/MPLS VPNs
  • RFC6037 – Draft Rosen (ugh!)
  • The Juniper route resolution guide
  • Plus a bunch more that I can’t remember!

As I was going through these, I found a few errors, particularly in the Juniper configuration examples, and sent corrections for these to my Juniper SE. I also found some typos in the Proteus lab guide (nothing major, but amusing to find) and told a friend who works there about it.

I labbed most of the features I was reading about fairly extensively. I had a study plan that took me through the syllabus bit by bit (one week per area) to ensure I covered everything. I spent a bit longer than this on a few areas such as multicast (which I had never touched), and had to revisit some features a few times afterwards (such as the more advanced forms of Interprovider VPN/VPLS). Generally I would spend most weeknights reading and doing any labbing I could with a couple of MXs in the lab at work, then would try to do a 12-15 hour study day each weekend day on Junosphere (being a tight bastard, I hated the idea of not getting the most value I possibly could out of the 24-hour blocks I was having to buy Junosphere time in).

Six weeks out from the exam I began doing one full practice exam each week, using the scenarios in the Proteus and InetZero books, then paying for a couple of remote proctored exams which Rick at Proteus ran (I cannot recommend this enough – Rick was brilliant, the scenarios were a good reflection of the sorts of tasks to expect in the exam, and he was happy to keep emailing me long afterwards with any questions that came up during my study). The benefit of this was that when it came to the real thing, I had got past the stress of having only 8 hours to complete a bunch of tasks on a very broken network. On one of my first practice exams I spent 2.5 hours debugging an LDP issue, then missed a whole lot of other stupid errors because I had got stuck on one question! One of the most important lessons I learned was never to spend more than 5 minutes debugging a task – move on, do something else, come back, and you may find that a fresh perspective makes you see something really obvious and stupid!

During all this time I was very lucky to have a bunch of friends to bounce ideas off and chew the fat with on things I either didn’t quite get or hadn’t struck before. I’d like to particularly thank Chris Jones, Kurt Bales, Ivan Walker, Dylan Hall & Vance McIndoe who were invaluable in being able to ask the occasional dumb (or potentially not so dumb) question, be it some of the more odd behaviours of confederations or completely schooling me in how multicast worked. I’d say that making sure you have “study buddies” who you can do this with is really important – as most people learn better with someone else to bounce things off.

Three weeks out from the exam I made the 12-hour flight to San Francisco and set up camp in a hotel there. That week I did the bootcamp, which was an interesting experience – mainly because I had not realised before then how well prepared I was! The instructor indicated that in the bootcamp labs we should be OK as far as speed goes if we were completing 3/4 of the tasks within the allocated time (they were trying to put us under the gun and make sure we could work well under pressure) – however I was completing ALL the tasks within 2/3 of the time, making me feel more confident than I had been about the whole thing! The bootcamp was really valuable in that it was another opportunity to do a bunch of tasks in a simulated lab environment reflecting the sorts of things you would be doing, all while being able to discuss any issues I didn’t quite get (by that time, luckily, there weren’t many) with the instructor.

From that point I spent a week and a half doing some final brush-up and simulations, plus a couple more remote proctored labs (plus I snuck up to see the America’s Cup boat racing a couple of times :)).

Finally came the day of the exam. One of the things I can recommend most for the exam is to make sure you don’t have to empty your bladder every 5 minutes – don’t drink too much coffee! The person to my left must have left 4 times to relieve himself in the morning alone, costing himself valuable time! I got to the Juniper offices early, had a good breakfast at the cafe, then waited. On my day there were 2 of us doing the SP exam, 2 doing SEC, and one doing ENT. Being under NDA, I can’t discuss anything that happened in the exam, however I can say that I completed all the tasks and debugging in 3 hours 15 minutes (including checking), then after spending another 2 hours 45 minutes re-checking everything a few times I was quite satisfied to walk out 2 hours early. I can only attribute this to the amount of study and lab time I put in before the exam. I can also say that the exam was one of the most fun days I’ve had in a long time – there’s nothing more enjoyable than having to do a heap of debugging then roll out some cool features in a network!

From that point I had a holiday booked with my wife (who had arrived in SFO while I was in the exam), so we spent the next few days in San Francisco seeing a few of the sights then travelling to Yosemite to have a poke around there. However it was REALLY hard to relax – even though I was absolutely sure I had everything right I was still nervous and spent a lot of time second guessing myself. Finally – 8 days after the exam on the day we were to leave to go back to New Zealand I woke up to the email from Juniper – I had passed!!!! Woohoo!!! I am JNCIE-SP#2204. It was nice not to be wondering if the email was waiting for me while I was offline on the plane trip.

In review, the JNCIE-SP was really enjoyable to work towards – as a true geek there’s nothing more interesting to me than learning new skills and standards, however I’m taking it a bit easier for the next few months at least and re-learning what it is to not do 40 hours study a week on top of work!

VRF import on JUNOS – gotchas

I’ve been playing quite a bit with VRF import policies on Juniper MXs. To briefly recap for those who haven’t played with Junipers before, you can either specify in the VRF configuration to import VRF target X, or you can create an import policy to do more specific / custom things.

While what I’m trying to achieve isn’t basic, it’s definitely not beyond the scope of what you should be able to do. However, I’ve been finding that it’s not implemented as well as I had expected, which is disappointing. While I’ve always found JUNOS policy far easier to work with than IOS route-maps, in the case of the two issues I struck today (described below) I’m really hating on the JUNOS approach!

No regexp on VRF import policies.

In both these scenarios, I was trying to create a policy that imports blackhole routes. One of the great things in JUNOS is how well integrated the regexp functionality is, letting you do tasks that would traditionally take multiple terms in a policy (largely replicating the same bits of config barring one slight variable) and compress them down to a single term. Unfortunately this works for everything… except for this! In this specific use case, I was trying to import blackhole routes from ANY internet VRF to my VRF.

An example (note that in this example all Internet VRF targets are in the range 50[0-9]);

set policy-options community InternetAny members target:12345:50.
set routing-instances InternetInternational vrf-import InternetInternationalImport

user@router# show policy-options policy-statement InternetInternationalImport
term Blackhole {
    from community InternetAny;
    then accept;
}

[edit]

user@router# commit check
error: InternetInternational: vrf-import policy cannot have wildcard target communities
error: configuration check-out failed

[edit]

Doing some digging, I found an article on the Juniper website stating this limitation, but with no sensible reason as to why they decided not to support the usual JUNOS goodness for this feature.
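The workaround I can see is to enumerate one community per target and OR them together in the from clause (sketched below with just three targets; the community names are my own). Note that multiple members inside a single community definition are ANDed, so the targets can’t simply be listed together in one community:

```
set policy-options community Internet500 members target:12345:500
set policy-options community Internet501 members target:12345:501
set policy-options community Internet502 members target:12345:502
set policy-options policy-statement InternetInternationalImport term Blackhole from community [ Internet500 Internet501 Internet502 ]
set policy-options policy-statement InternetInternationalImport term Blackhole then accept
```

Workable, but at one community definition per VRF this gets tedious fast on a network of any size.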

Matching on multiple communities forces lots of community-group combinations

A feature normally common to JUNOS and IOS is the ability to match on a criterion of multiple communities. Again – except for VRF import in JUNOS. The advantage of this (going back to my blackholing example) is that I might want to match on any Internet VRF target plus my blackhole community to accept blackhole routes into the VRF. This is actually doable, just not in the way you would think. Here’s what I tried to do (probably more easily displayed in non-set format; note that InternetDomestic is a target community and Blackhole is a standard community):

user@router# show policy-options policy-statement InternetInternationalImport
term Blackhole {
    from community [ Blackhole InternetDomestic ];
    then accept;
}
[edit]

user@router# commit check
error: InternetInternational: vrf-import policy permits accept action only if matching conditions contain a target community
error: configuration check-out failed

[edit]

As it turns out, any community you match in a vrf-import policy must include at least one target community among its members. The solution ended up being to do something like this:

user@router# show policy-options policy-statement InternetInternationalImport
term Blackhole {
    from community InternetDomestic_Plus_Blackhole;
    then accept;
}
[edit]
user@router# show policy-options community InternetDomestic_Plus_Blackhole
members [ target:12345:500 12345:666 ];

[edit]

However, my issue with this is that you then have an explosion of community definitions for every possible combination of communities that you might want to use in policy. At the end of the day, matching on combination X but not combination Y of communities is a job for policy, not for community definitions (which should just carry named groups of community members).
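To make the explosion concrete: every (target, extra community) pair needs its own definition. Assuming a second Internet VRF with target 12345:501 (that target value and the combined names are my own illustration):

```
set policy-options community InternetDomestic_Plus_Blackhole members [ target:12345:500 12345:666 ]
set policy-options community InternetInternational_Plus_Blackhole members [ target:12345:501 12345:666 ]
```

Add a second "extra" community alongside Blackhole and the number of definitions doubles again.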

Hope this helps others tinkering with this stuff on JUNOS…

An IGP metric strategy to please the latency sensitive

Recently I was discussing with a friend his strategy for ISIS metrics, and I found it interesting/useful enough to implement it on my own network. What it came down to was the idea that you would want to ensure that you could have multiple links between two locations taking very different paths with different latencies – but still be able to prefer a high capacity link with the lowest possible latency path.

One option in these cases is just fudging it – depending on the scale of your network this may be doable, but it means constant manual intervention and re-architecture of your IGP costing strategy. What I’ve come up with is (I believe) far better, in that it’s a mathematical formula which can be given to any engineer to work out the cost for the new core link he has been assigned to bring up. The formula I came up with is as follows;

metric = latency * 1000 / speed

This ensures a strong bias towards higher-capacity links, as is needed when passing large volumes of traffic. I chose to measure latency in terms of RTT (in milliseconds) and speed in gigabits per second, ensuring that this formula would scale to 100GigE bearers. What this means is that while a 100GigE bearer over 5ms ends up with a metric of 50, a 10gig link would have to have a latency of 0.5ms (i.e. 1/10th the distance) to achieve an equal cost. However, within links of the same speed, the path with the lowest latency is preferred. Both the friend I was discussing this with and I felt that this provided a really good balance between the two metrics in order to ensure good service to customers, and we’ve both implemented something based on it in our networks since then.
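The worked example above can be sanity-checked with a few lines of Python (the function name is mine; RTT in milliseconds, speed in Gbit/s):

```python
def igp_metric(rtt_ms: float, speed_gbps: float) -> int:
    # metric = latency * 1000 / speed
    return round(rtt_ms * 1000 / speed_gbps)

# 100GigE bearer over a 5 ms RTT path
print(igp_metric(5, 100))    # 50
# a 10GigE link needs a 0.5 ms RTT to be equal-cost
print(igp_metric(0.5, 10))   # 50
```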

NOTE – Since writing this I’ve been discussing it with another friend who is thinking of writing some awesome Junoscript for a customer to automatically adjust their IGP metrics based on a similar formula. What they’re roughly thinking (which I think is rather cool) is to base this on RPM results (for those from a Cisco background, RPM is Juniper’s equivalent of IP SLA) and adjust whenever they see a significant change. There are positives and negatives to this, but given that my friend’s use case is a requirement to avoid high-latency links (his links are resilient and can change in latency due to breaks in his service provider’s network), it’s quite a neat strategy – provided the appropriate checks are put in place to prevent a constantly changing IGP topology and ensure that change only happens when absolutely needed.
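A rough sketch of the dampening such a script would need – only push a new metric when the measured RTT moves the computed value beyond a threshold. Everything here, including the 20% threshold and all names, is my own assumption rather than my friend’s design:

```python
def metric_from_rtt(rtt_ms: float, speed_gbps: float) -> int:
    # same formula as above: metric = latency * 1000 / speed
    return round(rtt_ms * 1000 / speed_gbps)

def maybe_update(current_metric: int, measured_rtt_ms: float,
                 speed_gbps: float, threshold: float = 0.2) -> int:
    # Only return a new metric when it differs from the current one by
    # more than `threshold` as a fraction; otherwise keep the IGP stable.
    new_metric = metric_from_rtt(measured_rtt_ms, speed_gbps)
    if abs(new_metric - current_metric) > threshold * current_metric:
        return new_metric
    return current_metric

# 5.2 ms on a 100G link -> candidate metric 52, within 20% of 50: no change
print(maybe_update(50, 5.2, 100))   # 50
# a re-route doubles the RTT -> metric moves to 100
print(maybe_update(50, 10.0, 100))  # 100
```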