RPKI on Junos is easy!

My friend Michael Fincham did a great presentation a few weeks ago at NZNOG on RPKI. I spent a fair bit of time helping him get said presentation ready to go – as we did a bunch of testing on the MX implementation of RPKI. He made a good argument that as a community we are horribly bad at the security of our prefixes. We need to be ensuring we do more than just blindly trust our peers and check that they actually have the right to send us the prefixes they advertise. Of course we are all at some level aware of this, but this verification stuff is hard right… right?….surely?!?…

Michael and I spent some time prior to his presentation playing around with RPKI on an MX80 in my lab. One of the things that came out of this is how truly easy it is to get a basic RPKI deployment going. Turning on validation in Junos only requires the following single line of config;

set routing-options validation group some-awesome-rpki-server session

Of course there are a bunch of other options available, such as the priority of different RPKI servers, timeouts, port numbers, source address, etc – but this is not complex stuff to configure….

At this point, you’ll start to see prefixes looking like this (if they are valid);

[email protected]_RPKI_DEMO_MX80> show route
[output omitted]         *[BGP/170] 1w3d 03:59:37, localpref 100
                      AS path: 23655 4648 2914 5511 3215 I, validation-state: valid
                    > to via ge-1/1/8.0
[output omitted]

Or this (if they are not valid);

[email protected]_RPKI_DEMO_MX80> show route
[output omitted]      *[BGP/170] 1w3d 04:01:05, localpref 100
                      AS path: 23655 4648 4134 I, validation-state: invalid
                    > to via ge-1/1/8.0
[output omitted]

Then you can start writing policy that matches on the following;

[email protected]# set policy-options policy-statement abc from validation-database ?  
Possible completions:
  invalid              Match for invalid database validation-state
  unknown              Match for unknown database validation-state
  valid                Match for valid database validation-state

The cool thing about the JUNOS implementation of this is that you take action based on policy – so you can do anything at all with this! This might just be attaching an “untrusted” community, or it might be doing more.
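If the three states are new to you, here is a rough Python sketch of how a route ends up valid, invalid or unknown. It follows the origin validation logic described in RFC 6811 and is an illustration only, not anything taken from Junos. The Roa type, the validation_state function, and the documentation prefixes and AS numbers are all made up for the example.

```python
import ipaddress
from typing import List, NamedTuple


class Roa(NamedTuple):
    """One validated ROA entry (hypothetical structure for this sketch)."""
    prefix: str      # covering prefix, e.g. "192.0.2.0/24"
    max_length: int  # longest prefix length the ROA authorises
    asn: int         # AS allowed to originate the prefix


def validation_state(prefix: str, origin_asn: int, roas: List[Roa]) -> str:
    """Classify a route as 'valid', 'invalid' or 'unknown'."""
    route = ipaddress.ip_network(prefix)
    covered = False
    for roa in roas:
        roa_net = ipaddress.ip_network(roa.prefix)
        # A ROA is only relevant if it covers the announced prefix.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # Valid needs the right origin AS and a prefix no longer than max_length.
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered by at least one ROA but matching none of them is invalid;
    # not covered by any ROA at all is unknown.
    return "invalid" if covered else "unknown"


# Example data using documentation prefixes and private-use AS numbers.
ROAS = [Roa("192.0.2.0/24", 24, 64500)]
for candidate in [("192.0.2.0/24", 64500), ("192.0.2.0/24", 64501),
                  ("192.0.2.0/25", 64500), ("198.51.100.0/24", 64500)]:
    print(candidate, validation_state(candidate[0], candidate[1], ROAS))
```

The point to take away is that unknown simply means nobody has published a ROA covering the prefix, which is why policy usually treats unknown very differently from invalid.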

One concern I had, which we spent some time playing around with before his presentation, was that verifying each route with RPKI would make the MX take longer to process routes. So we did a bit of testing. We had an MX80 in my lab (and we all know that the MX80 does not have a particularly awesome CPU!) with a full BGP feed from my network. We measured the time it took for the route-table to be populated, plus the time for it to push these routes to the forwarding-table, before and after implementing RPKI (with a route policy applied on import from the BGP feed inspecting RPKI attributes). The result was that the load time for the full table increased by 300-400%. This is bad, but not unmanageable – especially if you are not validating the full table, but just a few specific peers.

If you are going to set this up in your network, you are likely going to want to use a local RPKI server. A good place to start getting info on some of this stuff would be the slide deck from Michael’s talk, which can be found here; http://hotplate.co.nz/archive/nznog/2014/rpki/

Please also see the video of the talk here; http://www.r2.co.nz/20140130/michael-f.htm

Another bit of interesting reading is about the deployment of RPKI on the IX in Ecuador of all places! Read here; http://iepg.org/2013-11-ietf88/RPKI-Ecuador-Experience-v2b-1.pdf

One of the key things to note with RPKI is that while it is far from the final solution at this point (few have their routes signed yet), it’s a very good step in the right direction. And for that reason, I think we should all be looking into at least validating our routes, and perhaps assigning a better local-preference to validated routes. Like IPv6, it takes all of us ‘buying in’ to get this to the point where a large proportion of the internet routing table can be successfully validated!

Don’t forget the rest of your loopbacks!

With all that is going on on the internet currently around NTP reflection attacks and the like, it seemed timely to do a post on the logic of how router-protect filters are applied to loopbacks in JUNOS.

For those of you new to using Juniper gear, if you apply a firewall filter inbound on the loopback of a Juniper Networks device, it will be applied to all traffic processed by the routing-engine. This includes traffic with a destination address of a physical interface (i.e. not the loopback). This provides a simple and convenient place to deploy firewall filters to protect the routing-engine on the Juniper device.

This generally looks something like this (where re-protect has the rules for what should talk to the RE);

set interfaces lo0 unit 0 family inet filter input re-protect

This includes traffic on VRF/virtual-router interfaces for VRFs/virtual-routers that do not have their own loopback interfaces.

The catch that many of the people I have been helping over the last week had forgotten, however, is that this does not apply to traffic in VRFs or virtual-routers that have their own loopback. If the VRF or virtual-router has a loopback interface in it, you must apply the filter to this loopback as well for it to take effect. For example;

set interfaces lo0 unit 504 family inet filter input re-protect

The classic example where you may strike this is that you will generally require loopback interfaces in any VRF in which you wish to land BNG PPPoE subscribers on the MX routers.

However, a better way to implement firewall filtering to protect the routing engine would actually be to implement it in an apply group, in order that all future loopback interfaces are protected without any configuration being required. This could be done like so;

set groups re-protect interfaces lo0 unit <*> family inet filter input re-protect
set apply-groups re-protect

The only catch with deploying it like this is that if you ever do explicitly configure an input filter on a loopback unit directly (i.e. not through the apply-group to all), the group will cease to have any effect on this loopback (as it will see the group as having been overridden with local config).
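If you want a quick way to spot loopback units that have been missed, a small audit script can help. The following is a minimal Python sketch that reads set-format configuration lines like the ones shown above and lists the lo0 units with no family inet input filter. The function name and the sample addresses are made up for illustration, and it only sees filters that are configured explicitly on a unit (anything inherited from an apply-group will not appear in plain set-format output).

```python
import re

# Matches any set-format line that references a lo0 unit, for example:
#   set interfaces lo0 unit 504 family inet address 192.0.2.2/32
UNIT_RE = re.compile(r"^set interfaces lo0 unit (\d+)\b")
# Matches the line that attaches an inbound IPv4 filter to a lo0 unit, for example:
#   set interfaces lo0 unit 504 family inet filter input re-protect
FILTER_RE = re.compile(r"^set interfaces lo0 unit (\d+) family inet filter input \S+")


def unprotected_lo0_units(set_lines):
    """Return the sorted lo0 unit numbers that have no explicit inet input filter."""
    units = set()
    filtered = set()
    for raw in set_lines:
        line = raw.strip()
        unit = UNIT_RE.match(line)
        if unit:
            units.add(int(unit.group(1)))
        if FILTER_RE.match(line):
            filtered.add(int(FILTER_RE.match(line).group(1)))
    return sorted(units - filtered)


# Hypothetical sample: unit 0 is protected, unit 504 (a VRF loopback) is not.
SAMPLE = """
set interfaces lo0 unit 0 family inet filter input re-protect
set interfaces lo0 unit 0 family inet address 192.0.2.1/32
set interfaces lo0 unit 504 family inet address 192.0.2.2/32
""".splitlines()

print(unprotected_lo0_units(SAMPLE))
```

Any unit number it prints is a loopback whose VRF or virtual-router traffic is reaching the routing-engine unfiltered.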

Hope this all helps!

Regular expressions in JUNOS show commands

Most of you will be aware that you can use regular expressions to filter the output of any command in JUNOS with a show blah | match “ge-.*/0/[0-9]”. The one catch with this method is that it kills the headers for each column of output.

There’s actually another way to achieve this, which is to write the regexp directly into the show command. You can actually do this with most show commands in JUNOS (though the example below is a show interfaces, I’ve also tried this on ISIS, RSVP and OSPF output). The advantage of this way is that it preserves any information that is not related to a specific interface, making the command clearer to read.

To do this, just write any interface-specific command and put the regexp at the end. For example show rsvp interfaces ae* will show you all the RSVP AE interfaces. Or the below output will show you only the gigabit interfaces that match ge-*/0/* (i.e. over two line cards, but one set of interfaces within the line card);

[email protected]> show interfaces terse ge-*/0/* 

Interface               Admin Link Proto    Local                 Remote
ge-0/0/0                up    down
ge-0/0/1                up    down
ge-0/0/2                up    down
ge-0/0/3                up    down
ge-0/0/4                up    down
ge-0/0/5                up    down
ge-0/0/6                up    down
ge-0/0/7                up    down
ge-0/0/8                up    down
ge-0/0/9                up    down
ge-2/0/0                up    up
ge-2/0/1                up    up
ge-2/0/2                up    up
ge-2/0/3                up    up
ge-2/0/4                up    up
ge-2/0/5                up    down
ge-2/0/6                up    up
ge-2/0/7                up    up
ge-2/0/8                up    down
ge-2/0/9                up    up
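One thing worth keeping straight is that the pattern given on the show command behaves like a shell-style wildcard, whereas | match takes a real regular expression, and * means something different in each. The Python sketch below approximates the two styles with the standard fnmatch and re modules; it is only an analogy for how the patterns are interpreted, not a reproduction of the Junos matcher, and the interface list is made up.

```python
import fnmatch
import re

# Hypothetical interface names for the comparison.
INTERFACES = ["ge-0/0/0", "ge-0/1/0", "ge-2/0/9", "xe-1/2/0", "ae0"]


def wildcard_match(pattern, names):
    """Shell-style matching: '*' stands for any run of characters."""
    return [name for name in names if fnmatch.fnmatchcase(name, pattern)]


def regex_match(pattern, names):
    """Regex matching: '*' repeats the previous character, '.*' is any run."""
    return [name for name in names if re.search(pattern, name)]


# Wildcard style, as used directly on the show command.
print(wildcard_match("ge-*/0/*", INTERFACES))
# Regex style, as used with | match; note the '.*' instead of a bare '*'.
print(regex_match(r"ge-.*/0/[0-9]", INTERFACES))
# The wildcard pattern reused as a regex: 'ge-*' now means "ge" followed by
# zero or more '-' characters, so none of the names match.
print(regex_match(r"ge-*/0/[0-9]", INTERFACES))
```

So if a pattern that works on the show command returns nothing when pasted after | match, a bare * is the first thing to check.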

I’ve explained this to a few people this week, so figured it would be a good thing to post here in case anyone hasn’t already picked this up.

Hope this helps!

LSP mappings based on route/traffic attributes

A friend today asked me an interesting question (that is in fact a part of the JNCIE-SP syllabus) – “How can I ensure that certain traffic types take different paths in my MPLS network?”

This is applicable for many of us who run large backhaul networks with many paths in them – some higher latency, some higher capacity. In these cases it is important to be able to load balance traffic based on many different requirements, first the available capacity, but also at times based on the traffic type.

A good example of where these requirements might become important is found in New Zealand – we have a single cable system out of NZ, which connects in a ring between two points in each of NZ, Australia & the USA (see below diagram). What this results in is that between NZ and the USA there are two paths: one direct, and one that goes via AU (and is 30ms longer). Both links are unprotected in themselves, but if you have a path on each, you can assume that one out of the two paths will be up at any time. You therefore don’t want to have too much more traffic than a single link could cope with, but at times you’ll oversubscribe a tad. If you wanted to oversubscribe it would make sense to ensure that latency sensitive traffic is on the short path and best effort traffic is on the long path.

The friend with whom I was discussing this was wanting to ensure that UDP traffic destined to routes in a certain as-path was transited on the long path between the USA and NZ. In this blog post we will discuss how we would achieve this on a Juniper MX router, and walk through how to achieve this example in a step by step manner.

We can map traffic to LSPs based on a range of criteria including the CoS queue, destination prefix, destination prefix as-path, and a few other useful properties (basically any information about the destination route that you can match with policy or the QoS class the traffic is in).

The one catch is that the route-preference for all LSPs you are wanting to map traffic to must be both equal and the best possible route-preference to the destination. If this is not the case, traffic will simply be sent to the best preference LSP. Likewise – if one LSP is unable to stand up and is withdrawn from the routing table, traffic mapped to this LSP will just move to another LSP.

Below is a diagram showing an example network which illustrates the problem my friend had. I have illustrated the primary TE path of two LSPs – the RED LSP which takes a direct path from the USA to NZ, and the BLUE LSP which takes a longer path via AU to NZ. The goal will be to ensure that all UDP traffic from our US transit provider that is destined to as900 and its customers will take the long path, while all other traffic takes the short path. Criteria for sending via each link is also illustrated on the diagram.

Diagram of example network

Okay – got the requirements – tell me how we do this!!!!!

We’ll configure this in a few steps, breaking down what we are doing and why as we go.

1/ Classify UDP traffic so that we can match based on this later
The first thing we identify is that one of the requirements dictates that we peer inside the header of every packet to classify it as UDP or non-UDP. In order to do this we will need to use a firewall filter on ingress to our router. We will use this to classify the traffic into a CoS class (which we can then use to map to a different LSP when we match the destination we want).

The first thing we must do in this step is create our new CoS class. Let’s call this class the “udp-class” class, while we leave the “best-effort” and “network-control” classes we had already configured in place;

[email protected]# show | compare

[edit class-of-service forwarding-classes]
     queue 3 { ... }
+    queue 1 udp-class;

Now that we have this, we must build a firewall filter to match UDP traffic vs other traffic;

[email protected]# show | compare
+  firewall {
+      family inet {
+          filter transit-in {
+              term udp {
+                  from {
+                      protocol udp;
+                  }
+                  then {
+                      forwarding-class udp-class;
+                      accept;
+                  }
+              }
+              term other-traffic {
+                  then accept;
+              }
+          }
+      }
+  }

Finally, we need to apply this filter to inbound traffic on the interface connecting to our US transit provider;

[email protected]# show | compare 
[edit interfaces ge-0/0/3 unit 0 family inet]
+       filter {
+           input transit-in;

At this point, we have all UDP traffic from our transit provider mapped to our “udp-class” CoS queue, and we are now ready to create a policy to make forwarding decisions based on this.

2/ Create a policy to make next-hop decisions based on CoS queue

In this step, we will create a CBF (CoS Based Forwarding) policy, which will (when called upon by route policy) install a next-hop LSP based on the specified forwarding class.

This is done as follows;

[email protected]# show | compare 
[edit class-of-service]
+   forwarding-policy {
+       next-hop-map NZ-Traffic {
+           forwarding-class udp-class {
+               lsp-next-hop BLUE;
+           }
+           forwarding-class best-effort {
+               lsp-next-hop RED;
+           }
+           forwarding-class network-control {
+               lsp-next-hop RED;
+           }
+       }
+   }

It is worth re-noting that the LSPs must be equal route preference (see more detail above) – I’ve seen lots of people miss this and wonder why their CBF policy is not working.

Additionally, the astute reader will note that I have not actually created a policy for the assured-forwarding queue, which is created by default on the MX as queue 2. In this case we will assume that no traffic is passing in this queue; however, if any traffic is passed in a queue that is not defined in a CBF policy, it is mapped in the same manner as queue 0 (in this case best-effort). If queue 0 is not defined, one of the defined queues is selected at random for use by non-defined queues.
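The fallback rules for undefined queues are easy to get wrong, so here is a minimal Python sketch of the selection logic as described above. It is a model of the behaviour only, not Junos code: the select_lsp function and its arguments are made up for the example, and the dictionary simply mirrors the NZ-Traffic next-hop-map.

```python
import random

# Mirrors the NZ-Traffic next-hop-map: forwarding class -> LSP name.
NZ_TRAFFIC = {
    "udp-class": "BLUE",
    "best-effort": "RED",
    "network-control": "RED",
}


def select_lsp(forwarding_class, next_hop_map, queue0_class="best-effort", rng=random):
    """Pick the LSP for a forwarding class, applying the fallback rules."""
    # 1. A class that is defined in the map uses its own entry.
    if forwarding_class in next_hop_map:
        return next_hop_map[forwarding_class]
    # 2. An undefined class is treated the same way as queue 0.
    if queue0_class in next_hop_map:
        return next_hop_map[queue0_class]
    # 3. If queue 0 is not defined either, one of the defined classes is
    #    picked at random and its LSP is used.
    chosen_class = rng.choice(sorted(next_hop_map))
    return next_hop_map[chosen_class]


print(select_lsp("udp-class", NZ_TRAFFIC))           # defined class
print(select_lsp("assured-forwarding", NZ_TRAFFIC))  # falls back to queue 0
```

In the lab configuration this means any assured-forwarding traffic would follow best-effort onto the RED LSP.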

At this point we have our CBF policy all sorted and are ready to proceed to the next step.

3/ Find the destinations we want this policy applied to

We must now find the destinations we want this policy applied to. In our case, this is to be all prefixes destined to as900 and its customers. This is best described in a regular expression as “900+ .*” (one or more iterations of as900 followed by any number of other AS numbers).

We can verify that this will work with the following command (note that I only have the two prefixes shown in the diagram set up behind the NZ router in this lab);

[email protected]# run show route aspath-regex "900+ .*" 

inet.0: 16 destinations, 16 routes (16 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

                   *[BGP/170] 00:16:23, localpref 100, from
                      AS path: 900 800 700 I
                      to via ge-0/0/1.0, label-switched-path BLUE
                    > to via ge-0/0/2.0, label-switched-path RED

inet.3: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)

mpls.0: 4 destinations, 4 routes (4 active, 0 holddown, 0 hidden)

We now configure this on our USA router;

[email protected]# show | compare
[edit policy-options]
+   as-path as900_and_customers "900+ .*";
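To make the as-path expression a little more concrete, here is a rough Python model of what “900+ .*” accepts. The key difference from an ordinary text regex is that each term is a whole AS number rather than a character, so AS 9000 must not match. This is my own approximation for illustration (the function name is made up), not the Junos matcher itself.

```python
import re

# Each AS number is written followed by one space, so that a term in the
# Junos expression corresponds to one whole "digits + space" token here.
#   900+  -> one or more occurrences of AS 900 (e.g. with prepending)
#   .*    -> zero or more further AS numbers of any value
_AS900_AND_CUSTOMERS = re.compile(r"(?:900 )+(?:\d+ )*")


def matches_as900_and_customers(as_path):
    """Return True if the AS path (a list of ints, nearest AS first) matches."""
    text = "".join(f"{asn} " for asn in as_path)
    return _AS900_AND_CUSTOMERS.fullmatch(text) is not None


print(matches_as900_and_customers([900, 800, 700]))  # path from the lab output
print(matches_as900_and_customers([900, 900, 800]))  # AS 900 prepended
print(matches_as900_and_customers([9000, 100]))      # different AS, same leading digits
print(matches_as900_and_customers([800, 900]))       # AS 900 is not the first AS
```

In other words, the expression boils down to "the first AS in the path is 900", which is exactly as900 and anything learned through it.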

Cool! We now have a way to match both parts of the criteria on which we wish to match traffic. All that is left to do is to put it all together.

4/ Putting it all together and writing some route policy

In previous steps we have created mechanisms to match the traffic type and destinations for which we want to send traffic via our “long path”. Now we need to create some policy based on this to make it all work!

We want this policy to match prefixes for as900 and its customers (defined above) that already have a next hop of the NZ router. For all traffic that is matched we then need to define the next-hop based on our CBF policy defined above, so that only UDP traffic is sent via the long path (BLUE LSP).

For all other traffic with a next hop of the NZ router, we want to map it to the short path (RED LSP).

The following policy will do the trick nicely;

[email protected]# show | compare 
[edit policy-options]
+   policy-statement map-nz-lsps {
+       term as900_and_customers {
+           from {
+               neighbor;
+               as-path as900_and_customers;
+           }
+           then {
+               cos-next-hop-map NZ-Traffic;
+               accept;
+           }
+       }
+       term other-nz {
+           from neighbor;
+           then {
+               install-nexthop lsp RED;
+               accept;
+           }
+       }
+   }

It’s worth noting again that LSPs must have an equal route preference (and there must be no route with a better preference) for install-nexthop to work (as with CBF policy) – see more detail above.
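Because policy terms are evaluated in order and the first matching term wins, the ordering of the two terms matters. The sketch below walks a route through the map-nz-lsps policy in Python. It is a simplified model only: the function name, the route dictionary and the 192.0.2.1 neighbor address are assumptions for the example, and the as-path test is reduced to "first AS is 900".

```python
NZ_NEIGHBOR = "192.0.2.1"  # hypothetical address standing in for the NZ router


def evaluate_map_nz_lsps(route, nz_neighbor=NZ_NEIGHBOR):
    """Return the action the policy would take for a route, or None if no term matches.

    route is a dict with 'neighbor' (str) and 'as_path' (list of ints, nearest AS first).
    """
    # Both terms require the route to have been learned from the NZ router.
    if route["neighbor"] != nz_neighbor:
        return None
    # term as900_and_customers: hand the next-hop choice to the CBF map.
    if route["as_path"][:1] == [900]:
        return ("cos-next-hop-map", "NZ-Traffic")
    # term other-nz: everything else from the NZ router is pinned to RED.
    return ("install-nexthop", "RED")


print(evaluate_map_nz_lsps({"neighbor": NZ_NEIGHBOR, "as_path": [900, 800, 700]}))
print(evaluate_map_nz_lsps({"neighbor": NZ_NEIGHBOR, "as_path": [64500]}))
print(evaluate_map_nz_lsps({"neighbor": "198.51.100.1", "as_path": [900]}))
```

If the two terms were swapped, other-nz would match every route from the NZ router first and the CBF map would never be consulted.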

Finally we need to apply this policy to all routes being exported from the route-engine to the forwarding-engine. This requires one further line of configuration, and is done as follows;

[email protected]# show | compare 
[edit routing-options]
+   forwarding-table {
+       export map-nz-lsps;
+   }

We now have a peek at a prefix which matches the criterion of each of the two terms, starting with;

[email protected]> show route forwarding-table matching 200/24 
Routing table: default.inet
Destination        Type RtRef Next hop           Type Index NhRef Netif
                   user     0                    indr 262143     2
                                                 idxd   551     2
                   idx:1          Push 299920   572     2 ge-0/0/1.0
                   idx:3           ucst   573     4 ge-0/0/2.0
                   idx:xx           ucst   573     4 ge-0/0/2.0

We can see from the above output that it has created a per queue mapping for queues 1, 3 and a default mapping (matching queue 0’s configuration). So all working as expected.

And now for;

[email protected]# run show route forwarding-table matching 100/24    
Routing table: default.inet
Destination        Type RtRef Next hop           Type Index NhRef Netif
                   user     0                    indr 262142     2
                               ucst   573     3 ge-0/0/2.0

We can see from the output that other traffic is being mapped to the RED LSP (i.e. the short path) – exactly what we wanted.

5/ Testing

We now want to verify this by generating some traffic from the “Transit provider in the USA” – which in this lab is represented with a CentOS box. We need to test three scenarios;

A/ Traffic destined for 100/24
In this test, I will generate some ICMP echo requests from the CentOS box representing the transit provider to 100/24. If our lab is working correctly, I would expect to see this take the RED LSP (the short path).

Let’s clear the LSP stats, run the ICMP echo requests, then re-examine the LSP stats;

[email protected]# run clear mpls lsp statistics
[[email protected] ~]# ping -i 0.01 
PING ( 56(84) bytes of data.

--- ping statistics ---
5241 packets transmitted, 0 received, 100% packet loss, time 52754ms
[email protected]# run show mpls lsp statistics ingress 
Ingress LSP: 2 sessions
To              From            State     Packets            Bytes LSPname
                                Up              0                0 BLUE
                                Up           4890           410760 RED
Total 2 displayed, Up 2, Down 0

Great! As expected, traffic is being sent via the RED (short path) LSP.

B/ non-UDP traffic destined for 200/24

In this test, I will generate some ICMP echo requests from the CentOS box representing the transit provider to 200/24. If our lab is working correctly, I would expect to see this take the RED LSP (the short path).

Let’s clear the LSP stats, run the ICMP echo requests, then re-examine the LSP stats;

[email protected]# run clear mpls lsp statistics
[[email protected] ~]# ping -i 0.01 
PING ( 56(84) bytes of data.

--- ping statistics ---
1447 packets transmitted, 0 received, 100% packet loss, time 14581ms
[email protected]# run show mpls lsp statistics ingress    
Ingress LSP: 2 sessions
To              From            State     Packets            Bytes LSPname
                                Up              0                0 BLUE
                                Up           1447           121548 RED
Total 2 displayed, Up 2, Down 0

Again traffic is transiting the RED (short path) LSP as expected.

C/ UDP traffic destined for 200/24

In this final test, I will generate some UDP iperf traffic from the CentOS box representing the transit provider to 200/24. If our lab is working correctly, I would expect to see this take the BLUE LSP (the long path).

Let’s clear the LSP stats, run the iperf, then re-examine the LSP stats;

[email protected]# run clear mpls lsp statistics
[[email protected] ~]# iperf -c -u -b 10m -t 30
Client connecting to, UDP port 5001
Sending 1470 byte datagrams
UDP buffer size:  107 KByte (default)
[  3] local port 46896 connected with port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-30.0 sec  35.8 MBytes  10.0 Mbits/sec
[  3] Sent 25511 datagrams
[  3] WARNING: did not receive ack of last datagram after 10 tries.
[email protected]# run show mpls lsp statistics ingress    
Ingress LSP: 2 sessions
To              From            State     Packets            Bytes LSPname
                                Up          25521         38332542 BLUE
                                Up              0                0 RED
Total 2 displayed, Up 2, Down 0

This is taking the long path. All working as expected!
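As a sanity check, the iperf figures and the LSP counters from this last test can be reconciled with a little arithmetic. The Python below only uses numbers taken from the output above; the interpretation in the comments (retries of the last datagram, and header overhead) is my reading of the numbers rather than something the router reports.

```python
# Figures copied from the iperf and "show mpls lsp statistics" output above.
DATAGRAMS_SENT = 25511
DATAGRAM_PAYLOAD_BYTES = 1470
DURATION_SECONDS = 30.0
LSP_PACKETS = 25521
LSP_BYTES = 38332542

# Payload actually sent by iperf.
payload_bytes = DATAGRAMS_SENT * DATAGRAM_PAYLOAD_BYTES
transfer_mbytes = payload_bytes / 2**20            # iperf "MBytes" are 1024-based
rate_mbits = payload_bytes * 8 / DURATION_SECONDS / 1e6

# The LSP saw a few more packets than iperf reports as sent. Ten extra
# packets lines up with the "after 10 tries" warning about the last datagram.
extra_packets = LSP_PACKETS - DATAGRAMS_SENT

# Average size of each packet counted on the LSP. The difference from the
# 1470-byte payload would be consistent with UDP (8) + IPv4 (20) + one
# MPLS label (4) of overhead per packet.
bytes_per_lsp_packet = LSP_BYTES / LSP_PACKETS
overhead_bytes = bytes_per_lsp_packet - DATAGRAM_PAYLOAD_BYTES

print(f"transfer: {transfer_mbytes:.1f} MBytes at {rate_mbits:.1f} Mbits/sec")
print(f"extra packets on LSP: {extra_packets}")
print(f"bytes per LSP packet: {bytes_per_lsp_packet:.0f} (overhead {overhead_bytes:.0f})")
```

Everything iperf sent is accounted for on the BLUE LSP, with nothing leaking onto RED.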

In this article I have attempted to describe how to select LSPs based on traffic and destination route properties. While I’ve used the criterion of UDP traffic aimed at a certain destination, you could of course implement this based on any combination of CoS queue, destination route attributes & traffic properties. This should be a really useful tool to have up your sleeve to meet TE requirements on your network – and something worth knowing for the JNCIE-SP exam.

One thing to note is that while in this article I quickly whipped up an extra CoS queue without any further configuration, beware of doing this in real life – you should define buffer and transmit rates for schedulers on all interfaces for each queue. I will aim to do another blog post soon digging into this deeper – but for now just a warning (and have a look at the O’Reilly MX book if you want more detail on MX CoS)!

Thanks to Barry Murphy for coming up with this interesting scenario for me to write a post about. Hope this helps!

A quick hack to find busy interfaces on an MX!

Today at work we were attempting to locate one unit out of around 4,000 to find where a large amount of traffic on our network was headed. The monitoring tools pointed at this device were… shall we say… less than awesome for these tasks ;). So, my co-worker Tim Woolford (@insertbird on twitter), came up with this useful bash script to find any queues that were dropping more than X packets per second (in this case, we have configured the limit to 1pps – see the limit variable if you want to change this). We figure that this is something that others may find useful, so here’s the script. It’s quick and dirty, but does the trick. Here’s some sample output from it (note that the “Customer ABC” is the configured description on the unit);

[email protected]:~$ ./ddos-mx.sh
 - xe-1/3/1.10046 dropping 14 pps in forwarding queue 1 - Customer ABC

This utilises the ‘JUNIPER-COS-MIB’ SNMP functionality and also the IF-MIB to resolve interfaces. Hopefully this is of use to others who want a quick way to find that one busy interface on a box etc.


snmpwalk -v 2c -c $community $host . |grep -v ' 0$' | while read line
do
	# Pull the interface index, drop rate and queue number out of each returned counter line
	ifIndex=`echo $line | sed -e 's/^.*'|cut -d '.' -f 1`
	pps=`echo $line | sed -e 's/^.*Counter64: //'`
	class=`echo $line |sed -e 's/^.*[0-9]\{1,5\}.//' -e 's/ = Counter64.*$//'`
	ifName=`snmpwalk -v 2c -c $community $host .$ifIndex |sed -e 's/^.*STRING: //'`
	# Only report logical units (interface names containing a dot)
	echo $ifName | grep \\. > /dev/null
	if [ $? = 0 ]; then
		if [ $pps -gt $limit ]; then
			desc="`snmpwalk -v 2c -c $community $host .$ifIndex |sed -e 's/^.*STRING: //'`"
			echo $host - $ifName dropping $pps pps in forwarding queue $class - $desc
		fi
	fi
done

Of course this touches only a fraction of what we could achieve with Juniper SNMP and scripts polling it, but as a simple hack, this goes a long way! You will need snmpwalk on whichever box you are running this from. All credit for this post goes to Tim Woolford (@insertbird).

MX80 mac addressing gotcha

The other day I was doing a migration where I had an MX80 in Australia on which I had to move a handoff to an IX from one of the onboard 10gigE ports (xe-0/0/*) to a 2x10gig MIC card (xe-1/*/*) which was being couriered to the site. The migration was being done to move onto an interface which had access to the trio QX chip (which does the fine-grained per-unit/hierarchical scheduling/shaping). The plan was that a contractor would install the card then immediately physically move the cable. The catch here was that the IX in question did mac-filtering on every port and I had to pre-provide them with the new mac.

Not wanting to have to disturb anyone from the IX operations team at 4am when I was planning on getting the work done, before assigning this piece of work to one of my guys I shoved the 2x10gig MIC card I was about to put in the courier into an MX80 in the lab and grabbed the mac address of the first 10gig port on this particular MIC card (or so I thought), then proceeded to email it off to the ops team of this IX;

[email protected]> show interfaces xe-1/2/0    
Physical interface: xe-1/2/0, Enabled, Physical link is Down
  Interface index: 186, SNMP ifIndex: 580
  Link-level type: Ethernet, MTU: 1514, LAN-PHY mode, Speed: 10Gbps, BPDU Error: None, Loopback: None, Source filtering: Disabled, Flow control: Enabled
  Device flags   : Present Running Down
  Interface flags: Hardware-Down SNMP-Traps Internal: 0x0
  Link flags     : None
  CoS queues     : 8 supported, 8 maximum usable queues
  Current address: 00:23:9c:f1:f2:b2, Hardware address: 00:23:9c:f1:f2:b2
  Last flapped   : 2013-11-16 16:55:10 NZDT (7w0d 05:20 ago)
  Input rate     : 0 bps (0 pps)
  Output rate    : 0 bps (0 pps)
  Active alarms  : LINK
  Active defects : LINK
  PCS statistics                      Seconds
    Bit errors                         15723
    Errored blocks                     15723
  Interface transmit statistics: Disabled

However, then at 4am when one of my engineers did the cutover we couldn’t pass traffic on the new interface. We soon figured out that the MAC had not followed the MIC interface card – it was tied to the MX not the MIC – therefore it had changed when we moved it to the other MX (and because we’d figured there was no point in paying for a contractor, we had only installed the MIC as we were doing the cutover so never managed to pre-verify this);

[email protected]> show interfaces xe-1/2/0 
Physical interface: xe-1/2/0, Enabled, Physical link is Up
  Interface index: 182, SNMP ifIndex: 625
  Description: REMOVED
  Link-level type: Flexible-Ethernet, MTU: 9192, LAN-PHY mode, Speed: 10Gbps, Loopback: None, Source filtering: Disabled, Flow control: Enabled
  Device flags   : Present Running
  Interface flags: SNMP-Traps Internal: 0x0
  CoS queues     : 8 supported, 8 maximum usable queues
  Schedulers     : 0
  Current address: 80:71:1f:c3:31:90, Hardware address: 80:71:1f:c3:31:90
  Last flapped   : 2013-11-01 06:45:31 NZDT (9w1d 15:33 ago)
  Input rate     : 26844760 bps (5296 pps)
  Output rate    : 32232088 bps (8543 pps)
  Active alarms  : None
  Active defects : None
  PCS statistics                      Seconds
    Bit errors                             0
    Errored blocks                         0
  Interface transmit statistics: Disabled

A quick call to wake up someone at the IX operator got us sorted and we were away with an only slightly longer outage time than we’d planned. However while the fact that the MAC of each interface is derived from the MX80 not the MIC interface card may be obvious to some, it was a gotcha for me, and hopefully this will be some valuable info for others. Hope this helps!