Fiddling with PVLANs

One of the topics in the JNCIE-ENT syllabus which I had no understanding of at all before I started studying was Private VLANs. While this isn’t a huge topic, it’s something that I spent a bit of today making sure I had my head around. While the rest of you are probably very familiar with them, please bear with me as I document what I’ve discovered!

PVLANs give you two different options. First, there are community VLANs – essentially a group of users that should be able to talk to everyone else in their “community”, plus a set of servers / routers outside their community, but not to the rest of the vlan. Second, there are isolation ports – ports that should only be able to talk to routers/servers and to no other users.

I’ve based this blog (and my labbing today) on the topology below. Note that the “traffic mirror switch” runs a QinQ tunnel between the two switches, with a port mirror sending traffic ingressing the two ports facing the other switches to the sniffer. Config for this is as follows;

[email protected]# show | display set | match "analyzer|QinQ|ge-0/0"
set interfaces ge-0/0/0 description EX2
set interfaces ge-0/0/0 unit 0 family ethernet-switching port-mode access
set interfaces ge-0/0/1 description EX1
set interfaces ge-0/0/1 unit 0 family ethernet-switching port-mode access
set interfaces ge-0/0/2 description LAPTOP
set interfaces ge-0/0/2 unit 0 family ethernet-switching port-mode access
set ethernet-switching-options analyzer JNCIE input ingress interface ge-0/0/1.0
set ethernet-switching-options analyzer JNCIE input ingress interface ge-0/0/0.0
set ethernet-switching-options analyzer JNCIE output interface ge-0/0/2.0
set vlans QinQ vlan-id 123
set vlans QinQ interface ge-0/0/1.0
set vlans QinQ interface ge-0/0/0.0
set vlans QinQ dot1q-tunneling customer-vlans native
set vlans QinQ dot1q-tunneling customer-vlans 1-4094
set vlans QinQ dot1q-tunneling layer2-protocol-tunneling all

Also note that all the hosts have been emulated with an SRX running a routing-instance per host.
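
For anyone curious, one of these emulated hosts looks roughly like the sketch below – the interface and unit numbers are placeholders I’ve picked for illustration rather than the exact lab config, and on an SRX you’d also need the usual security-zone configuration on top of this;

set interfaces ge-0/0/1 unit 0 family inet address 192.168.66.100/24
set routing-instances SALES2 instance-type virtual-router
set routing-instances SALES2 interface ge-0/0/1.0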

And here is the topology;

PVLAN testing lab topology

Configuring this is fairly easy. Firstly, you configure a master vlan on the switches, which everything else nominally sits within. To do this in JUNOS, simply configure the vlan with a vlan ID, and configure any inter-switch ports as trunk ports (with the pvlan-trunk option in the vlan configuration);

[email protected]# show | display set | match "MASTER|0/0/23"
set interfaces ge-0/0/23 description MIRROR-SWITCH
set interfaces ge-0/0/23 unit 0 family ethernet-switching port-mode trunk
set vlans PVLAN_MASTER vlan-id 100
set vlans PVLAN_MASTER interface ge-0/0/23.0 pvlan-trunk

If you want to make a “promiscuous” port (generally used for routers or servers), which can send traffic to every isolation port and community vlan member, it’s quite simple – you just add it to the vlan as a trunk port without the pvlan-trunk option;

[email protected]# show | display set | match 0/0/5
set interfaces ge-0/0/5 description SERVER
set interfaces ge-0/0/5 unit 0 family ethernet-switching port-mode trunk
set vlans PVLAN_MASTER interface ge-0/0/5.0

Right – now for the clients. Let’s start with the community vlans.

Let’s say you want to make a SALES community vlan, with tag 20 representing this subset of users within the PVLAN on one of the switches;

[email protected]# show | display set | match SALES
set interfaces ge-0/0/0 description SALES2
set vlans SALES vlan-id 20
set vlans SALES interface ge-0/0/0.0
set vlans SALES primary-vlan PVLAN_MASTER

Once this is done, each SALES host is able to ping the SERVER host plus the other SALES host. Seems to work well.
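
If you want to sanity-check what the switch itself thinks is going on, the following are (in my experience) handy – the first shows the primary/secondary vlan relationships, the second which MACs are being learned against which vlan;

show vlans extensive
show ethernet-switching table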

Having done this, I was interested to know what was actually happening when I was moving traffic between the two hosts, so I used my sniffer to have a look. I sent an ICMP request from SALES2 to SALES1 while sniffing the traffic passing between the switches;

[email protected]#run ping 192.168.66.103 source 192.168.66.100 routing-instance SALES2 count 1 rapid
PING 192.168.66.103 (192.168.66.103): 56 data bytes
!
--- 192.168.66.103 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.372/1.372/1.372/0.000 ms

What this resulted in was the following;

19:29:12.157049 54:e0:32:ef:1a:80 (oui Unknown) > 54:e0:32:ef:1a:83 (oui Unknown), ethertype 802.1Q (0x8100), length 102: vlan 20, p 0, ethertype IPv4, 192.168.66.100 > 192.168.66.103: ICMP echo request, id 16716, seq 0, length 64
19:29:12.157092 54:e0:32:ef:1a:83 (oui Unknown) > 54:e0:32:ef:1a:80 (oui Unknown), ethertype 802.1Q (0x8100), length 102: vlan 20, p 0, ethertype IPv4, 192.168.66.103 > 192.168.66.100: ICMP echo reply, id 16716, seq 0, length 64

So as you can see, traffic transmitted on the pvlan-trunk which is in the PVLAN community for SALES gets the SALES tag of 20, not the PVLAN_MASTER tag of 100. This makes sense. However I next sent an ICMP request from SALES2 to SERVER and got a slightly more interesting result;

[email protected]#run ping 192.168.66.105 source 192.168.66.100 routing-instance SALES2 count 1 rapid
PING 192.168.66.105 (192.168.66.105): 56 data bytes
!
--- 192.168.66.105 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.643/1.643/1.643/0.000 ms

This resulted in;

19:30:53.330235 54:e0:32:ef:1a:80 (oui Unknown) > 54:e0:32:ef:1a:85 (oui Unknown), ethertype 802.1Q (0x8100), length 102: vlan 20, p 0, ethertype IPv4, 192.168.66.100 > 192.168.66.105: ICMP echo request, id 16720, seq 0, length 64
19:30:53.330333 54:e0:32:ef:1a:85 (oui Unknown) > 54:e0:32:ef:1a:80 (oui Unknown), ethertype 802.1Q (0x8100), length 102: vlan 100, p 0, ethertype IPv4, 192.168.66.105 > 192.168.66.100: ICMP echo reply, id 16720, seq 0, length 64

The interesting thing to note here is that the ICMP request from the host in the C-VLAN is transmitted over the trunk as tag 20 (the SALES community vlan-id), and the response comes back on tag 100 (the PVLAN_MASTER vlan-id). This makes sense, however it was different from what I had originally assumed would happen.

Next I tested isolation ports. As it turned out, this was pretty easy to configure. First, you tell the master vlan not to switch traffic directly between access ports;

set vlans PVLAN_MASTER no-local-switching

Then you configure each edge port as an access port like this;

[email protected]# show | display set | match 0/0/6
set interfaces ge-0/0/6 description ISOLATED-EX1
set interfaces ge-0/0/6 unit 0 family ethernet-switching port-mode access
set vlans PVLAN_MASTER interface ge-0/0/6.0

It’s worth noting, however, that in this configuration traffic is only isolated when it stays on the local switch. At this point I could not ping between hosts “Isolated A” and “Isolated B”, which are both on EX2, but could ping from either of them to the “Isolated” host on EX1. This makes sense – at this point frames just get the standard tag (100) as they cross the pvlan-trunk ports, so there’s no way for the switches to tell each other that this is “special” traffic. To resolve this, add an isolation vlan id, which is used to identify this “isolated” traffic between switches;

[email protected]# show | display set | match isolation
set vlans PVLAN_MASTER isolation-id 30

At this point I could not ping between any of the isolation ports, however I could still ping the server. Exactly what I wanted. Now I was interested to see how the tagging worked, so I generated an ICMP request from “Isolated A” on EX2 to the “Server” on EX1;

[email protected]#run ping 192.168.66.105 source 192.168.66.102 routing-instance ISOLATED-EX2-A count 1 rapid
PING 192.168.66.105 (192.168.66.105): 56 data bytes
!
--- 192.168.66.105 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max/stddev = 1.695/1.695/1.695/0.000 ms

The result was this;

19:40:28.924744 54:e0:32:ef:1a:82 (oui Unknown) > 54:e0:32:ef:1a:85 (oui Unknown), ethertype 802.1Q (0x8100), length 102: vlan 30, p 0, ethertype IPv4, 192.168.66.102 > 192.168.66.105: ICMP echo request, id 16730, seq 0, length 64
19:40:28.924892 54:e0:32:ef:1a:85 (oui Unknown) > 54:e0:32:ef:1a:82 (oui Unknown), ethertype 802.1Q (0x8100), length 102: vlan 100, p 0, ethertype IPv4, 192.168.66.105 > 192.168.66.102: ICMP echo reply, id 16730, seq 0, length 64

So (in a manner consistent with the behaviour for community vlans) the ICMP request from the Isolated port to the Server is put on the trunk with the tag used as the “isolation ID” – indicating that this traffic is destined for “promiscuous” ports only – and the reply returns with the vlan-id of the PVLAN_MASTER vlan.

A couple of final points to be aware of with PVLANs;

  • PVLANs cannot be used in conjunction with Voice vlans on ports
  • If you do actually want to allow traffic to pass between the different communities and isolation ports, enable proxy-arp on one or more routers connected to promiscuous ports; traffic between them will then transit these ports. Note that firewall filtering can be used on the routers offering the proxy-arp to control traffic between communities / isolation ports. A rough sketch of this is below.
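
As a rough sketch (the interface and addressing here are mine, purely for illustration), proxy-arp on a router hanging off a promiscuous port looks something like this;

set interfaces ge-0/0/0 unit 0 family inet address 192.168.66.1/24
set interfaces ge-0/0/0 unit 0 proxy-arp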

Hope this helps!

JNCIE-ENT beta exam booked!

In the last couple of weeks, I was officially accepted into the JNCIE-ENT beta programme and have been making arrangements to travel over to the US to sit the exam on February 22nd 2014.

What does this mean? Juniper have been working on a new version of the JNCIE-ENT lab exam. While significant amounts of work are put into getting it right first time, with an exam of the scale and complexity of a JNCIE lab there’s always going to be some stuff that you need others to verify and validate. This is where the JNCIE-ENT beta comes in – select candidates are given the opportunity to sit the exam (for no charge) in whatever form the certification team have it in at that point. Generally you can expect that there will be some issues/errors with the exam, and therefore you need a very in-depth understanding of the technologies you are working with to pass it – as you’ll need to be able to (on top of the normal stress of the real exam) spot issues with the actual exam, flag them to the proctor and move on to get all the tasks done. Friends who did this a couple of years ago have indicated that there were far more tasks to do in the allotted time than in the finished version!

This excites me, as I’m always up for a good challenge. One hitch that I’ve run into is that while work was willing to fund my JNCIE-SP, they are not willing to contribute to the JNCIE-ENT (either the costs or the time off), so I’m paying for this one myself and taking some of my paid annual leave to do it :(. However at the end of the day I really want this certification and am excited about the lab, so I’m going to do it anyway.

I’m going to have a lot less time to complete this one than I did for the JNCIE-SP (I spent 5 months on the last one), but having said that, there is a lot of crossover in the routing syllabus, which I understand really well after doing the JNCIE-SP.

So I’ve begun my old study routine of doing at least 40 hours a week of solid study towards this. I’m going to attempt to fit in time to continue at least some blogging in this time (I doubt I’ll be short of ideas for posts, learning about a whole lot of new technologies such as 802.1X, mac auth, and the VoIP specific stuff in the IE), plus I’ll still have my day to day work to complete as well.

The books I am going to be working from in the immediate future are as follows;

  • Proteus JNCIE-ENT workbook
  • InetZero JNCIE-ENT lab book
  • Junos Enterprise Switching vol 2
  • Junos Enterprise Routing
  • AJEX course materials
  • AJER course materials
  • Various bits of Juniper documentation

I’ve also bought 35 lab rack sessions for the InetZero lab racks and plan to use them very extensively in the short amount of time I have. This will be my “full scale” lab for doing full lab type scenarios; however, I’ve also got access to the following devices at work which I intend to use fairly extensively;

  • 1x SRX240
  • 1x SRX110
  • 1x EX2200
  • 3x EX4200
  • 4x MX80
  • 1x MX480
  • 2x EX4500
  • 1x EX4550
  • 2x EX3300

My rough thinking with the lab time I’ve bought from InetZero and the above kit is that during the week I’m going to lab specific concepts in my weeknights in order to get a full understanding of them (I will be aiming for 6 hours a night, 4-5 nights a week). Then in the weekends I will do 2x 8 hour sessions on the InetZero lab per weekend (maybe even up to 3, depending on the week and whether I feel like staying up till 4am on a Friday night to do a third!). This should ensure I am regularly interacting with a full scale network to practice the tasks, as well as spending as much time as possible labbing individual technologies and concepts. I’ll be largely doing the InetZero lab book labs in the weekends (as that’ll be suited to their kit), and using Chris Jones’ Proteus JNCIE-ENT workbook during the week as the basis for my studies.

I’ve been through the syllabus (here: https://www.juniper.net/us/en/training/certification/resources_jncieent.html), and so far I have identified the following items that I could use a significant amount of work on;

  • 802.1X
  • Mac auth
  • Captive Portal
  • RPM
  • Private Vlans
  • IP telephony features
  • DAI/DHCP snooping
  • Layer 2 firewall filters
  • IPv6 Multicast
  • Ethernet OAM
  • GRE tunnels
  • A few of the advanced tricks around EX Virtual Chassis

To be honest I think that for a JNCIE this is a relatively small “unknown” list. I’m fairly happy that I’m in an excellent position with all of the routing aspects of the JNCIE-ENT (having just come out of studying for the JNCIE-SP), and am feeling very positive around the features not mentioned above.

I’d be interested to know if anyone has any thoughts on any of the following for the JNCIE-ENT;

  • Study plans
  • Ideas for lab setups
  • Any gotchas they can think of with any of the technologies covered
  • Any resources they found particularly worthwhile
  • Any suggestions at all for getting through a beta JNCIE lab (having never done one before).
  • Anyone else who is studying towards this who would like to be a “study buddy” and share ideas, lab plans, etc.

Any feedback anyone has would be much appreciated.

I couldn’t leave this though without mentioning my greatest “asset” in tackling “yet another JNCIE” – my wife – who has been incredibly supportive throughout the first JNCIE and continues to be as I embark on this next challenge. Without her support I would be eating takeout every night and living in a dump of a house while I got through this (and not really looking after myself) :).

Also, many thanks to Liz and the Certification team at Juniper for accepting me into this – I’m really looking forward to the experience!

Many thanks :).

LDP over RSVP

One of the topics I had to play with during my JNCIE study was LDP tunnelling over RSVP. This is in many ways a very cool technology for scaling large networks (though it is not without its hitches). Loosely, the principle is that you run an RSVP mesh between all of your “core nodes” and LDP on your edge nodes, then encapsulate the LDP traffic inside an RSVP tunnel. This is really good in principle, however it makes a few assumptions. First, we’d assume in an ideal environment that your “core” or “P” nodes are not doing anything other than switching labels (no connections to large transit providers or anything like that), using RSVP-TE to ensure fast failover & other goodness.

As mentioned above, in a perfect environment for this to work, you’re going to have a bunch of P nodes that just do dumb label forwarding, and a bunch of PE nodes that do all the interaction with other networks, customers, the internet, etc. The idea is that you run LDP on all these “edge” nodes right up to the “core” nodes for this region, then they encapsulate the LDP traffic to other “core” nodes that are tunnelling LDP. At this point you have network wide LDP connectivity to all of your edge nodes, however you have the benefit in your core network (which will often be over long-distance spans of fibre which are more likely to experience fibre cuts etc) of the quick failover and traffic engineering provided by RSVP. The great thing though is that you don’t need to run RSVP everywhere, so you can still benefit from the fact that LDP is a far less resource intensive (in terms of CPU time etc) label protocol.

The model scenario to run this in is something like the following illustration, where you have a core network running RSVP-TE, allowing you to ensure fast-failover and capacity reservations, while wanting to just run a simple (and less resource intensive) LDP mesh on the edge areas of your network;

An ideal deployment of LDPoRSVP in a network with separated P and PE routers, enabling the use of RSVP-TE in the core, while still allowing the resource savings offered by LDP at the edge.

Okay, so if we were to deploy this, what would it actually do? I’ve made an illustration of the data-plane forwarding process below. In this diagram we are assuming that explicit-null is used at all points (i.e. no penultimate-hop popping).

In this scenario, from the perspective of LDP the two Ps that are speaking both LDP and RSVP are direct LDP neighbours (via the RSVP LSP). When a packet destined for the right PE enters the left PE, it first has an LDP-signalled MPLS label pushed onto it and is then forwarded to the left P. Once it hits the left P, it has an additional RSVP-signalled MPLS label pushed onto it and is forwarded through the RSVP LSP path to the right P. At this point the right P pops the RSVP-signalled label, and the packet is forwarded with only an LDP-signalled label to the right PE, where this label is popped and the packet is forwarded to the destination customer.
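
Put another way, the label stack at each hop looks roughly like this (my own summary of the forwarding description above, ignoring the actual label values);

PE-left  -> P-left   : [ LDP label ][ IP packet ]
P-left   -> P-middle : [ RSVP label ][ LDP label ][ IP packet ]
P-middle -> P-right  : [ RSVP label (swapped) ][ LDP label ][ IP packet ]
P-right  -> PE-right : [ LDP label ][ IP packet ]
PE-right -> customer : [ IP packet ]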

If we had not assumed explicit-null in this scenario (i.e. with penultimate-hop popping behaviour turned on), the RSVP-signalled label would have been popped at the middle P and the LDP-signalled label would have been popped at the right P – i.e. traffic would have been forwarded along the final hop of each segment with that label already removed – however it is easier to illustrate without penultimate-hop popping.

An illustration of the MPLS forwarding process for LDPoRSVP. In this scenario we assume explicit-null is set on all LSRs.
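
If you wanted to reproduce the explicit-null behaviour assumed in the illustration (rather than the default penultimate-hop popping), the relevant knobs are – as far as I’m aware – the following, configured on the egress LSRs;

set protocols mpls explicit-null
set protocols ldp explicit-null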

This seems all pretty simple right? Okay – let’s have a go at configuring it!

In order to configure this on a Juniper, you just configure LDP as normal on all the MPLS interfaces where you want to use LDP as the label protocol;

set protocols ldp interface ge-x/x/x.yyy

For the nodes performing the actual LDP tunnelling, we also need to enable LDP on the loopback interface in order for them to perform this function (the loopback is nominally the interface that forms the adjacency with the other LSRs you are LDP tunnelling to);

set protocols ldp interface lo0.0

Now what you want to do is enable LDP tunnelling on your RSVP LSPs between the “core” nodes that are doing the tunnelling (note that we assume the LSPs are already established here);

set protocols mpls label-switched-path A-to-B ldp-tunneling

At this point everything should start working, however there are a couple of things to watch for. First, make sure that you enable LSPs in both directions between the LSRs doing the tunnelling – otherwise you could see some odd brokenness where LDP traffic only works in one direction between two nodes.
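
In other words, for the A-to-B LSP above you also want something along these lines on the far-end LSR (the LSP name and loopback address here are placeholders);

set protocols mpls label-switched-path B-to-A to x.x.x.x
set protocols mpls label-switched-path B-to-A ldp-tunneling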

Additionally, it’s worth noting that one of the checks all LDP routes must pass in order to be installed in inet.3 (the Juniper route table for MPLS next-hops) is that the LDP next-hop for each specific prefix (and it must be an exact match for the prefix) must match the next hop for the same prefix in inet.0. Section 3.5.7.1 of RFC 5036 (LDP Specification) describes the requirement for this behaviour;

An LSR receiving a Label Mapping message from a downstream LSR for a Prefix SHOULD NOT use the label for forwarding unless its routing table contains an entry that exactly matches the FEC Element.
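
A quick way to check whether a given prefix is falling foul of this is to compare what LDP thinks the next-hop is with what is actually in the routing table – for example (substitute the prefix you care about);

show ldp route x.x.x.x/32
show route x.x.x.x/32 exact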

If you have an additional core link which doesn’t have LDP enabled (or a far more complex topology than the one I have attempted to describe here), you can run into some very serious problems. The reason for this is that LDP works by building a “reverse tree” which is rooted at the destination PE and has branches to every other PE, for every single LDP route (which by default is all loopback addresses on LDP-enabled devices). This is part of what makes LDP a really efficient label protocol – instead of building an LSP in each direction per combination of PEs like RSVP does, it builds a single tree-based LSP from all sources to each destination – so there’s a lot less CPU time involved in scaling up. From a 10,000 foot view this is done in a manner similar to how you would build a multicast source tree, and therefore has similar checks. It’s worth remembering that LDP will always be forced to follow the IGP best path, even if LDP is not enabled on that path. The good news is that if you do have a backup link that doesn’t run LDP, you can just cost this link up to ensure that you are not going to run into these issues.

I found some quite complex scenarios along these lines in some of the practice exams I did when I was preparing for my JNCIE-SP – requiring some very specific changes to be made to the IGP (or the addition of extra LSPs) in order to get things working. JUNOS is nice in that there are some easy commands to check what’s happening – show ldp route, show route table inet.3 and show route table inet.0 are all useful for this.

I’m going to spend some time illustrating this point, as it’s an important one to understand if you are planning to attempt to use this technology in any network beyond the most basic example shown in the “ideal deployment” illustration above. We’ll work from a network that looks like this;

The base network we will work from to explore some of the issues with inet.0 routes vs the LDP routes.

In the above scenario, we have two areas of the network that talk LDP separated by R3 which is “RSVP only”. We also note that R1 does not participate in RSVP. This network requires MPLS reachability from R4, R5 & R6 to R1, therefore LDPoRSVP tunnelling has been implemented on RSVP LSPs between R2<—->R4 & R2<—->R5 in an attempt to provide this.

The configuration of this baseline scenario can be found here (I have labbed this on a single MX10 so that anyone who wants to have a play with this can do so with minimal hardware).

Right, so first things first, let’s have a look at the RSVP LSPs and confirm they are up. The easiest router to do this on based on the diagram above is R2. The quickest way to do this is look at all the ingress and egress LSPs on R2;

[email protected]:R2> show mpls lsp 
Ingress LSP: 2 sessions
To              From            State Rt P     ActivePath       LSPname
192.168.0.4     192.168.0.2     Up     0 *                      R2-to-R4
192.168.0.5     192.168.0.2     Up     0 *                      R2-to-R5
Total 2 displayed, Up 2, Down 0

Egress LSP: 2 sessions
To              From            State   Rt Style Labelin Labelout LSPname 
192.168.0.2     192.168.0.5     Up       0  1 FF       3        - R5-to-R2
192.168.0.2     192.168.0.4     Up       0  1 FF       3        - R4-to-R2
Total 2 displayed, Up 2, Down 0

Okay, let’s now verify that the LDP neighbours are up over these LSPs;

[email protected]:R2> show ldp neighbor 
Address            Interface          Label space ID         Hold time
192.168.12.1       lt-1/0/10.21       192.168.0.1:0            12
192.168.0.4        lo0.2              192.168.0.4:0            32
192.168.0.5        lo0.2              192.168.0.5:0            35

All looking very promising. We then have a poke at R1 and confirm that it has a route in inet.3 (where LSP next-hops are kept) for R4, R5 & R6;

[email protected]:R1> show route table inet.3 

inet.3: 3 destinations, 3 routes (3 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

192.168.0.2/32     *[LDP/9] 00:58:05, metric 1
                    > to 192.168.12.2 via lt-1/0/10.12
192.168.0.4/32     *[LDP/9] 00:49:16, metric 1
                    > to 192.168.12.2 via lt-1/0/10.12, Push 299792
192.168.0.5/32     *[LDP/9] 00:49:16, metric 1
                    > to 192.168.12.2 via lt-1/0/10.12, Push 299808

This isn’t looking so great – while we have reachability to R4 & R5, we don’t have reachability to R6. Let’s inspect inet.3 on R6;

[email protected]:R6> show route table inet.3    

inet.3: 2 destinations, 2 routes (2 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

192.168.0.4/32     *[LDP/9] 00:54:54, metric 1
                    > to 192.168.46.4 via lt-1/0/10.64
192.168.0.5/32     *[LDP/9] 00:54:54, metric 1
                    > to 192.168.56.5 via lt-1/0/10.65

This also doesn’t look so good for MPLS forwarding between R1 and R6. We have a look at R6 and verify that LDP does in fact have a route for an LSP to R1;

[email protected]:R6> show ldp route 192.168.0.1/32    
Destination         Next-hop intf/lsp/table           Next-hop address
 192.168.0.1/32     lt-1/0/10.63                      192.168.36.3

For a moment we scratch our heads, until we remember the note mentioned earlier in this post about the requirement for an exact match with inet.0. We also notice at this point that LDP isn’t actually enabled on lt-1/0/10.63 – this is the link to R3, which is RSVP-only and doesn’t run LDP. So it makes sense that this is not working.

We now have two options. The first (and simplest) is to increase the metric on the R3-R6 link.

[email protected]:R6# show | compare 
  [edit logical-systems R6 protocols isis interface lt-1/0/10.63]
-     level 2 metric 10;
+     level 2 metric 100;

[email protected]:R3# show | compare 
  [edit logical-systems R3 protocols isis interface lt-1/0/10.36]
-     level 2 metric 10;
+     level 2 metric 100;

At this point, our network is represented by the following diagram (with the IGP cost adjusted on the R3-R6 link);

Here, we resolve the issue by increasing the IGP cost of the R3-R6 link to force LDP to forward through a path that has LDP signalling enabled.

We now have a look at the LDP route for R1 on R6;

[email protected]:R6> show ldp route 192.168.0.1/32                   
Destination         Next-hop intf/lsp/table           Next-hop address
 192.168.0.1/32     lt-1/0/10.64                      192.168.46.4
                    lt-1/0/10.65                      192.168.56.5

So far so good! Let’s now have a look at inet.3 to see if this has worked;

[email protected]:R6> show route 192.168.0.1/32 exact table inet.3    

inet.3: 4 destinations, 4 routes (4 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

192.168.0.1/32     *[LDP/9] 00:01:33, metric 1
                    > to 192.168.46.4 via lt-1/0/10.64, Push 299824
                      to 192.168.56.5 via lt-1/0/10.65, Push 299824

And now to check that it’s working in the other direction also;

[email protected]:R1> show route 192.168.0.6/32 exact table inet.3  

inet.3: 4 destinations, 4 routes (4 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

192.168.0.6/32     *[LDP/9] 00:04:05, metric 1
                    > to 192.168.12.2 via lt-1/0/10.12, Push 299824

All looking really good! So this is definitely a valid way to fix the problem. However, the downside of fixing it this way is that we can now no longer use the R3-R6 link for normal traffic, which (depending on the network) may not be acceptable.

In this situation, an alternative way to resolve this would be to configure another pair of RSVP LSPs to tunnel over, directly between R6 and R2 (transiting R3). First, we’ll remove the fix we just put in place;

[email protected]:R3# show | compare 
 [edit logical-systems R3 protocols isis interface lt-1/0/10.36]
-     level 2 metric 100;
+     level 2 metric 10;

[email protected]:R6# show | compare 
  [edit logical-systems R6 protocols isis interface lt-1/0/10.63]
-     level 2 metric 100;
+     level 2 metric 10;

Now we’ll configure up the new LSPs;

[email protected]:R2# show | compare 
 [edit logical-systems R2 protocols mpls]
     label-switched-path R2-to-R5 { ... }
+    label-switched-path R2-to-R6 {
+        to 192.168.0.6;
+        ldp-tunneling;
+    }
[email protected]:R6# show | compare 
  [edit logical-systems R6 protocols mpls]
+    label-switched-path R6-to-R2 {
+        to 192.168.0.2;
+        ldp-tunneling;
+    }
[edit logical-systems R6 protocols ldp]
+    interface lo0.6;

Okay, let’s verify that they’re up;

[email protected]:R6> show mpls lsp 
Ingress LSP: 1 sessions
To              From            State Rt P     ActivePath       LSPname
192.168.0.2     192.168.0.6     Up     0 *                      R6-to-R2
Total 1 displayed, Up 1, Down 0

Egress LSP: 1 sessions
To              From            State   Rt Style Labelin Labelout LSPname 
192.168.0.6     192.168.0.2     Up       0  1 FF       3        - R2-to-R6
Total 1 displayed, Up 1, Down 0

And we should confirm that we have an LDP relationship over these new tunnelling LSPs;

[email protected]:R6> show ldp neighbor 
Address            Interface          Label space ID         Hold time
192.168.46.4       lt-1/0/10.64       192.168.0.4:0            12
192.168.56.5       lt-1/0/10.65       192.168.0.5:0            11
192.168.0.2        lo0.6              192.168.0.2:0            35

At this point, our network is best represented by the following diagram (with the additional LSPs);

Here, we resolve the issue by adding LDPoRSVP tunnelling LSPs between R2 and R6 (transiting R3).

We have a poke at the LDP routes on R6 and confirm that LDP now sees the R6-to-R2 LSP (via R3) as the best path to R1;

[email protected]:R6> show ldp route 192.168.0.1/32                   
Destination         Next-hop intf/lsp/table           Next-hop address
 192.168.0.1/32     R6-to-R2
                    lt-1/0/10.63                      192.168.36.3

Finally – the moment of truth – we confirm that we have inet.3 entries for R1 on R6 (and vice versa) and relax in the knowledge that MPLS forwarding between R1 and R6 is now working;

[email protected]:R6> show route 192.168.0.1/32 exact table inet.3    

inet.3: 5 destinations, 8 routes (4 active, 0 holddown, 3 hidden)
+ = Active Route, - = Last Active, * = Both

192.168.0.1/32     *[LDP/9] 00:02:41, metric 1
                    > to 192.168.36.3 via lt-1/0/10.63, label-switched-path R6-to-R2

[email protected]:R1> show route 192.168.0.6/32 exact table inet.3   

inet.3: 4 destinations, 4 routes (4 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

192.168.0.6/32     *[LDP/9] 00:03:52, metric 1
                    > to 192.168.12.2 via lt-1/0/10.12, Push 299840

So we’ve confirmed that this is another feasible way of solving our problem (and in the circumstances probably a better solution, as it doesn’t involve taking a link out of service). Hopefully the above two scenarios help illustrate the limitation of LDP that I highlighted above (and some possible ways to address it).

One final issue that’s worth covering is that on the MX platform you can by default only stack 3 labels deep, and doing this adds a label to the stack. If you were running MPLS VPNs, or labelled routes of any kind over MPLS, you’re already using 2 MPLS labels on traffic, so this would add a third. The gotcha here is that if you enable fast-reroute or use an inter-provider VPN (option C) connection to another VPN provider, you’re going to be adding yet more labels – which the MX won’t let you do by default. Fortunately, we can support up to 5 labels on the MX – which accounts for running all of these together – with this command;

set interfaces ge-1/0/0 unit 100 family mpls maximum-labels 5

Note that the range for this command is 3-5, with 3 being the default.

When I did my study for my JNCIE-SP I could not find any good articles on this topic, so I hope this helps others either deploying it or studying towards certifications. LDP over RSVP tunnelling is a really useful feature for larger networks to help them scale, but it would be worth carefully considering if it’s worth the hassle and complexity before deploying it in most production environments.

JUNOS firewall filters – IPv6 next-header vs payload-protocol

A while ago, while doing some labbing for a CoS deployment I was working on, I struck an interesting issue with JUNOS firewall filters. The issue was that in IPv6 the protocol matching was not fully implemented!

A bit of background: in IPv4, flags to indicate things like whether the packet is fragmented are built into the IPv4 header, whereas in IPv6 this information is carried in a separate extension header (like the TCP or UDP headers) which sits between the IPv6 header and the TCP/UDP header.

In JUNOS firewall filters, you match IPv4 protocols with from protocol X (where X is TCP, UDP, etc), which is pretty straightforward. However for matching IPv6 protocols, it’s from next-header X (TCP, UDP, fragment, etc). The name next-header reflects what it actually does – it looks at the first header after the IPv6 header – which is the key here: if the packet is a fragmented TCP packet, the next-header will be fragment, while if it’s not fragmented the next-header will be TCP.

This presents a problem if you want to specifically match on TCP and not UDP: once a packet is fragmented you have to decide either to allow all fragments or to not allow them at all, which is a bit irritating.
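
To make this concrete, this is roughly the corner that next-header alone backs you into (filter and term names are my own, and this is a sketch rather than a recommended filter) – the FRAGS term ends up accepting fragments of any protocol, UDP included;

set firewall family inet6 filter V6-TCP-ONLY term TCP from next-header tcp
set firewall family inet6 filter V6-TCP-ONLY term TCP then accept
set firewall family inet6 filter V6-TCP-ONLY term FRAGS from next-header fragment
set firewall family inet6 filter V6-TCP-ONLY term FRAGS then accept
set firewall family inet6 filter V6-TCP-ONLY term OTHER then discard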

However the good news is that Juniper introduced a new IPv6 match condition in 11.4R6.6 – from payload-protocol X – which addresses this issue. Funnily enough this matches the payload protocol (as you would expect), skipping over headers like the fragment header and doing what you actually want it to do.
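
With the new match the same intent becomes a lot cleaner – again, the names here are mine and this is just a sketch;

set firewall family inet6 filter V6-TCP-ONLY term TCP from payload-protocol tcp
set firewall family inet6 filter V6-TCP-ONLY term TCP then accept
set firewall family inet6 filter V6-TCP-ONLY term OTHER then discard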

Hope this helps!

RSVP Optimization on JUNOS

Recently, I did a blog post (http://wp.me/p40PSL-k) on some of the things I’m playing with in my lab (and now deploying quite extensively) with RSVP – specifically around using this to ensure that traffic is shifted as links fill and ensuring that LSPs do not put protection paths on links that share a fibre duct with the primary path. A few people have asked me directly about how we do the optimisation of LSPs to ensure this works, so here goes!

By default in JUNOS, when an RSVP LSP is created (assuming CSPF is not disabled) it will follow the best IGP path, subject to any CSPF constraints that have been put on it (link colouring, least/most fill, strict/loose hops, bandwidth reservations, etc). However, as you add (potentially) better links to your network and paths change, the LSP will not move onto those better paths unless a higher-priority LSP being stood up or moved forces it off the path it is on, or there is a link/device failure along the path.

It’s worth making sure before you read further that you understand the CSPF path selection process. This is documented here; http://www.juniper.net/techpubs/en_US/junos10.4/topics/concept/mpls-cspf-path-selection-method.html

First, the simplest thing we can do is tell the LSR (Label Switch Router) to “optimize” the path the LSP takes every X seconds. This is done pretty simply with the following command (X is in seconds, and can be a value of up to 65535);

set protocols mpls optimize-timer X

Be careful with this – remember that the number of LSPs in your network grows quadratically as you add more LSRs (a full mesh needs an LSP between every pair of LSRs, in each direction) – setting this too aggressively will take up a fair bit of CPU time.

It’s also important to remember that if we want to prevent the LSP double-counting bandwidth on any links shared by both the path it is trying to move onto and the old path (by using a Shared Explicit rather than a Fixed Filter reservation style), we need to turn on the “adaptive” knob in the LSP configuration. This also ensures that the cutover is done in a make-before-break manner;

set protocols mpls label-switched-path A-to-B adaptive

During switchover, the other knobs you can tune tell it how long to wait (once the new path is ready to go) before switching the LSP over (the “optimize-switchover-delay” option), and how long to leave the old path in place before tearing it down (the “optimize-hold-dead-delay” option). Both are in seconds; optimize-hold-dead-delay can go up to 65535 seconds (default is 60), whereas optimize-switchover-delay is 1-900 seconds (default 1);

set protocols mpls optimize-switchover-delay X

set protocols mpls optimize-hold-dead-delay X

Finally, it’s worth noting that by default CSPF must see certain benefits (a combination of IGP metric, available bandwidth, hop count, etc) before it will accept a better path when reoptimizing. If it does not see these benefits it will ignore the new path and not re-route the LSP. To change this so that the reoptimization decision is based solely on the IGP metric, turn on the following configuration option;

set protocols mpls optimize-aggressive

While at first glance the use of the “aggressive” feature may seem like a good thing (you would think you’d always want to be taking the best path, yeah?), this may not always be true. If you are also using RSVP auto-bandwidth to shuffle traffic around as links fill, you probably don’t want to use this as well. Consider the scenario where you have a link doing 5 gig at the trough of utilisation and 10 gig at peak – if you’re using RSVP to shuffle traffic off at the 80% full mark and have another perfectly good link to leave it on, aggressive optimization would cause the traffic to move back as soon as the utilisation of the most preferred link returns to a more acceptable level. This would result in a potentially quite significant change in latency twice every 24 hours (which is probably not very desirable, depending on what the difference between the links is). You’d be better in this scenario to use non-aggressive optimization.
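
For context, the sort of auto-bandwidth configuration I’m referring to looks something like the sketch below (the LSP name and values are purely illustrative – tune the intervals and bandwidth bounds to your own network);

set protocols mpls statistics file mpls-statistics
set protocols mpls statistics interval 300
set protocols mpls statistics auto-bandwidth
set protocols mpls label-switched-path A-to-B auto-bandwidth adjust-interval 900
set protocols mpls label-switched-path A-to-B auto-bandwidth minimum-bandwidth 10m
set protocols mpls label-switched-path A-to-B auto-bandwidth maximum-bandwidth 10g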

Finally, you can manually trigger either a standard or an aggressive optimization with the following commands;

clear mpls lsp optimize name A-to-B

clear mpls lsp optimize-aggressive name A-to-B

While I assume that most of you out there running MPLS networks using RSVP-TE will have been aware of this for a long time, I hope this is of help to those of you either studying for your certifications or deploying/using this for the first time – and it should also make a few things a bit clearer regarding my previous posts on some of the more advanced functionality of RSVP-TE :).

Three new certifications – JNCIP-ENT, JNCSP-ENT & JNCSP-SP…

Over the last few weeks I’ve attained three new certifications. What’s odd is that this time I’m fairly certain my primary motivator was boredom rather than anything else. Since doing my JNCIE-SP lab on 26 September I’ve been at a bit of a loose end, so I decided to set myself the challenge of getting my JNCSP-ENT and JNCSP-SP within 2 weeks of completing the JNCIE-SP exam. Following this I needed to work towards the JNCIP-ENT, as I want to do the JNCIE-ENT early next year and this is a necessary step along the way.

If you’re not already familiar with them – all of the JNCIP-level exams are single written exams that are roughly equivalent to a CCIE written in terms of difficulty. The JNCIP used to be a lab exam, however Juniper have recently changed it to a written format. While some may think this makes the exam easier, from everyone I have talked to (and having now done the JNCIP-SP and JNCIP-ENT) you do have to know the content well to pass – the certification team have done a good job of keeping it at an appropriate level of difficulty, so that those passing it do know their stuff.

The JNCSP exams are aimed at testing the skills required to effectively troubleshoot a Juniper network. The sorts of things covered are the ability to read and interpret show command output, traceoptions and log files, and to effectively deduce what is required from incomplete pieces of information. Juniper have 3 tracks for both the JNCIP and JNCSP: ENT (Enterprise Routing & Switching), SP (Service Provider), and SEC (Security). To get each of the JNCIP tracks you do just the JNCIP-xxx exam; however, to get a JNCSP you must do the JNCSP-xxx exam plus the JTNOC exam – which tests general Juniper hardware & software troubleshooting skills.

I started by doing the JNCSP-SP exam while on holiday (whoops!) a few days after my JNCIE. I read the content once, which took me about 2 hours, went to the exam centre, then had no trouble passing with a reasonable mark. I then booked my JNCSP-ENT for the Tuesday after that (having done the JNCSP-SP on a Thursday), and the JTNOC for the Friday of the same week. I spent 3 hours studying for the JNCSP-ENT (mostly doing a bit of reading on 802.1X, which I had not yet had to work with or study) on a flight that weekend from America to New Zealand, then a small bit just before the exam. Again, I passed, but it showed that I was weak in layer 2 features and security (mainly as I had not spent enough time to fully understand all the features of 802.1X, mac auth, captive portal & VC).

Finally was the JTNOC. I did only 2 hours study – this one I flew through. Truly for this you just need to be reasonably experienced at troubleshooting on Juniper devices.

Please note that while I got away with a small amount of study for the above exams, I had just come off the back of 900 hours of study for my JNCIE-SP, so had covered all of the SP material as thoroughly as I felt was humanly possible. Additionally, in my job I spend significant amounts of time troubleshooting the most complex of issues (as the final technical escalation point within the company). So as much as I had an easy time, remember that there is actually a large amount of complex technology covered in these exams; it’s important to understand it well to pass!

For the JNCIP-ENT, I originally booked this for a couple of months after I sat my JNCIE-SP; however, I then found out about the JNCIE-ENT beta exam which is coming out soon, and figured that I would be in a better position to apply for it having done the JNCIP-ENT. So I rebooked it for much sooner (3 days away at the time), and spent that weekend studying. Again, coming off a significant amount of study for my JNCIE-SP I didn’t bother doing any study for the routing portions of this exam, but spent about 15 hours over those 3 days reading up on some of the switching features I was less familiar with, such as 802.1X, mac auth, captive portal, EX VC, DAI, DHCP snooping etc. I actually found this far more interesting than I thought I would.

Finally I sat the exam and came back with a positive result. I’m now looking forward to doing the JNCIE-ENT sometime early next year (hopefully through the beta exam programme), and have already begun playing around in the lab with some of the technologies covered in the syllabus! If anyone has any thoughts about things to focus on or good resources for studying towards the JNCIE-ENT, please do drop me a message :).