Friday, November 1, 2013

Nexus 5500 routing anomaly (OSPF stuck in EXSTART)

If you are using the Nexus 5500 as a router, watch out for a routing anomaly. Otherwise, you may spend a lot of time debugging; troubleshooting becomes even more challenging when the other router is another vendor’s product.

In a typical scenario, a router is attached to a pair of Nexus 5500s and OSPF adjacency is to be established. The best practice is to run the layer 3 non-vPC VLANs on a separate link from the vPC trunk, and to enable peer-gateway. However, this standard practice results in OSPF adjacency being established with the directly connected Nexus 5k only, while the remote Nexus 5k remains stuck in EXSTART.
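For reference, a minimal sketch of that standard design on each Nexus 5500 might look like the following (the domain ID, interface numbers, and vPC VLAN range are assumptions for illustration, not taken from the actual setup):

vpc domain 100
  peer-gateway
!
interface port-channel10
  description vPC peer link (vPC VLANs only)
  switchport mode trunk
  switchport trunk allowed vlan 10-99
  vpc peer-link
!
interface port-channel20
  description separate non-vPC trunk carrying layer 3 VLAN 110 for OSPF peering
  switchport mode trunk
  switchport trunk allowed vlan 110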

It turns out this is due to a little-known Cisco bug. As a result, Cisco clearly states that a “separate link for nonvpc vlans on n5k is not supported”.

The fix release is currently still pending, so a workaround is required: run both vPC and non-vPC VLANs on the vPC peer link.

Before change:
5548-sw1# sh ip ospf nei
OSPF Process ID 1 VRF default
Total number of neighbors: 2
Neighbor ID     Pri State            Up Time  Address         Interface
10.147.187.20  128 EXSTART/DR       00:10:14 10.147.187.20  Vlan110
10.147.254.161 129 FULL/BDR         00:10:12 10.147.187.19  Vlan110

Making the change (on both Nexus 5ks):
5548-sw1# conf t
Enter configuration commands, one per line.  End with CNTL/Z.
5548-sw1(config)# int po10
5548-sw1(config-if)# switchport trunk allowed vlan add 110

OSPF must be reset on the interface for the change to take effect:
5548-sw1(config)# int vlan 110
5548-sw1(config-if)# ip ospf shut
5548-sw1(config-if)# no ip ospf shut
5548-sw1# sh ip ospf nei
OSPF Process ID 1 VRF default
Total number of neighbors: 2
Neighbor ID     Pri State            Up Time  Address         Interface
10.147.187.20  128 FULL/DR          00:00:02 10.147.187.20  Vlan110
10.147.254.161 129 FULL/BDR         00:00:02 10.147.187.19  Vlan110

Sunday, March 31, 2013

Redefining Networking

The networking industry as we know and love it is being shaken. New concepts emerge almost daily, buzzwords abound, and various “visions” in their nascent forms mix reality with fantasy. Beneath the surface of chaos, fundamental changes are taking shape.

Like many network professionals, I feel the need to navigate through the front tide of confusion and grasp the essence of the change. Initially, as I took in a lot of information, I was easily confused and swayed one way or another. Let’s face it: most material out there is vendor affiliated, and therefore inherently partial and biased. Over time, though, a clearer picture has emerged. However plain and simple, it has given me consistency and continuity in my thought process. I hope it will help you establish your own framework as well, and chart your own course forward.

Virtual Networking – the beginning of change
Let’s start with why: why the change, and why now. To me, the change is not about doing what networking already does in a different way. Fundamentally, networking enables communication and supports compute, which in turn enables applications. Compute has gone through its own revolution: virtualization. Compute virtualization brought networking into the hypervisor environment, creating an overlap between two previously separate domains. This rudimentary form of virtual networking can be seen in current-generation virtual switches.


Sunday, November 25, 2012

Another look at Flow Control in the cloud

I posted about Cisco flow control with NetApp NAS during the first round of cloud implementation almost two years ago. Paul commented about updates in the NetApp documentation, so now is a good time for an update, with a fresh look at the general use of flow control.

Let’s start with NetApp’s “Ethernet Storage Best Practices”, which recommends:
  1. Do not enable flow control throughout the network
  2. Set storage to “send” flow control, and set the switch to “receive”
We can all agree on the first point. 802.3x Ethernet flow control has not been widely adopted in practice, due to implementation complexity and hardware dependency. Higher-layer mechanisms such as TCP windowing are more predictable and effective for end-to-end flow control.
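For reference, the switch side of the second recommendation maps to per-interface flow control settings along these lines (a Nexus 5500 style sketch; the interface number and description are assumptions, not from an actual deployment):

interface Ethernet1/10
  description storage-facing port (hypothetical)
  flowcontrol receive on
  flowcontrol send off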

Friday, October 19, 2012

Network-based TCP MSS adjustment


Maximum Segment Size (MSS) is set by the end points during the initial TCP handshake. In special circumstances, a router can step in to alter the MSS.

Let’s look at such a scenario, where two hosts communicate through an SSL tunnel. The end points see a path MTU of 1500 bytes and set the MSS accordingly (1460 bytes, leaving room for the IP and TCP headers). However, SSL adds extra overhead, so when a full-size 1500-byte packet arrives at the tunnel end point, it becomes a little larger. Furthermore, SSL often sets the DF (Don’t Fragment) bit. Since the packet is now larger than 1500 bytes, with DF set, the router drops it. This results in communication failure between the hosts (while ping and traceroute appear to be working). An extended ping with varying packet sizes will verify this exact behavior.
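As a sketch of the network-based fix (the interface name, MSS value, and destination address below are placeholders chosen to leave room for tunnel overhead, not values from an actual deployment), the router rewrites the MSS advertised in transiting TCP SYNs:

interface Tunnel0
 ip tcp adjust-mss 1360

To confirm the underlying MTU behavior, an extended ping with the DF bit set at full size should fail, while smaller sizes succeed:

router# ping 172.18.1.1 size 1500 df-bit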

Friday, September 14, 2012

traceroute through MPLS


traceroute is often used as an effective analysis and troubleshooting tool. It is easily interpreted in a hop-by-hop routed network. Tracing packets through an MPLS network, however, requires a more in-depth understanding of the interworking between routing and label switching.

The best place to start is the MPLS PE router. On the PE router, each customer’s VPN is represented by a vrf, in this case vrf “bigco”. Examining the routing table for the customer’s remote destination network (172.18.0.0), notice that its next hop (10.8.0.1, the remote PE) lives in the global routing table and was learned from the BGP route reflector (172.18.127.141). It may seem counter-intuitive that a customer VPN route has a next hop in the global routing table (effectively leaping from one vrf to another), but this is precisely where MPLS does its magic.

A_PE1#sho ip route vrf bigco 172.18.0.0
Routing entry for 172.18.0.0/16
  Last update from 10.8.0.1 5d18h ago
  Routing Descriptor Blocks:
  * 10.8.0.1 (Default-IP-Routing-Table), from 172.18.127.141, 5d18h ago
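To follow the recursion from the VPN route to the label switched path, a couple of additional lookups on the same PE are useful (a sketch only; no output is reproduced here): the CEF entry in the vrf shows the VPN label imposed for 172.18.0.0, and the MPLS forwarding table shows the transport label used to reach the BGP next hop 10.8.0.1.

A_PE1#sho ip cef vrf bigco 172.18.0.0 detail
A_PE1#sho mpls forwarding-table 10.8.0.1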

Monday, September 3, 2012

Sorting out System MAC addresses with VPC and VSS – Part 2


Following Part 1, which started with vPC on the Nexus platform, here VSS on the Catalyst is compared side by side.

A simple and interesting topology can be used to illustrate. In this case, the Nexus and Catalyst pairs use different multichassis technologies (vPC and VSS respectively), forming a back-to-back virtual port channel. The effective logical topology becomes greatly simplified (shown on the right side), with benefits including full bisectional bandwidth utilization, a stable all-forwarding STP topology, high resiliency, and ease of adding or removing physical members.


The VSS domain ID is very similar to the vPC domain ID. It is a unique identifier in the topology, representing the logical virtual switch formed by two physical chassis. Only one VSS pair is associated with a particular domain.
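As a rough configuration sketch (the domain number and keepalive address are illustrative, not from the lab in this post), the two technologies declare their domains in a very similar way:

! Catalyst 6500 VSS
switch virtual domain 100
!
! Nexus vPC
vpc domain 100
  peer-keepalive destination 10.0.0.2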

Friday, August 31, 2012

Sorting out System MAC addresses with VPC and VSS – Part 1


A number of multichassis aggregation technologies are deployed in the data center today, for example, Cisco’s Multichassis EtherChannel (MEC) on the Catalyst 6500 VSS, and Virtual Port Channel (vPC) on the Nexus platforms. Inter-chassis aggregation greatly increases link utilization, while simplifying design by eliminating the topology’s dependence on spanning tree protocol. STP becomes passive as most links are forwarding, and most failure scenarios no longer require STP re-convergence, thus minimizing disruptions. Furthermore, a more elegant data center design can be achieved, with lower operational complexity and higher return on investment.

A system MAC address exists on each individual device and is often used for device-level negotiation, for example, in the bridge ID field of STP BPDUs, or as part of the LACP LAG ID.

When multiple chassis operate in unison, software simulates the behavior of a common logical system with the use of common virtual identifiers. Differentiating and sorting out the use of the virtual system identifier and the various MAC addresses is helpful for understanding, designing, and deploying such systems.

It can be illustrated with a simple topology such as the one shown in the diagram, in which a pair of Nexus switches (in vPC domain 100) is connected to another pair (in vPC domain 101) over back-to-back vPCs.
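On the Nexus side, both the shared and the chassis-specific identifiers can be read from the vPC role output. The sketch below is illustrative (the hostname and MAC values are placeholders): the vPC system MAC is derived from the reserved prefix 00:23:04:ee:be plus the domain ID in hex (0x64 for domain 100), while the local system MAC is the chassis’ own burned-in address.

N5K-1# sh vpc role
...
vPC system-mac                  : 00:23:04:ee:be:64
vPC local system-mac            : 00:0d:ec:aa:bb:01
...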

Friday, May 18, 2012

BGP RIB-failure and effect on route advertisement


When examining the routes advertised to a BGP neighbor, notice that some routes are tagged with “r”:

rtr1#sh ip bgp neighbor 10.11.19.21 advertised
BGP table version is 1735468, local router ID is 10.115.254.254
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, x best-external, f RT-Filter, a additional-path
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
r>i10.115.254.0/30 10.115.254.9            0    100      0 i
*> 10.115.254.0/23 10.115.254.9           21         32768 i
r>i10.115.254.4/30 10.115.254.9            0    100      0 i
r>i10.115.254.8/30 10.115.254.9            0    100      0 i

Note the “r” stands for BGP “RIB-failure”, which indicates that BGP failed to install the route in the routing table. According to this link, the likely cause is that the route is already installed by an IGP, which has a lower administrative distance (AD).
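A quick way to confirm this is the dedicated rib-failure view, which lists each affected prefix along with the reason, typically “Higher admin distance” when an IGP already owns the route. The output below is an illustrative sketch (column layout varies by IOS release), reusing a prefix from the table above:

rtr1#sh ip bgp rib-failure
Network            Next Hop                 RIB-failure              RIB-NH Matches
10.115.254.0/30    10.115.254.9             Higher admin distance    n/a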