Sunday, November 25, 2012

Another look at Flow Control in the cloud

I posted about Cisco Flow Control with NetApp NAS during the first round of cloud implementation almost two years ago. Paul commented about updates in the NetApp documentation, so now is a good time for an update, with a fresh look at the general use of flow control.

Let’s start with NetApp’s “Ethernet Storage Best Practices”, which recommends:
  1. Do not enable flow control throughout the network
  2. Set storage to “send” flow control, and set the switch to “receive”
We can all agree on the first point. 802.3x Ethernet Flow Control has not been widely adopted in practice, due to implementation complexity and hardware dependency. Higher-layer mechanisms such as TCP windowing are more predictable and effective for end-to-end flow control.

Friday, October 19, 2012

Network based TCP MSS adjustment

Maximum Segment Size (MSS) is set by the end points during the initial TCP handshake. In special circumstances, a router can step in to alter the MSS.

Let’s look at such a scenario, in which two hosts communicate through an SSL tunnel. The end points see a path MTU of 1500 bytes and set the MSS accordingly (1460 bytes of payload once the 20-byte IP and 20-byte TCP headers are subtracted), so they send full 1500-byte packets. However, SSL adds extra overhead, so a 1500-byte packet becomes a little larger when it reaches the tunnel end point. Furthermore, SSL often sets the DF (Don’t Fragment) bit. Since the packet is now larger than 1500 bytes with DF set, the router drops it. This results in communication failure between the hosts, even though ping and traceroute appear to work because their packets are small. An extended ping with varying packet sizes will verify this exact behavior.
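The arithmetic behind this failure can be sketched in a few lines of Python. This is only an illustration: it assumes IPv4 and TCP headers of 20 bytes each with no options, and the 60-byte tunnel overhead is a made-up figure, since the real overhead depends on the tunnel type and cipher.

```python
IP_HEADER = 20   # IPv4 header without options, in bytes
TCP_HEADER = 20  # TCP header without options, in bytes


def mss_for_path(mtu: int, tunnel_overhead: int = 0) -> int:
    """Largest TCP payload that fits in one packet once the tunnel adds its overhead."""
    return mtu - tunnel_overhead - IP_HEADER - TCP_HEADER


def fits_without_fragmentation(mss: int, mtu: int, tunnel_overhead: int) -> bool:
    """True if a full-sized segment still fits the MTU after encapsulation."""
    packet_size = mss + IP_HEADER + TCP_HEADER + tunnel_overhead
    return packet_size <= mtu


ASSUMED_OVERHEAD = 60  # hypothetical tunnel overhead, for illustration only

host_mss = mss_for_path(1500)                        # what the hosts negotiate on their own
adjusted_mss = mss_for_path(1500, ASSUMED_OVERHEAD)  # what a router could rewrite it to

print(host_mss, fits_without_fragmentation(host_mss, 1500, ASSUMED_OVERHEAD))
print(adjusted_mss, fits_without_fragmentation(adjusted_mss, 1500, ASSUMED_OVERHEAD))
```

With the unadjusted MSS the encapsulated packet exceeds the MTU and is dropped when DF is set; with the lowered MSS it fits, which is the reason for letting the router rewrite the value during the handshake.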

Friday, September 14, 2012

traceroute through MPLS

traceroute is often used as an effective analysis and troubleshooting tool. It is easily interpreted in a hop-by-hop routed network. Tracing packets through an MPLS network, however, requires a more in-depth understanding of the interworking between routing and tag switching.

The best place to start is the MPLS PE router. On the PE router, each customer’s VPN is represented by a vrf, in this case vrf “bigco”. Examining the routing table for the customer’s remote destination network, notice that its “next hop” is the remote PE (BGP RR address). It may seem counter-intuitive that a customer VPN route has a next hop in the global routing table (effectively leaping from one vrf to another), but this is precisely where MPLS does its magic.

A_PE1#sho ip route vrf bigco
Routing entry for
  Last update from 5d18h ago
  Routing Descriptor Blocks:
  * (Default-IP-Routing-Table), from, 5d18h ago

Monday, September 3, 2012

Sorting out System MAC addresses with VPC and VSS – Part 2

Following Part 1, which started with VPC on the Nexus platform, this post compares VSS on Catalyst side by side.

A simple and interesting topology can be used to illustrate. In this case, Nexus and Catalyst use different multichassis technologies (VPC and VSS respectively), forming a back-to-back virtual port channel. The effective logical topology becomes greatly simplified (shown on the right side), with benefits including use of the full bisectional bandwidth, a stable all-forwarding STP topology, high resiliency, and ease of adding or removing physical members.

VSS Domain ID is very similar to VPC Domain ID. It is a unique identifier in the topology, representing the logical virtual switch formed by two physical chassis. Only one VSS pair is associated with a particular domain.

Friday, August 31, 2012

Sorting out System MAC addresses with VPC and VSS – Part 1

A number of multichassis aggregation technologies are deployed in the data center today, for example, Cisco’s Multichassis EtherChannel (MEC) on Catalyst 6500 VSS, and Virtual Port Channel (vPC) on Nexus platforms. Inter-chassis aggregation greatly increases link utilization, while simplifying design by eliminating the topology’s dependence on spanning tree protocol. STP becomes passive as most links are forwarding, and most failure scenarios no longer require STP re-convergence, thus minimizing disruptions. Furthermore, a more elegant data center design can be achieved, with lower operational complexity and higher return on investment.

A system MAC address exists on each individual device and is often used for device-level negotiation, for example, in the bridge ID field of an STP BPDU, or as part of the LACP LAGID.
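To make the role of the system MAC concrete, here is a minimal Python sketch of how an identifier of this kind is put together: a 2-byte priority followed by the 6-byte MAC, which is the layout of the classic STP bridge ID and also of the LACP system identifier. The function names and sample values are illustrative assumptions, not taken from any device.

```python
def parse_mac(mac: str) -> int:
    """Convert a MAC written as 00:11:22:33:44:55 or 0011.2233.4455 to an integer."""
    digits = mac.replace(":", "").replace(".", "").replace("-", "")
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return int(digits, 16)


def system_id(priority: int, system_mac: str) -> str:
    """Build the 8-byte identifier: 16-bit priority followed by the 48-bit MAC."""
    if not 0 <= priority <= 0xFFFF:
        raise ValueError("priority must fit in 16 bits")
    value = (priority << 48) | parse_mac(system_mac)
    return f"{value:016x}"


# Hypothetical values: default priority 32768 and a made-up system MAC.
print(system_id(32768, "00:11:22:33:44:55"))
```

Because the MAC occupies the low-order bits, two devices with the same priority are told apart only by their system MAC, which is why a pair of chassis acting as one logical switch must present a common value to its neighbors.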

When multiple chassis operate in unison, software simulates the behavior of a single logical system by using common virtual identifiers. Sorting out how the virtual system identifier and the various MAC addresses are used is helpful for understanding, designing and deploying such systems.

This can be illustrated with a simple topology such as the one shown in the diagram, in which a pair of Nexus switches (in VPC domain 100) is connected to another pair (in VPC domain 101) over back-to-back VPCs.

Friday, May 18, 2012

BGP RIB-failure and effect on route advertisement

When examining the routes advertised to a BGP neighbor, notice that some routes are tagged with “r”:

rtr1#sh ip bgp neighbor advertised
BGP table version is 1735468, local router ID is
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, x best-external, f RT-Filter, a additional-path
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
r>i10.115.254.0/30            0    100      0 i
*>           21         32768 i
r>i10.115.254.4/30            0    100      0 i
r>i10.115.254.8/30            0    100      0 i

Note that “r” stands for BGP “RIB-failure”, which indicates that BGP failed to install the route in the routing table. According to this link, the likely cause is that the route is already installed by an IGP, which has a lower administrative distance (AD).
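The selection rule behind a RIB-failure can be sketched as follows. This is a simplified model, not router code: it assumes the commonly documented Cisco IOS default administrative distances have not been changed, and the function name is made up for illustration.

```python
# Commonly documented Cisco IOS default administrative distances (assumed unchanged).
DEFAULT_AD = {
    "connected": 0,
    "static": 1,
    "ebgp": 20,
    "eigrp": 90,
    "ospf": 110,
    "isis": 115,
    "rip": 120,
    "ibgp": 200,
}


def install_route(sources):
    """Given the protocols offering the same prefix, return (winner, rib_failures).

    The source with the lowest administrative distance is installed in the
    routing table; every other source fails to install its copy of the route.
    """
    winner = min(sources, key=lambda proto: DEFAULT_AD[proto])
    rib_failures = [proto for proto in sources if proto != winner]
    return winner, rib_failures


# The routes above are internal ("i"), so iBGP competes with the IGP for the prefix.
print(install_route(["ibgp", "ospf"]))
```

In this model the OSPF route is installed and the iBGP route is reported as a RIB-failure, matching the “r” flag in the output.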