Thursday, June 10, 2010

Understand Bridge Assurance

Bridge assurance comes up often in Nexus troubleshooting. Therefore it is important to understand its design and effect. What is BA? It is a Cisco STP enhancement feature designed to prevent loops, by making sure that a neighboring switch does not malfunction and begin forwarding frames when it shouldn't. Configured incorrectly, BA will likely cause some headaches.

BA monitors receipt of BPDUs on point-to-point links. When the BPDUs stop being received, the port is put into blocking state (actually a port inconsistent state, which stops forwarding). This is typically seen with "show spanning-tree ..."

Now it will make sense to highlight the important characteristics of BA:
• It’s enabled globally by default, but disabled by default on interfaces
• It is enabled only on STP “network” interface
• For it to work, both ends of the link must support BA; Otherwise, the BA side will block
• BA only works on point to point Cisco connections

Here are two examples of BA troubleshooting:

1. If STP "network" type is used with a host VPC, the host side does not support bridge assurance, and we know Nexus 1000v does not even send BPDU, then turning on BA on Nexus 5000 will make the port go into “inconsistency” and blocking.

2. With Nexus 7000 and 5000 back to back VPC connections, it is important to set both sides to type “network”, thus enabling BA consistently. Otherwise, it will also go into blocking state due to inconsistency.

See a more complete picture of spanning tree design is a typical Nexus virtualized data center here.

Wednesday, June 9, 2010

OSPF EIGRP BGP dual point mutual redistribution - Part 2

In part 1 of the post, we solved feedback on OSPF side, but issue in the opposite direction still exists. OSPF routes redistributed into EIGRP on R1, R2 learns from R1/EIGRP and prefers it (since we have lowered EIGRP EXT AD to fix issue 1). Because R2 now prefers the feedback route, it will not redistribute these OSPF routes to EIGRP.

First, between R1 and R2, stop EIGRP advertising routes redistributed from OSPF.
router eigrp 1
  distance eigrp 90 100
  distribute-list route-map Deny_routes_from_OSPF in

route-map Deny_routes_from_OSPF deny 10
  match tag [ospf tags]

As best practice, also use tag to stop the feedback to redistribution points, by blocking OSPF originated routes from redistributing back into OSPF.
route-map redist_EIGRP-to-OSPF deny 10
 match tag [ospf tags]
route-map redist_EIGRP-to-OSPF permit 20
...



What about routes learned from MPLS? Those are typically remote offices, which we don’t want to be redistributed from OSPF.

See an MPLS remote network 10.103.2.0/24. EIGRP side gets it from BGP as EIGRP EXT. OSPF side also gets it from BGP. Could redistributing router prefer EIGRP (since we set lower AD)? The router has two paths (left and right) to go out to the WAN to reach the remote site. We would want it to use the “local” MPLS exit point, in this case is through OSPF.

This is where EIGRP metric becomes important. Note EIGRP must have at least default metric for redistribution to happen. It is learning the same route from both BGP and OSPF. At the point of redistribution, the metric assigned to routes redistributed from OSPF to EIGRP should be lower than EIGRP metric going the other way (typically by setting lower bandwidth on WAN links). As a result, the redistribution router prefers the locally redistributed routes from OSPF.

Redistribution-rtr-1#sh ip eigrp top 10.103.2.0 255.255.255.0
EIGRP-IPv4 (AS 100): Topology default(0) entry for 10.103.2.0/24
State is Passive, Query origin flag is 1, 1 Successor(s), FD is 4352
Routing Descriptor Blocks:
168.147.152.161, from Redistributed, Send flag is 0x0
Composite metric is (4352/0), Route is External
Vector metric:
Minimum bandwidth is 625000 Kbit
Total delay is 10 microseconds
Reliability is 255/255
Load is 1/255
Minimum MTU is 1500
Hop count is 0
External data:
Originating router is 10.250.248.7 (this system)
AS number of route is 1
External protocol is OSPF, external metric is 7
Administrator tag is 200 (0x000000C8)
But MPLS does cause additional feedback issue, which we will cover in part 3.

Tuesday, June 8, 2010

enabling jumbo MTU on Nexus

In our data center, enabling jumbo MTU has proven to result in significant throughput improvements. It should have been straightforward, but unfortunately not with Nexus. This note is meant to fill in a gap where existing documentation may be lacking.

In a typical Nexus 7000/5000/1000v architecture, use these steps to enable jumbo MTU end to end, and then verify.

1. System jumbomtu
It defines the maximum MTU size for the switch, which must be configured on ALL devices.
Note the max value depend on hardware, for example it’s 9000 on Nexus 1000v, and 9216 elsewhere.

2. On Nexus 5000, must configure with system QoS policy
Note it can not be configured on the interface! This is quite different from any other device I am aware of. To make it even more confusing, the command varies between releases.

For example, with 4.0.1:
policy-map jumbo
 class class-default

  mtu 9216
system qos 
 service-policy jumbo

With 4.1, 4.2 it becomes:
policy-map type network-qos jumbo
 class type network-qos class-default
  mtu 9216
system qos
 service-policy type network-qos jumbo

3. On Nexus 1000v, configure on port channel
You will see some documentation showing configuration on physical interface, and disappointed when you actually try to do it. No, it won’t take the command. You must configure on the port channel! And then, magic, it will appear on the physical interface.

More detailed explanation on Nexus1000v is here.

4. On Nexus 7000, configure on the interface
Don’t forget “system jumbomtu” on ALL devices

Verification
So how to verify? Assuming you have jumbo traffic, then you will see the difference before and after with “show interface … counter detail”

Nexus5000# sh int e1/1 count detail
Ethernet1/1
Rx Packets: 95396987709
Rx Unicast Packets: 95396962128
Rx Multicast Packets: 21481
Rx Broadcast Packets: 4100
Rx Jumbo Packets: 89336029550
Rx Bytes: 136394308168545
Rx Packets from 0 to 64 bytes: 6028626455
Rx Packets from 65 to 127 bytes: 611554
Rx Packets from 128 to 255 bytes: 221439
Rx Packets from 256 to 511 bytes: 188679
Rx Packets from 512 to 1023 bytes: 579970
Rx Packets from 1024 to 1518 bytes: 30730062
Rx Trunk Packets: 95396966228
Tx Packets: 2613558
Tx Unicast Packets: 257688
Tx Multicast Packets: 2355702
Tx Broadcast Packets: 168
Tx Bytes: 202281759
Tx Packets from 0 to 64 bytes: 128120
Tx Packets from 65 to 127 bytes: 2472242
Tx Packets from 128 to 255 bytes: 8732
Tx Packets from 256 to 511 bytes: 4391
Tx Packets from 512 to 1023 bytes: 63
Tx Packets from 1024 to 1518 bytes: 10
Tx Trunk Packets: 2307778

Known Issues

Last but not the least, be aware of some known issues (depending on version):

CSCsl21529, Symptom: An incorrect MTU value is displayed in the show interface command output. The Cisco Nexus 5000 Series switch only supports class-based MTU. Per-interface level MTU configuration is not supported. The switch supports jumbo frames by default. However, the show interface command output currently displays an incorrect MTU value of 1500 bytes.
Also, VEM may not retain jumbo MTU setting after reboot. Just something to watch and monitor.

Monday, June 7, 2010

OSPF EIGRP BGP dual point mutual redistribution - Part 1

It is still a common design to use IGP in the enterprise core. For a large enterprise, the requirement to integrate OSPF with EIGRP may be the result of mergers and acquisitions.

Although it may look like a CCIE bootcamp lab, mutual redistribution can be even more challenging in a production environment, due to these factors:
-As it will be shown, dual router mutual redistribution can be very tricky
-"Backdoor" paths such as multiple MPLS WAN networks adds more complexity
-Having multiple data centers, and the need to route different traffic in different failure scenarios

As I worked through the issues in a very large enterprise environment, We have seen multiple issues having a chain reaction, making the symptoms very difficult to diagnose. Sometimes, fixing one issue may introduce new ones. Unless we have an absolute crisp grasp of the design, and a systematic approach, the chance of confusion is extremely high.

This is the first of a 3 part series which I thought would be worthwhile to share some basics, and provide a logical breakdown of the interacting issues into separate and manageable pieces.


The much simplified diagram shows dual mutual redistribution points (blue arrow). So what is the issue? with dual router mutual redistribution, feedback can occur in both directions, resulting in inconsistency on the two redistribution points, and sub-optimal routing, even potential routing loops.
Issue 1: EIGRP->OSPF (feedback from OSPF)

Only applies to EIGRP EXT routes, which has a higher AD (170) than OSPF (110). These routes are redistributed into OSPF via R1. R2 learns from R1/OSPF, and prefers it. Therefore R2 will not redistribute these routes from EIGRP to OSPF, resulting in only one path used. Any EIGRP EXT route that doesn’t exist in OSPF will show up on one of the routers as preferring OSPF due to lower admin distance, breaking EIGRP->OSPF redistribution on one router. This means all traffic will be directed to one side.

This issues is resolved by setting EIGRP EXT AD to lower than OSPF (distance eigrp 90 100).

Note there is a side effect, the redistribution router will always prefer EIGRP path (due to lower EIGRP AD). But within OSPF, routes redistributed from EIGRP will have a higher metric (set with redistribution route map), therefore there is no risk of disturbing preference within OSPF.
Setting OSPF EXT AD(distance ospf ext 200) is similar. Both solutions will introduce issue 2, which we will cover in part 2.