Sunday, July 25, 2010

Nexus1000v – when to use “system mtu”

I can’t be the first one confused about jumbo frames, MTU, and “system mtu” on the Nexus 1000v. After reading some excellent posts, all signs indicated that “system mtu” was designed to solve the “chicken and egg” problem of running the VSM on IP storage.

Like "system vlan", “system mtu” applies to system uplink profile only. So if VSM is not even running under VEM (it runs on vSwitch), there is no need to set “system mtu”, right?

Well, not quite. It turns out “system mtu” is still needed to preserve the connection to the VEM. Assuming jumbo frames are used (for storage, as an example), a reboot of ESX reverts the physical NIC to the default MTU (1500), which results in a mismatched MTU between the physical NIC and the virtual NIC, and loss of connectivity. “system mtu” preserves the setting on the physical NIC, and thus prevents the VEM from disappearing.

To further clarify, here is an example of configuring a jumbo MTU of 9000 on the Nexus 1000v (a fuller sketch follows the list):
1. “system jumbomtu 9000” (global)
2. “system mtu 9000” (uplink port profile)
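For illustration, here is a minimal sketch of where the two commands land, assuming the uplink port profile is named "systemuplink_profile" (the profile name, trunk settings, and VLAN list are placeholders):

system jumbomtu 9000

port-profile type ethernet systemuplink_profile
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan all
  system mtu 9000
  no shutdown
  state enabled

A quick "show port-profile name systemuplink_profile" should confirm that "system mtu 9000" made it into the profile.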

That is all. Note that once set, “system mtu” overrides “mtu”, so there is no need to set the interface MTU explicitly.

A couple of potentially confusing things:
-The show commands on the Nexus 1000v are not entirely accurate for MTU; a fix is supposed to be coming (one way to double-check from the ESX host is sketched after this list).
-There is an error in the Cisco Nexus command reference, which states “The value that is configured for system mtu command must be less than the value configured in the system jumbomtu command”. It should be “less than or equal to”; there is no reason to set system mtu to 8998 unless hardware dictates so.
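Since the show commands can mislead on MTU, one way I would double-check (a sketch, assuming classic ESX 4.x with a service console) is from the host itself, where the MTU actually programmed on each physical NIC is listed in the output of:

esxcfg-nics -l

After an ESX reboot, the MTU column should still read 9000 for the jumbo-enabled vmnics if “system mtu” is doing its job.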

I hope that clears up some of the confusion. If you notice any behavior inconsistent with this understanding, please kindly let me know.

Wednesday, July 21, 2010

Nexus 1000v bug - Three ways to check VSM HA health

If you are using the Nexus 1000v, more than likely you have set up VSM high availability. To ensure system stability and gain the true benefit of HA, check whether VSM HA is truly synchronized; don’t stop when you see two supervisor modules. Otherwise you may be caught at the worst time, in the middle of a failure, only to find out there is no HA and you are dealing with a bug, or even configuration loss.

Specifically, check after the initial setup and after system operations such as a VSM reload. There is a bug, CSCtg46327, fixed in 4.0(4)SV1(3a), that prevents the active and standby VSMs from synchronizing; the standby keeps coming up and going down in its attempts to do so.

You can check for this bug and other potential VSM HA problems using these methods:

1. “show module” output affected by the bug: module 2 should show “VSM, Nexus 1000V, ha-standby” instead of “powered-up”:

vsm# sh module
Mod  Ports  Module-Type                       Model               Status
---  -----  --------------------------------  ------------------  ------------
1    0      Virtual Supervisor Module         Nexus1000V          active *
2    0      Supervisor/Fabric-1                                   powered-up
5    248    Virtual Ethernet Module           NA                  ok
6    248    Virtual Ethernet Module           NA                  ok
7    248    Virtual Ethernet Module           NA                  ok
...

2. “show svs neighbors” lists the standby VSM MAC as a “VEM”, which is incorrect; it should be of type VSM:
vsm# sh svs nei
Active Domain ID: 91

AIPC Interface MAC: 0050-56b4-52bb
Inband Interface MAC: 0050-56b4-3fc1

Src MAC          Type  Domain-id  Node-id   Last learnt (Sec. ago)
------------------------------------------------------------------------
0050-56b4-5eb5   VEM   0          ffffffff  0.00
0002-3d43-8504   VEM   901        0502      160591.40
0002-3d43-8505   VEM   901        0602      160591.30
0002-3d43-8506   VEM   901        0702      160230.70

3. “show system redundancy status” shows the operational redundancy mode as “None”, when it should be “HA”:

vsm# sh sys red stat
Redundancy role
---------------
administrative: primary
operational: primary


Redundancy mode
---------------
administrative: HA
operational: None


This supervisor (sup-1)
-----------------------
Redundancy state: Active
Supervisor state: Active
Internal state: Active with warm standby


Other supervisor (sup-2)
------------------------
Redundancy state: Standby
Supervisor state: HA standby
Internal state: HA standby

Tuesday, July 20, 2010

Nexus1000v bug – widespread VM intermittent connectivity and multicast failure

If you are experiencing widespread, constantly changing, totally unpredictable VM connectivity issues, you have likely hit this bug. You will notice ping works intermittently, and all your regular troubleshooting will yield no particular results, because the behavior is not consistent.

I have benefited tremendously from many posts out there; I hope this one will save some of you many hours of frustration.

First, verify your health (I mean that of your Nexus 1000v, although at that point your own health is probably not very good either). On the VSM, issue the command “module vem <module#> execute vemcmd show dr”. The DR (Designated Receiver) must be associated with the uplinks; it must not be associated with any of the vmnics.
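For example, assuming the VEM of interest is module 5 (a placeholder; your module numbers are listed in “show module”):

vsm# module vem 5 execute vemcmd show dr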

The following shows the “good” output. You have hit the bug if the DR points to any of the vmnics for any VLAN you have.

BD 2899, vdc 1, vlan 2899, 3 ports, DR 304, multi_uplinks TRUE
Portlist:
  20   vmnic4
  22   vmnic6
  304  DR

The root cause is a Nexus 1000v programming error that misplaces the DR, which is used for multicast and broadcast traffic. As the problem comes and goes, lost ARPs cause widespread, inconsistent behavior throughout the network. You may notice ping works from a Nexus 7000 but not from a Nexus 5000, or works from one device but not from another; you get the idea.

The temporary recovery procedure is to reset the physical ports associated with the VEM with “shut” and “no shut”. Check with “module vem <module#> execute vemcmd show dr” again to confirm the symptom is corrected.

The bug is marked “unpublished”, so its details are hidden. The fix is forecast to be in the upcoming patch release, which should be out soon.

One last thing: you may not want to rush the “shut” and “no shut”. Give it at least 10 seconds in between, to avoid another bug. More on this later.
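Putting the recovery together, here is a minimal sketch, assuming the affected VEM is module 5 and its uplinks are Ethernet5/5 and Ethernet5/7 (module and interface numbers are placeholders):

configure terminal
 interface ethernet 5/5
  shutdown
  ! wait at least 10 seconds before bringing the port back up
  no shutdown
 interface ethernet 5/7
  shutdown
  ! again, give it at least 10 seconds
  no shutdown
end
module vem 5 execute vemcmd show dr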

Tuesday, July 13, 2010

Nexus 1000v bug - avoid LACP problems by using mode active

If you are using Nexus 5000 vPC as the method for host connection to the Nexus 1000v, you may want to watch out for a current issue. Last week at Networkers, the best practice was described as using mode active on the Nexus 5000 and mode passive on the Nexus 1000v, thus allowing the Nexus 5000 to establish the LACP port channels.

However, there is a reported error condition, triggered by something like a VSM reload, that effectively puts certain ports into "suspended" mode, as shown below:

vsm-1(config)# sh port-channel summary
...
5 Po5(SU) Eth LACP Eth5/5(s) Eth5/7(P)

Another symptom is that "show cdp neighbors" does not match between the 1000v and the 5000, with only one side sending hellos.

A Cisco internal command shows more detailed event information:
vsm-1(config)# sh port-c internal event-history errors
1) Event:E_DEBUG, length:162, at 710664 usecs after Fri Jul 9 20:38:39 2010
[102] pcm_proc_response(373): Ethernet5/5 (0x1a040400): Setting response status to 0x402b000c (port not compatible) for MTS_OPC_ETHPM_PORT_BRINGUP (61442) response

2) Event:E_DEBUG, length:84, at 710191 usecs after Fri Jul 9 20:38:39 2010
[102] pcm_eth_port_ac_drop_all_txns(798): Interface Ethernet5/5 suspended by protocol


It will probably be fixed in a future N1000v release. The current workaround? Set the LACP mode to "active" on the Nexus 1000v:

port-profile type ethernet systemuplink_profile
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan all
  channel-group auto mode active
  no shutdown
  system vlan ...
  state enabled
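For completeness, LACP is perfectly happy with "active" on both ends, so the Nexus 5000 side can stay as it was. A sketch of the matching vPC member configuration there, assuming port-channel 5, vPC 5, and member port Ethernet1/5 (all numbers are placeholders):

interface port-channel 5
  switchport mode trunk
  vpc 5

interface ethernet 1/5
  switchport mode trunk
  channel-group 5 mode active
  no shutdown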