Saturday, January 29, 2011

Jumbo MTU on Nexus 1000v – Why and How

I posted earlier about the use of “system mtu” on Nexus 1000v.  The internetworking of a virtual switch within a physical topology can still be confusing, and the existence of likely display bugs doesn’t help the situation.

So here is my understanding of the why and how, hopefully in simpler terms than any documentation I have seen.

First, why enable jumbo MTU on Nexus 1000v? Jumbo support is desirable for applications such as storage, for example an ESX host in a NAS environment. Enabling jumbo on Nexus 1000v is part of the “end-to-end” jumbo configuration, starting from the host. In a typical topology, you will need to configure the Nexus 7000, Nexus 5000, and Nexus 1000v, since the latter is the virtual switch that partly resides on the host.  You can refer to more configuration details here.
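As a rough sketch of the upstream pieces (interface numbers are placeholders and the exact syntax varies by NX-OS release), the Nexus 7000 takes a per-interface MTU, while the Nexus 5000 sets jumbo globally through a network-qos policy:

! Nexus 7000 - per-interface MTU
interface Ethernet1/1
  mtu 9216

! Nexus 5000 - jumbo via network-qos policy
policy-map type network-qos jumbo
  class type network-qos class-default
    mtu 9216
system qos
  service-policy type network-qos jumbo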

A diagram is helpful to illustrate the relationship between the Nexus 5000 and Nexus 1000v, particularly the confusing port channel relationship.  Note that on the 5000, vPCs are typically configured (Po101-103). On the Nexus 1000v, only the system uplink profile is predefined. Port channels are created only when hosts/VEMs are added: adding a host in vCenter triggers the creation of the matching port channels (Po5-7) on the VSM.
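For reference, a simplified sketch of the Nexus 5000 side of this relationship (the trunk mode is shown without VLAN detail, and the LACP channel mode is an assumption; Po101 and Eth1/8 are taken from the output further below):

! Nexus 5000 - vPC port channel facing one ESX host / VEM
interface port-channel101
  switchport mode trunk
  vpc 101
interface Ethernet1/8
  switchport mode trunk
  channel-group 101 mode active

The matching Po5-7 on the Nexus 1000v have no such static configuration; they appear automatically once the host’s uplinks inherit the system uplink profile.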


Next, how to enable jumbo MTU on Nexus 1000v?

1. Set system jumbomtu (set by default)
You really don’t need to do anything here, because by default it is already set to the maximum:
N1kv# sh run all | inc jumbo
system jumbomtu 9000

2. Set system mtu
This is really the only configuration required on the Nexus 1000v. By setting system mtu, we preserve the physical NIC setting across ESX reboots:
port-profile type ethernet systemuplink_portchannel
  system mtu 9000
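For context, the system mtu line normally sits inside a fuller uplink profile. A sketch of what that profile might look like (the VLAN numbers and channel-group mode are placeholders, not taken from my running config):

port-profile type ethernet systemuplink_portchannel
  vmware port-group
  switchport mode trunk
  switchport trunk allowed vlan 10,20
  channel-group auto mode on
  no shutdown
  system vlan 10
  system mtu 9000
  state enabled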


Lastly, how to verify that jumbo is working?
This is the most confusing part. Depending on your port type (LACP etc.), you may get different display results. In my experience, both “show interface port-channel” and “show interface ethernet” display MTU 1500 (which is incorrect). This display error is similar to one on the Nexus 5000 (CSCsl21529). But on the Nexus 1000v there is no jumbo packet count, which makes verification even more difficult.
Nexus-1000v-VSM# sh int po1
port-channel1 is up
  Hardware: Port-Channel, address: 0050.5655.cbe0 (bia 0050.5655.cbe0)
  MTU 1500 bytes, BW 20000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 1/255, rxload 1/255
Nexus-1000v-VSM # sh int e3/5
Ethernet3/5 is up
  Hardware: Ethernet, address: 0050.5655.dbc8 (bia 0050.5655.dbc8)
  MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec,
     reliability 0/255, txload 0/255, rxload 0/255

An alternative way to verify that jumbo is really working on the Nexus 1000v is simply to check the upstream switch. For example, on the Nexus 5000, check either the corresponding port channel or the physical interface. Since the Nexus 5000 is receiving (RX) jumbo packets from the host, this is a clear indication that the Nexus 1000v uplink profile is doing its job.

Nexus-5010-a# sh int po101
port-channel101 is up
 vPC Status: Up, vPC number: 101
  Hardware: Port-Channel, address: 0005.9b76.308f (bia 0005.9b76.308f)
  MTU 1500 bytes, BW 10000000 Kbit, DLY 10 usec,
     reliability 255/255, txload 1/255, rxload 1/255
  Encapsulation ARPA
  Port mode is trunk
  full-duplex, 10 Gb/s
  Beacon is turned off
  Input flow-control is off, output flow-control is off
  Switchport monitor is off
  EtherType is 0x8100
  Members in this channel: Eth1/8
  Last clearing of "show interface" counters never
  30 seconds input rate 4344 bits/sec, 543 bytes/sec, 1 packets/sec
  30 seconds output rate 44352 bits/sec, 5544 bytes/sec, 48 packets/sec
  Load-Interval #2: 5 minute (300 seconds)
    input rate 4.50 Kbps, 1 pps; output rate 47.82 Kbps, 47 pps
  RX
    11483591 unicast packets  205769 multicast packets  30036 broadcast packets
    11719396 input packets  2923241828 bytes
    274665 jumbo packets  0 storm suppression packets
    0 runts  0 giants  0 CRC  0 no buffer
    0 input error  0 short frame  0 overrun   0 underrun  0 ignored
    0 watchdog  0 bad etype drop  0 bad proto drop  0 if down drop
    0 input with dribble  0 input discard
    0 Rx pause
  TX
    134696185 unicast packets  163700959 multicast packets  12680857 broadcast packets
    311078001 output packets  240793383992 bytes
    47357469 jumbo packets
    0 output errors  0 collision  0 deferred  0 late collision
    0 lost carrier  0 no carrier  0 babble
    0 Tx pause
  2 interface resets
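For a quick check, the same counters can be pulled out of that output with a pipe filter:

Nexus-5010-a# sh int po101 | inc jumbo
    274665 jumbo packets  0 storm suppression packets
    47357469 jumbo packets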


The Nexus 1000v is only one component of the larger picture. Scott Lowe provides a great example here showing how jumbo support is enabled on the ESX server and the NAS. In particular, testing jumbo from ESX using the vmkping command:

vmkping -s 9000 <destination-IP>
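A stricter variant of the same test (the NAS address below is a placeholder) sets the don’t-fragment bit and subtracts the 28 bytes of IP and ICMP headers, so the ping only succeeds if the full 9000-byte path really supports jumbo:

vmkping -d -s 8972 <NAS-IP-address>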

Saturday, October 23, 2010

BGP timer and Cisco/Juniper negotiation

The Juniper SRX integrates firewall features with full routing capabilities. In a network architecture where there is increasing demand for security segmentation and stateful firewall inspection within the network, multi-vendor routing interaction may become necessary.

Risks and complexity exist in a mixed-vendor environment, even when both sides support standard protocols. Here is one such example.

To establish a BGP neighbor relationship, both sides need to agree on timer values. RFC 4271 defines the characteristics of the timers, but does not specify values for the keepalive and hold time.

BGP uses keepalive messages to monitor established connections. If no keepalive (or update) message is received for a period that exceeds the hold time, BGP considers the neighbor connection down. Note that by convention the hold time is three times the keepalive interval.

By default, vendors set timers differently:
Vendor               Keepalive (seconds)   Hold time (seconds)
Cisco Nexus (4.2)    60                    180
Juniper SRX (10.0)   30                    90

The hold time must end up the same on both sides, either through explicit configuration or through negotiated behavior. Note that with Juniper SRX, BGP on the local routing device uses the smaller of either the local hold-time value or the peer’s hold-time value received in the open message as the hold time for the BGP connection between the two peers. Therefore, by setting the timers on the Cisco Nexus to smaller values (keepalive 10, hold time 30), those values are adopted by the SRX.
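As a minimal sketch of how the timers were lowered on the Nexus side (the AS number, VRF, and neighbor address are taken from the output below):

router bgp 65000
  vrf test
    neighbor 10.88.14.4 remote-as 65000
      timers 10 30

On the SRX, the hold time could equally be lowered with something like “set protocols bgp group <group-name> hold-time 30” (group name is a placeholder), but in this test the SRX is left at its default of 90 and simply accepts the smaller negotiated value.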

To reduce convergence time, the test sets the Nexus keepalive to 10 seconds. The output below shows that after successful negotiation, the SRX (with a local hold time of 90) uses the Nexus’s keepalive of 10 seconds and an active hold time of 30 seconds.

On Juniper SRX:
Peer: 10.88.15.14+179 AS 65000 Local: 10.88.14.4+54104 AS 65000
Type: Internal State: Established Flags:
Last State: OpenConfirm Last Event: RecvKeepAlive
Last Error: None
Export: [ bgp_outbound_policy ]
Options:
Local Address: 10.88.14.4 Holdtime: 90 Preference: 170
Number of flaps: 0
Peer ID: 10.88.120.130 Local ID: 10.88.14.4 Active Holdtime: 30
Keepalive Interval: 10 Peer index: 0


On Cisco Nexus:
Nexus7k# sh ip bgp nei vrf test
BGP neighbor is 10.88.14.4, remote AS 65000, ibgp link, Peer index 1
BGP version 4, remote router ID 10.88.14.4
BGP state = Established, up for 00:53:15
Last read 00:00:05, hold time = 30, keepalive interval is 10 seconds
Last written 0.956168, keepalive timer expiry due 00:00:09
Received 20433 messages, 1 notifications, 0 bytes in queue
Sent 18392 messages, 0 notifications, 0 bytes in queue
Connections established 2, dropped 1
Last reset by peer 02:05:25, due to peer deconfigured
Last reset by us never, due to process restart

Neighbor capabilities:
Dynamic capability: advertised (mp, refresh, gr)
Dynamic capability (old): advertised
Route refresh capability (new): advertised received
Route refresh capability (old): advertised received
4-Byte AS capability: advertised received
Address family IPv4 Unicast: advertised received
Graceful Restart capability: advertised received