Sunday, March 6, 2011

Cisco Flow Control with NetApp NAS

Update Nov 2012: Thanks for the comment, Paul. I have added an update with another look at flow control.

When NAS is used with virtualization, performance and throughput are potential concerns. Enabling jumbo frames has proven to be the most effective tuning method. Flow control, on the other hand, is a setting that not all parties readily agree on.

NetApp has a one-paragraph “best practice” on the subject. The recommendation is to set flow control to “receive on” on the switch port. In other words, allow the switch to receive pause frames from the NAS.

As it turns out, NetApp can send a lot of pause frames. This is easy to see in the counters of the port channel or physical interfaces connected to the NAS (note the “Rx pause” line):

Nexus5k# show interface e2/7
Ethernet2/7 is up
  30 seconds input rate 116486568 bits/sec, 2079 packets/sec
  30 seconds output rate 33970464 bits/sec, 792 packets/sec
  Load-Interval #2: 5 minute (300 seconds)
    input rate 57.99 Mbps, 1.58 Kpps; output rate 32.18 Mbps, 693 pps
    20604669084 unicast packets  176831374 multicast packets  0 broadcast packet
    20781500458 input packets  63305126825983 bytes
    8255225288 jumbo packets  0 storm suppression packets
    0 runts  0 giants  0 CRC  0 no buffer
    0 input error  0 short frame  0 overrun   0 underrun  0 ignored
    0 watchdog  0 bad etype drop  0 bad proto drop  0 if down drop
    0 input with dribble  0 input discard
    176409426 Rx pause
    12091810302 unicast packets  52908470 multicast packets  2650659 broadcast p
    12147369431 output packets  48540927928599 bytes
    7298956759 jumbo packets
    0 output errors  0 collision  0 deferred  0 late collision
    0 lost carrier  0 no carrier  0 babble
    0 Tx pause
  0 interface resets

So what does it mean? Flow control is only meaningful if the party receiving the pause acts on it. The expectation here is therefore that the switch slows down its transmission, since it is hearing the NAS say “slow down, I can’t keep up”.
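
To put a number on “slow down”: an IEEE 802.3x pause frame carries a 16-bit pause time, measured in quanta of 512 bit times, so the length of a single pause depends on link speed. Below is a minimal Python sketch of that arithmetic. The 10 Gb/s link speed is an assumption (the output above does not show the port speed), and the function name is made up for illustration.

```python
PAUSE_QUANTUM_BITS = 512   # one pause quantum = 512 bit times (IEEE 802.3x)
MAX_PAUSE_QUANTA = 0xFFFF  # the pause time is a 16-bit field


def pause_duration_seconds(quanta: int, link_bps: float) -> float:
    """Return how long one pause frame asks the sender to stay silent."""
    if not 0 <= quanta <= MAX_PAUSE_QUANTA:
        raise ValueError("pause time must fit in 16 bits")
    return quanta * PAUSE_QUANTUM_BITS / link_bps


if __name__ == "__main__":
    link_bps = 10e9  # assumption: 10 Gb/s link
    longest = pause_duration_seconds(MAX_PAUSE_QUANTA, link_bps)
    print(f"longest single pause at 10 Gb/s: {longest * 1e3:.3f} ms")
```

A single pause is therefore short, on the order of milliseconds at most. A high Rx pause count means the NAS keeps re-sending pause frames, not that it asked for one long silence.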

According to Cisco, once flow control is enabled and a pause is received on a switch egress port, the switch applies back pressure to the ingress port, and packets are eventually buffered on that ingress port. If pause is also enabled on the ingress port, it can in turn send a pause to the upstream switch. In the ideal scenario, the pause eventually reaches the source, which is the ESXi host, and slows the transmission down at its origin.

However, taking a second look at the NetApp diagram here, the recommendation is to configure the end points (ESX servers and NetApp arrays) with flow control set to ‘send on’ and ‘receive off’. In other words, ESX is not expected to act on received pause frames, which leaves only the network to absorb them.

Now let’s revisit the flow control scenario. The NAS sends a pause and the switch receives it, but there is no use applying back pressure all the way to the host, since the host won’t act on it. The best a switch can do is buffer the traffic itself, or apply back pressure upstream and have the upstream switch buffer it somewhere.
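
How long a switch can “buffer it” is simple arithmetic: while the NAS-facing port is paused, the buffer fills at the rate traffic arrives for that port. Here is a rough Python sketch. The 480 KB buffer size is purely a placeholder, not the specification of any particular switch, and the arrival rate is the 30-second output rate from the interface shown earlier.

```python
def seconds_until_buffer_full(buffer_bytes: int, arrival_bps: float,
                              drain_bps: float = 0.0) -> float:
    """Time until a buffer overflows when traffic arrives faster than it drains.

    drain_bps is 0 while the egress port is fully paused.
    """
    fill_bps = arrival_bps - drain_bps
    if fill_bps <= 0:
        return float("inf")  # the buffer never fills
    return buffer_bytes * 8 / fill_bps


if __name__ == "__main__":
    buffer_bytes = 480 * 1024  # hypothetical buffer size, not a real spec
    arrival_bps = 33_970_464   # 30-second output rate toward the NAS
    t = seconds_until_buffer_full(buffer_bytes, arrival_bps)
    print(f"buffer absorbs about {t * 1e3:.1f} ms of a full pause")
```

Averaged rates hide bursts, so the real headroom is likely smaller than this estimate suggests.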

How well that works really depends on the switch and linecard models, each of which has different capabilities and buffer sizes. In many cases, it is questionable whether flow control propagates far enough back to have any positive effect. In any case, you want to check:
  • Switch interface to NAS, to see the amount of Pause received
  • All interfaces where NAS traffic flows, to see if there are drops
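
Both checks can be scripted against saved `show interface` output. The Python sketch below assumes the counter labels appear exactly as in the output above. Which counters count as “drops” varies by platform, so the label list is an assumption to adjust, and the function names are made up for illustration. How the text is collected (SSH, saved logs) is left out.

```python
import re
from typing import Dict, List

# Counter labels as they appear in the `show interface` output above.
COUNTERS = ("Rx pause", "Tx pause", "input discard", "output errors")


def parse_counters(show_interface_text: str) -> Dict[str, int]:
    """Extract '<number> <label>' counters from `show interface` text."""
    found = {}
    for label in COUNTERS:
        pattern = r"(\d+)\s+" + re.escape(label) + r"\b"
        match = re.search(pattern, show_interface_text)
        if match:
            found[label] = int(match.group(1))
    return found


def needs_attention(counters: Dict[str, int]) -> List[str]:
    """Return the labels of counters that are non-zero."""
    return [label for label, value in counters.items() if value > 0]


if __name__ == "__main__":
    sample = """
    0 input with dribble  0 input discard
    176409426 Rx pause
    0 output errors  0 collision  0 deferred  0 late collision
    0 Tx pause
    """
    counters = parse_counters(sample)
    print(counters)
    print(needs_attention(counters))
```

Absolute counters only say that pauses or drops happened at some point. Comparing two snapshots taken a few minutes apart shows whether they are still happening.
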
More clarification on this topic is probably required from the vendors, and device behavior will likely evolve as the technology advances. For now, it’s best to enable flow control on the switch side, but monitor network behavior closely.