Windows Unicast NLB Performance Tuning

Windows Server 2000 and later offer a clustering option known as Windows Network Load Balancing (or NLB for short). This technology allows for a very cost-effective clustering solution. When working with the lowest common denominator of a switched networking architecture, NLB is limited to Unicast operation only. As this technology has been out for nearly a decade, I hesitate to bring it up, but in production environments I keep running across performance impacts from using it. Per Microsoft:

In its default unicast mode of operation, Network Load Balancing reassigns the station address (“MAC” address) of the network adapter for which it is enabled (called the cluster adapter), and all cluster hosts are assigned the same MAC address. Incoming packets are thereby received by all cluster hosts and passed up to the Network Load Balancing driver for filtering.

Network Load Balancing’s unicast mode induces switch flooding in order to simultaneously deliver incoming network traffic to all cluster hosts.

I see or hear of small-to-medium sized organizations introducing multiple vlans on their networks to control broadcast storms, which is a great first step. Where it often stops is at the data center, where there is a single vlan for all server traffic. Microsoft mentions that a “port-flooding” condition may occur, but at what level? For example, let’s introduce a pair of IIS NLB clusters into a single vlan with gigabit connectivity, along with fewer than 100 other servers and traffic in the neighborhood of 200 simultaneous connections or fewer. Everything still works; performance may seem a little sluggish, but nothing too noticeable.

Now let’s scale the traffic up, either by raising the connections to 1,000+ or by having a backup solution point to a DNS entry or one of the IPs on the network card that is part of the NLB cluster. You will start to see switch ports lit up like a Christmas tree and periodic dropped packets on the given vlan.

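With NLB in Unicast mode, every packet sent to the cluster is flooded by the switch to every member of the vlan, much as a broadcast packet would be. Separately, this can introduce security issues for the environment, since every host on the vlan receives a copy of traffic bound for the cluster.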
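One way to picture the flooding is with a toy model of a learning switch. The sketch below is a simplification and not how any particular switch is implemented: it assumes the switch never learns the shared cluster MAC because the NLB hosts do not transmit with it as a source address. The class name and all MAC addresses and port numbers are made up for illustration.

```python
from typing import Dict, List


class LearningSwitch:
    """Toy single-vlan layer-2 switch: learns source MACs, floods unknown destinations."""

    def __init__(self, ports: List[int]) -> None:
        self.ports = ports
        self.mac_table: Dict[str, int] = {}  # learned MAC address -> port

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> List[int]:
        """Process one frame and return the list of ports it is forwarded out of."""
        self.mac_table[src_mac] = in_port  # learn where the sender lives
        known_port = self.mac_table.get(dst_mac)
        if known_port is None:
            # Unknown destination: flood out of every port except the ingress port.
            return [p for p in self.ports if p != in_port]
        return [] if known_port == in_port else [known_port]


# All addresses below are made-up illustrative values.
CLUSTER_MAC = "02-bf-0a-00-00-64"  # shared cluster MAC, never used as a source here
CLIENT_MAC = "00-aa-aa-aa-aa-aa"   # client or router, on port 8
SERVER_MAC = "00-bb-bb-bb-bb-bb"   # ordinary server, on port 3

switch = LearningSwitch(ports=list(range(1, 9)))

# The ordinary server transmits with its real MAC, so the switch learns port 3.
switch.receive(in_port=3, src_mac=SERVER_MAC, dst_mac=CLIENT_MAC)
# The two NLB hosts (ports 1 and 2) transmit with per-host source MACs instead
# of the cluster MAC, so the cluster MAC never enters the table.
switch.receive(in_port=1, src_mac="02-01-0a-00-00-64", dst_mac=CLIENT_MAC)
switch.receive(in_port=2, src_mac="02-02-0a-00-00-64", dst_mac=CLIENT_MAC)

to_server = switch.receive(in_port=8, src_mac=CLIENT_MAC, dst_mac=SERVER_MAC)
to_cluster = switch.receive(in_port=8, src_mac=CLIENT_MAC, dst_mac=CLUSTER_MAC)

print("frame to ordinary server goes to ports:", to_server)
print("frame to NLB cluster goes to ports:", to_cluster)
```

In this model the frame for the ordinary server leaves on a single port, while the frame for the cluster leaves on every other port in the vlan, including ports that have nothing to do with the cluster.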

Graphically, what does this look like? Below you will see an RRD graph of the amount of traffic being sent to a monitoring port on the network; the baseline is from 30 to 35 kbps. In this scenario there is one NLB cluster offering up IIS under Server 2003 and a second NLB cluster offering up Microsoft Exchange 2007 CAS/Frontend services. Each cluster introduces approximately 15 kbps of traffic to every node on the vlan. You will also notice that, by design, the Unicast NLB method introduces this problem on the receive side only; packet transmission from the cluster does not flood the vlan.

NLB Effect
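As a rough back-of-the-envelope check on those numbers, the sketch below adds up the per-cluster flood rates and scales them across a vlan. The function names are hypothetical, and the port count of 100 is an assumed figure for illustration, not something measured in this environment.

```python
from typing import Dict


def flood_per_port_kbps(cluster_inbound_kbps: Dict[str, float]) -> float:
    """Every port sees the inbound traffic of every unicast NLB cluster on the vlan."""
    return sum(cluster_inbound_kbps.values())


def flood_vlan_total_kbps(per_port_kbps: float, port_count: int) -> float:
    """Total flooded traffic the switch delivers across all ports of the vlan."""
    return per_port_kbps * port_count


# Roughly 15 kbps per cluster, as described for the two clusters in this scenario.
clusters = {"iis": 15.0, "exchange_cas": 15.0}
PORT_COUNT = 100  # assumption: illustrative vlan size, not a measured value

per_port = flood_per_port_kbps(clusters)
total = flood_vlan_total_kbps(per_port, PORT_COUNT)
print(f"per port: {per_port} kbps, vlan-wide: {total} kbps")
```

Two clusters at about 15 kbps each come to about 30 kbps per port, which lines up with the low end of the 30 to 35 kbps seen on the monitoring port. The per-port figure grows with the inbound traffic to the clusters, not with the number of servers, which is why a busier cluster hurts every node on the vlan.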

Correcting this design issue is fairly straightforward. Each Windows NLB cluster should, by design, be isolated in its own vlan to prevent port flooding. If vlan isolation is not an option for weeks or months, for whatever reason, you might be able to reduce the scope of the flooding by adjusting the “Port Rules” option. For vlan sizing, I would take the larger of your current node count and your end-game plans for the cluster, double it, add the number of routing virtual IPs from the networking side, and add one for troubleshooting. For smaller clusters a /28 (14 usable addresses) would be sufficient to meet these requirements, which allows for future expansion, cluster node upgrading/replacement, and a spare IP for troubleshooting in case a problem should arise.
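The sizing rule of thumb can be sketched as a small calculation. This is only an illustration of the arithmetic described above; the function names and the example inputs (4 current nodes, 5 planned, 3 routing IPs) are assumptions, so substitute your own counts.

```python
def required_addresses(current_nodes: int, planned_nodes: int, routing_ips: int) -> int:
    """Larger of current/planned node count, doubled, plus routing IPs, plus one spare."""
    nodes = max(current_nodes, planned_nodes)
    return 2 * nodes + routing_ips + 1


def smallest_prefix(required: int) -> int:
    """Return the longest IPv4 prefix length whose usable host count covers `required`."""
    for prefix in range(30, 0, -1):
        # Subtract the network and broadcast addresses from the block size.
        usable = 2 ** (32 - prefix) - 2
        if usable >= required:
            return prefix
    raise ValueError("requirement too large for a single IPv4 subnet")


# Hypothetical inputs: 4 nodes today, 5 planned, 3 routing IPs from the network team.
needed = required_addresses(current_nodes=4, planned_nodes=5, routing_ips=3)
prefix = smallest_prefix(needed)
print(f"{needed} addresses needed, so a /{prefix} is the smallest fit")
```

With these example inputs the rule asks for 14 addresses, which is exactly what a /28 provides; one more node in the plan would push the vlan to a /27.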

Microsoft reference: http://technet.microsoft.com/en-us/library/bb742455.aspx
