Using deep buffers to avoid traffic spikes in the data center

This vendor-written tech primer has been edited by Network World to eliminate product promotion, but readers should note it will likely favor the submitter's approach.

Server virtualization is dramatically increasing the number of applications supported in each rack, which can result in traffic spikes that overwhelm traditional top-of-rack (ToR) switches, causing dropped packets and processing delays. But a new generation of ToR switches addresses the potential for spikes by incorporating much larger packet buffers.

With most virtualized servers supporting between two and 10 applications, the network traffic profile is changing. Not only is traffic increasing, but the potential for very large traffic spikes has increased as well. A rack filled with these virtualized servers can now host between 160 and 400 application instances.

Traditional data centers did not experience network spikes because physical servers were heavily underutilized. Most servers supported only a single application and were sized for peak needs, meaning the rest of the time they ran at only 10% to 20% utilization.

However, with virtualization and cloud computing, servers are more highly utilized and more concentrated in fewer locations, and as a result, data center network traffic is changing.

IDC's World Wide Server Forecast 2010 report shows the growth rate of virtual servers far outpacing that of physical servers. In today's data centers, servers support multiple applications, driving utilization rates as high as 50% to 60%. With virtualized servers supporting four to 10 applications, a rack of 20 servers that in the past would support only 20 applications now supports as many as 200.

As the number of applications generating network traffic increases, the aggregated result is a higher level of traffic with fewer peaks and valleys because the network peaks and valleys from each application tend to cancel each other out. However, when a majority of the applications generate a burst of traffic at the same time, large network spikes occur.
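The smoothing effect, and the way it breaks down, can be illustrated with a toy model. The sketch below is an assumption-laden illustration rather than a measurement: it assumes every application sends one fixed-size burst per period, and the period length, burst size and function names are hypothetical. Only the figure of 200 applications per rack comes from the text above.

```python
def aggregate_load(num_apps, period, burst_units, offsets):
    """Sum the per-tick load of apps that each send one burst per period.

    offsets[i] is the tick (taken modulo period) at which app i bursts.
    """
    load = [0] * period
    for app in range(num_apps):
        load[offsets[app] % period] += burst_units
    return load


def peak_to_mean(load):
    """Ratio of the busiest tick to the average tick."""
    return max(load) / (sum(load) / len(load))


NUM_APPS = 200  # 20 servers x 10 applications, as in the text
PERIOD = 10     # hypothetical: each app bursts once every 10 ticks
BURST = 1       # hypothetical burst size, in arbitrary units

# Bursts spread evenly across the period: peaks and valleys cancel out.
staggered = aggregate_load(NUM_APPS, PERIOD, BURST, list(range(NUM_APPS)))

# Every app bursts on the same tick: the worst-case spike.
synchronized = aggregate_load(NUM_APPS, PERIOD, BURST, [0] * NUM_APPS)

print(peak_to_mean(staggered))     # 1.0
print(peak_to_mean(synchronized))  # 10.0
```

Under these assumptions the average load is identical in both cases; only the alignment of the bursts changes, and that alone raises the peak to 10 times the mean.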

These spikes in traffic can result in dropped packets, and if multiple applications request retransmissions at the same time, things can get out of control. Major spikes occur less often than regular traffic spikes, but they are real, and when they happen they hit with no warning. The spiraling effect is influenced by a multitude of factors, which is what makes the spikes so unpredictable.

Normal spikes last for seconds or even a minute or two and cause only a hiccup in systems and data center operation. In high-density application environments, however, the cascading effect of a network traffic spike could last for several hours and disrupt business operations.

Impact of traffic spikes

For traditional network traffic, dropped packets are simply retransmitted, resulting in slower network performance. For storage traffic such as iSCSI, however, dropped packets could force the retransmission of a large block of data, increasing the stress on the network. The emerging trend toward consolidating server and storage traffic on the same network using protocols such as iSCSI and Fibre Channel over Ethernet (FCoE) makes it imperative that the network be immune to packet loss. Unfortunately, more complete standards like Data Center Bridging are focused on 10G Ethernet, while the majority of data centers are still running 1G (albeit over multiple 1G links) from the servers to the network.

As highly virtualized data centers begin to show signs of traffic spike activity, IT architects will have to focus on the switching infrastructure to ensure it is capable of handling significant spikes. It would be costly to model the tipping point, that is, the moment when "x" virtual application instances, running on physical servers with "y" cores and "z" 1G interfaces, make "bursty" network requests "p" percent of the time for file blocks of size "q" from "r" files or databases.

Until recently, IT architects have been fortunate not to have to deal with all these variables at the same time. The cost-effective answer is to implement switches that have more than enough buffering capacity to handle large traffic spikes.

A typical top-of-rack switch has 8M to 16M bytes of buffering capacity. When this capacity is overwhelmed, packet loss occurs. The conventional ways to avoid packet loss are to a) "throttle" the traffic by identifying the "offending" application instances and "taming" them, or move those "heavy" applications to a separate network or rack; b) throw more network bandwidth at the problem, say by adding another 1G connection or, more expensively, upgrading the whole infrastructure to 10G; or c) take the Cadillac approach of implementing 10G infrastructure with converged network adapters, FCoE and so on.
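A rough calculation shows how quickly a buffer of this size can fill. The sketch below is a simplification: it assumes the whole buffer is available to a single congested egress port and that the burst arrives at a constant rate. The scenario of four 1G senders converging on one 1G port is hypothetical, as are the function and constant names.

```python
def seconds_until_overflow(buffer_bytes, ingress_bps, egress_bps):
    """Time a buffer can absorb a sustained burst before packets drop.

    Returns float('inf') when the egress link keeps up with the ingress.
    """
    excess_bps = ingress_bps - egress_bps
    if excess_bps <= 0:
        return float("inf")
    return buffer_bytes * 8 / excess_bps


GBPS = 10**9  # bits per second
MB = 10**6    # bytes (decimal megabytes, an assumption)

# Hypothetical burst: four 1G senders converge on one 1G egress port.
ingress = 4 * GBPS
egress = 1 * GBPS

for size_mb in (8, 16):
    t = seconds_until_overflow(size_mb * MB, ingress, egress)
    print(f"{size_mb} MB buffer: {t * 1000:.1f} ms")
# 8 MB buffer: 21.3 ms
# 16 MB buffer: 42.7 ms
```

Under these assumptions, a burst that outlasts a few tens of milliseconds exhausts the buffer, after which packets are dropped.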

However, a new crop of switches is now appearing that offers non-blocking, line-rate performance with several gigabytes of buffering capacity. Each port is assigned a given amount of buffer, but if that capacity is exceeded, the port can dynamically draw upon a large reserve pool of memory. As a result, buffer capacity is self-regulating, and traffic spikes simply cause the switch to use more of its buffer.
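The allocation scheme described above, a fixed per-port buffer backed by a shared reserve pool, can be sketched as follows. This is a minimal illustration and not any vendor's implementation: the class name, method names and byte-level accounting are assumptions, and real switches add thresholds and per-queue limits that are omitted here.

```python
class SharedBufferSwitch:
    """Toy model: dedicated per-port buffer plus a shared reserve pool."""

    def __init__(self, num_ports, per_port_bytes, shared_pool_bytes):
        self.per_port_bytes = per_port_bytes
        self.pool_free = shared_pool_bytes
        self.dedicated_used = [0] * num_ports
        self.pool_used = [0] * num_ports

    def enqueue(self, port, nbytes):
        """Buffer nbytes for a port; return False if it must be dropped."""
        dedicated_free = self.per_port_bytes - self.dedicated_used[port]
        from_dedicated = min(nbytes, dedicated_free)
        from_pool = nbytes - from_dedicated
        if from_pool > self.pool_free:
            return False  # dedicated buffer and reserve pool both exhausted
        self.dedicated_used[port] += from_dedicated
        self.pool_used[port] += from_pool
        self.pool_free -= from_pool
        return True

    def dequeue(self, port, nbytes):
        """Release up to nbytes as traffic drains, pool memory first."""
        to_pool = min(nbytes, self.pool_used[port])
        self.pool_used[port] -= to_pool
        self.pool_free += to_pool
        remaining = nbytes - to_pool
        self.dedicated_used[port] -= min(remaining, self.dedicated_used[port])


# Hypothetical sizes, chosen small so the numbers are easy to follow.
sw = SharedBufferSwitch(num_ports=2, per_port_bytes=1000, shared_pool_bytes=5000)
first = sw.enqueue(0, 800)       # fits in port 0's dedicated buffer
spike = sw.enqueue(0, 1500)      # overflows into the pool (1300 bytes)
too_big = sw.enqueue(1, 5000)    # needs 4000 from the pool, only 3700 free
sw.dequeue(0, 1500)              # port 0 drains; pool memory is returned
retry = sw.enqueue(1, 5000)      # now succeeds
print(first, spike, too_big, retry)  # True True False True
```

Returning pool memory first on dequeue is a design choice in this sketch: it frees the shared reserve for other ports as soon as a spike subsides.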

Also, because these switches deliver non-blocking, line-rate performance at 1G, IT architects do not have to prematurely upgrade their infrastructure to 10G.

As IT architects seek to flatten their networks in order to reduce latency within virtualized environments, they should plan to eliminate legacy switches that can be swamped by traffic spikes, and to implement switches with ample, dynamically adjustable buffers that can handle whatever the server and storage infrastructure requires.

Force10 Networks develops high-performance data center solutions powered by the industry's most innovative line of open, standards-based, networking hardware and software. The company's Open Cloud Networking framework grants Web 2.0/portal operators, cloud and hosting providers, enterprise and special-purpose data center customers new levels of flexibility, performance, scale and automation -- fundamentally changing the economics of data center networking. For more information, visit
