4. I/O virtualization
I/O virtualization addresses an issue that plagues servers running virtualization software such as VMware or Microsoft Hyper-V. When a large number of virtual machines run on a single server, I/O becomes a critical bottleneck, both for VM communication with the network and for connecting VMs to storage on the back end. I/O virtualization not only makes it easier to allocate bandwidth across multiple VMs on a single server, but also paves the way to dynamically managing the connections between pools of physical servers and pools of storage.
But let's start with the individual server. Take, for example, VMware's recommendation to allocate one gigabit Ethernet port per VM. A server that supports 16 VMs would therefore need four four-port gigabit Ethernet NICs, plus additional Ethernet (iSCSI), SCSI, or Fibre Channel adapters for the necessary storage. Many servers don't have enough empty slots to support that many adapters, even if the cooling capacity were adequate. And 16 VMs per host hardly pushes the limit, considering that today's Intel and AMD servers pack anywhere from 8 to 24 cores and support hundreds of gigabytes of RAM; consolidation ratios can go much higher.
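The adapter arithmetic in that example can be sketched in a few lines of Python; the figures are the article's illustrative ones, not vendor-verified:

```python
import math

# Illustrative figures from the example above: one GbE port per VM,
# four ports per multiport NIC. Storage adapters would come on top.
vms = 16
ports_per_vm = 1
ports_per_nic = 4

nics_needed = math.ceil(vms * ports_per_vm / ports_per_nic)
print(nics_needed)  # 4 four-port NICs just for VM networking
```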
[ I/O virtualization is key to the highly scalable architecture of Cisco's Unified Computing System. See "Test Center review: Cisco UCS wows." ]
In response, I/O virtualization vendors such as Xsigo and Cisco have come up with a way to give each server one very high-speed connection instead of multiple Ethernet and Fibre Channel connections. One adapter per server can then provide many virtual connections. These adapters are not custom HBAs, but standard 10 gigabit InfiniBand or Ethernet adapters used with OS drivers that present the single fast connection as multiple network and storage connections. Since everything is running over a single pipe, the system can grant bandwidth to the virtual connections as needed, providing maximum performance where appropriate.
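How a single fast pipe might be shared on demand can be sketched with a simplified proportional-share model; this is an illustration of the idea, not any vendor's actual scheduler:

```python
def allocate(pipe_gbps, demands):
    """Grant each virtual connection its demanded bandwidth, scaling
    all grants down proportionally when the pipe is oversubscribed."""
    total = sum(demands.values())
    scale = min(1.0, pipe_gbps / total) if total else 0.0
    return {name: gbps * scale for name, gbps in demands.items()}

# Two virtual NICs and one virtual HBA sharing one 10Gb connection:
# total demand is 8Gb, under the 10Gb limit, so every grant is met in full.
grants = allocate(10, {"vnic0": 2, "vnic1": 2, "vhba0": 4})
print(grants)
```

When demand exceeds the pipe, every connection is scaled back by the same factor, which is the simplest possible policy; real systems typically layer priorities or minimum guarantees on top.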
Typically, a single adapter resides in each server, connected by a single cable to the appliance or switch, which then provides both network and storage ports to connect to storage and other networks. This simplifies datacenter cabling, as well as the installation of each server. It also eases the task of transferring adapters to another system if a server fails. In solutions such as Cisco UCS, I/O virtualization makes server provisioning, repurposing, and failover extremely flexible and potentially completely automated, as it's handled entirely in software. Further, because I/O virtualization systems can emulate multiple Ethernet or Fibre Channel connections running at varying speeds, available bandwidth can be tailored to the requirements of VM migration or other heavy loads.
Virtualizing I/O does require drivers that support the specific OS in use. The major operating systems and virtualization platforms are supported, including VMware ESX and Windows Server 2008 Hyper-V, but not necessarily all versions of Linux and Xen or other open source virtualization platforms. If you're using supported OSes, I/O virtualization can make running a large datacenter much simpler and far less expensive, particularly as increased processing power and memory support allow servers to handle ever greater numbers of virtual machines.
-- Logan Harbaugh

3. Data deduplication
Data is the lifeblood of any business. The problem is what to do with all of it. According to IDC, data in the enterprise doubles every 18 months, straining storage systems to the point of collapse. The blame for this bloat often falls on compliance regulations that mandate the retention of gobs of messages and documents. More significant, though, is that there's no expiration date on business value. Analyzing data dating back years allows users to discover trends, create forecasts, predict customer behavior, and more.
Surely there must be a way to reduce the immense storage footprint of all of this data without sacrificing useful information. And there is, thanks to a technology known as data deduplication.
Every network contains masses of duplicate data, from multiple backup sets to thousands of copies of the employee handbook to identical file attachments sitting on the same e-mail server. The basic idea of data deduplication is to locate duplicate copies of the same file and eliminate all but one original copy. Each duplicate is replaced by a simple placeholder pointing to the original. When users request a file, the placeholder directs them to the original and they never know the difference.
Deduplication takes several forms, from simple file-to-file detection to more advanced methods of looking inside files at the block or byte level. Basically, dedupe software works by analyzing a chunk of data, be it a block, a series of bits, or the entire file. This chunk is run through an algorithm to create a unique hash, which is checked against an index. If the hash is already in the index, that chunk of data is a duplicate and doesn't need to be stored again. If not, the chunk is unique: its hash is added to the index and the chunk itself is stored.
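That hash-index loop can be sketched in a few lines of Python; fixed-size chunks and SHA-256 stand in here for whatever chunking scheme and hash a real product uses:

```python
import hashlib

def dedupe(data, chunk_size=4096):
    """Store each unique chunk once; return the chunk store (the index)
    plus an ordered 'recipe' of hashes that can rebuild the original."""
    store = {}   # hash -> chunk bytes: one copy per unique chunk
    recipe = []  # placeholders: each entry points at a stored chunk
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:   # first time this chunk has been seen
            store[digest] = chunk
        recipe.append(digest)     # a duplicate costs only a reference
    return store, recipe

def rebuild(store, recipe):
    """Follow the placeholders to reassemble the original bytes."""
    return b"".join(store[h] for h in recipe)
```

Feed it a file that is mostly repeats and only the distinct chunks are stored, yet the recipe restores the file byte for byte, which is exactly why the user "never knows the difference."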
Data deduplication isn't just for data stored in a file or mail system. The benefits in backup situations, especially with regard to disaster recovery, are massive. On a daily basis, the percentage of changed data is relatively small. When transferring a backup set to another datacenter over the WAN, there's no need to move the same bytes each and every night. Use deduplication and you vastly reduce the backup size. WAN bandwidth usage goes down and disaster recovery ability goes up.
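With made-up but plausible numbers, the WAN savings are easy to see:

```python
# Hypothetical figures for illustration only: a 500GB backup set
# of which about 2 percent changes on a given day.
full_backup_gb = 500
daily_change_rate = 0.02

wan_transfer_gb = full_backup_gb * daily_change_rate
print(wan_transfer_gb)  # roughly 10 GB crosses the WAN instead of 500
```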
More and more backup products are incorporating data deduplication, and deduplication appliances have been maturing over the past few years. File system deduplication is on its way too. When it comes to solving real-world IT problems, few technologies have a greater impact than data deduplication.
-- Keith Schultz