Virtualization is one of the hottest topics in the IT industry, and for good reason. Server virtualization brings many benefits: hardware consolidation, better resource utilization, lower capital and operating expenses, and greater flexibility to meet changing business needs.
Like most new technologies, server virtualization also brings new challenges, one of which is protecting against unplanned downtime. Virtualization makes rock-solid availability even more important than it is on physical servers. Why? Because consolidation turns one physical server into a single point of failure for multiple applications.
For pervasive deployment of virtualization to occur, IT departments and line-of-business customers must be confident that applications are protected. When a business unit requests a new application be started, the business unit generally expects it to be on its own system with its own dedicated hardware. Tell the business unit it will be stacked on a server running five other applications and the group is likely to get uneasy. Migrating applications to virtual environments requires higher standards for availability, security and manageability.
A common misconception is that the live migration functionality included with the leading virtual server products addresses availability requirements. Live migration allows a virtual machine to be moved between physical servers manually, but it does not move one automatically upon a failure. While live migration works well for managing planned downtime, it's not designed for, and doesn't provide, protection against failures and unplanned events.
Unfortunately, most of today's offerings for protecting virtual machines against unplanned downtime are retrofits of traditional clustering/failover products used on physical servers -- approaches that are complex, costly and often unreliable. These approaches are challenging enough on physical servers; applying them to virtual servers compounds the problem dramatically.
Just as important as the need for high availability in virtual environments is the ease with which virtual machines can be deployed, configured and managed. Consider a dozen servers, each with five virtual machines, each running a different application. That's 60 systems you have to protect, manage and maintain. With traditional clustering and failover solutions, you would have to create another 60 standby systems, all of which would need to be managed as well. Then you would have to configure replication, heartbeats and restarts, and of course all of it would need to be thoroughly tested. It's easy to see how complex this can get.
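The arithmetic above can be sketched in a few lines. This is only an illustration of the counting; the function name and the one-standby-per-system default are assumptions made for the example, not part of any product.

```python
def systems_to_manage(hosts: int, vms_per_host: int, standby_per_vm: int = 1) -> dict:
    """Count the systems an administrator ends up managing.

    Assumes (hypothetically) that traditional failover needs
    `standby_per_vm` standby systems for every primary virtual machine.
    """
    primary = hosts * vms_per_host
    standby = primary * standby_per_vm
    return {"primary": primary, "standby": standby, "total": primary + standby}


# A dozen servers with five virtual machines each, one standby per VM.
counts = systems_to_manage(hosts=12, vms_per_host=5)
print(counts)  # {'primary': 60, 'standby': 60, 'total': 120}
```

Every one of those 120 systems still needs its replication, heartbeat and restart settings configured and tested, which is where the complexity comes from.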
Bridging the gap
So how do you protect against failures without the interruptions and hassles of failovers? Fault tolerance. True availability requires fault-tolerant-class capabilities to deliver continuous computing, even when failures occur. Fault tolerance means that if something should break, the operating environment and associated applications don't stop -- not even briefly. Failing over every time a failure occurs is costly in terms of data, time and application availability. Failing out a failed device, that is, taking it out of service while its redundant counterpart carries on, maintains continuity of operations. This is a proactive approach to continuous availability, unlike reactive failover/restart techniques.
Fault-tolerant-class software for virtual environments transparently combines and manages the resources of two virtual machines running on different servers in a virtual resource pool to create a single protected virtual-machine environment. The protected virtual machine appears and is managed just like a standard Windows server. Disk data is mirrored synchronously to redundant storage, and network and server operations are protected from failure. The administrator loads and configures the application in the protected virtual machine as though it were being loaded onto a physical server.
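To make the idea of synchronous mirroring concrete, here is a toy, in-memory sketch. It is not how any particular product is implemented; the class and method names are hypothetical, and real mirroring works at the disk and network level rather than on Python dictionaries. The point it illustrates is that a write returns only after every healthy copy holds the data, so either copy can serve reads if the other is lost.

```python
class MirroredVolume:
    """Toy synchronous mirror: write() returns only after every healthy copy is updated."""

    def __init__(self, copies: int = 2):
        self.copies = [{} for _ in range(copies)]   # one block map per copy
        self.healthy = [True] * copies

    def mark_failed(self, index: int) -> None:
        """Simulate losing one copy of the storage."""
        self.healthy[index] = False

    def write(self, block: int, data: bytes) -> int:
        """Write to all healthy copies; return how many copies were written."""
        targets = [i for i, ok in enumerate(self.healthy) if ok]
        if not targets:
            raise IOError("no healthy copy left")
        for i in targets:
            self.copies[i][block] = data
        return len(targets)

    def read(self, block: int) -> bytes:
        """Read from the first healthy copy."""
        for i, ok in enumerate(self.healthy):
            if ok:
                return self.copies[i][block]
        raise IOError("no healthy copy left")


volume = MirroredVolume()
volume.write(0, b"order-1001")   # lands on both copies before returning
volume.mark_failed(0)            # first copy is lost
print(volume.read(0))            # b'order-1001'
```

Because the second copy was updated as part of the same write, nothing needs to be replayed or recovered when the first copy disappears.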
If a fault or failure occurs in a disk, network device or host system, fault-tolerant-class software can automatically reconfigure resources to permit the application to continue operating without interruption or loss of client connectivity. Failure recovery is always reliable: With active redundancy the system is constantly validating itself because the two virtual machines comprising the protected environment continuously execute the same application operations. This unique architecture assures the availability and correct operation of system resources when they are needed in the event of a failure.
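The active-redundancy idea can also be sketched in miniature. The following is a simplified model under stated assumptions, not a description of any vendor's implementation: the names `Replica` and `ProtectedVM` are hypothetical, the "application state" is a single integer, and a failure is simulated by flipping a flag. It shows two replicas executing the same operations, a consistency check on every operation, and a failed replica being dropped while the survivor keeps answering.

```python
class Replica:
    """One of the two virtual machines behind the protected environment (hypothetical model)."""

    def __init__(self, name: str):
        self.name = name
        self.state = 0
        self.alive = True

    def apply(self, op):
        if not self.alive:
            raise RuntimeError(f"{self.name} is down")
        self.state = op(self.state)
        return self.state


class ProtectedVM:
    """Runs every operation on all live replicas and checks that they agree."""

    def __init__(self):
        self.replicas = [Replica("vm-a"), Replica("vm-b")]

    def execute(self, op):
        results = []
        for replica in list(self.replicas):       # iterate over a copy; we may remove entries
            try:
                results.append(replica.apply(op))
            except RuntimeError:
                self.replicas.remove(replica)     # fail out the broken replica
        if not results:
            raise RuntimeError("no healthy replica left")
        if len(set(results)) > 1:
            raise RuntimeError("replicas diverged")
        return results[0]


vm = ProtectedVM()
print(vm.execute(lambda s: s + 1))   # 1, computed by both replicas
vm.replicas[0].alive = False         # simulate a host failure
print(vm.execute(lambda s: s + 1))   # 2, served by the surviving replica
```

The client calling `execute` sees the same interface and the same results before and after the failure, which is the behavior the paragraph above describes.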