Like the fashion industry with its cyclical styles, the IT industry reinvigorates buzzwords, albeit at a quicker pace. Ever since the latest round of hurricanes, disaster recovery has been at the forefront of everyone's mind again. With the widespread acceptance of virtualization, people are finally doing disaster recovery differently.
In times past, true high availability and disaster recovery meant purchasing two of everything, not to mention a separate facility to house the carbon copies. Many companies became quite profitable selling their server cloning services, and most businesses decided that 100 percent immediate failover just wasn't important.
Often, businesses will only configure immediate failover for their most critical servers. This involves buying duplicate hardware and devising a mechanism for real-time data replication. Data replication may require extreme amounts of bandwidth as well, adding to the myriad costs. The remaining, less important servers can be configured in a "quick failover" setup, or even relegated to recovering from the last backup. In a quick failover configuration, data replication happens at acceptable intervals. Bringing up a replacement server could take a few hours, and any data written since the last replication run can be missing. The vast majority of businesses require immediate failover for very few servers, if any, so this type of redundancy is normally sufficient.
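The three tiers described above (immediate failover, quick failover, and recovery from backup) can be sketched as a simple classification. This is only an illustration: the Server fields, the threshold values, and the server names are assumptions made up for the example, not figures from any real deployment.

```python
from dataclasses import dataclass


@dataclass
class Server:
    name: str
    max_downtime_minutes: float   # how long the business can live without it
    max_data_loss_minutes: float  # how much recent data it can afford to lose


# Hypothetical thresholds; every business will draw these lines differently.
IMMEDIATE_DOWNTIME_LIMIT = 5        # minutes
QUICK_DOWNTIME_LIMIT = 8 * 60       # minutes ("a few hours")


def failover_tier(server):
    """Pick a failover tier from the downtime and data loss a server can tolerate."""
    if (server.max_downtime_minutes <= IMMEDIATE_DOWNTIME_LIMIT
            or server.max_data_loss_minutes == 0):
        # Needs duplicate capacity and real-time replication.
        return "immediate"
    if server.max_downtime_minutes <= QUICK_DOWNTIME_LIMIT:
        # Replicate at intervals; accept hours of downtime and some lost data.
        return "quick"
    # Restore from the last backup.
    return "backup-only"


if __name__ == "__main__":
    inventory = [
        Server("orders-db", max_downtime_minutes=1, max_data_loss_minutes=0),
        Server("mail", max_downtime_minutes=240, max_data_loss_minutes=60),
        Server("intranet-wiki", max_downtime_minutes=2880, max_data_loss_minutes=1440),
    ]
    for server in inventory:
        print(server.name, "->", failover_tier(server))
```

Writing the tolerances down per server, even this crudely, tends to confirm the point made above: very few machines end up in the "immediate" tier.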
Fast recovery is expensive too. Even if you aren't replicating data so that immediate failover is possible, you're still replicating data to another server at fairly frequent intervals, and that requires backup copies of hardware.
High availability is never easy to accomplish. There's always one more avenue of failure, one more component that needs to become redundant. Many people have settled for knowing that catastrophic failure doesn't happen often enough to warrant complete clones of every data center. It's ironic that it takes a hurricane season to get people thinking about disaster recovery, but the fact is that very few companies will be completely operational if their main data center disappears. Even so, backups still take place to enable a slow recovery from such events.
What if it weren't expensive, though? Most companies have come to realize that short outages are acceptable; hence there is less focus on always-available failover solutions. Virtualization doesn't really change that, but it does make fast failover easier, cheaper, and more feasible.
The question when designing any type of disaster recovery is "what disaster?" Does your entire infrastructure need to operate if the primary data center explodes? Or is it just a few services? Perhaps you don't care what happens when the data center combusts: you've got bigger things to worry about. In any case, the answer could lie within virtualized backup servers. Prohibitive hardware costs have traditionally hindered large-scale disaster recovery plans, but not anymore.
With OS-level virtualization, you can replicate four to five existing servers onto one physical server. In fact, you may even be able to consolidate your production infrastructure using virtualization, but that's another topic altogether. People who have experimented with different types of virtualization in the past are probably holding their noses right now, but we assure you that OS-level virtualization is feasible.
Solaris Containers (AKA Zones), Linux-VServer, Xen, and Virtuozzo all offer close to bare-metal performance. Initial whacks at virtualization generally ran non-native code on dissimilar hardware, and the hardware had to be emulated in software. Then came some interesting hypervisors that would allow similar (in terms of computer architecture) OSes to run much closer to the metal. The operating systems can, with modifications, actually run on the real CPU, but performance still doesn't quite compare to the OS having its own private machine.
In OS-level virtualization the kernel is normally shared between multiple OS instances running in parallel. Performance doesn't suffer, since every process runs directly on the real hardware under that one kernel, with nothing emulated in between. Servers nowadays come with mind-blowing CPU power (in 2, 4, 6, and 8 core varieties), and they typically include 4 to 16 GB of RAM. They are prepared for some very heavy workloads.
What this means for disaster recovery and failover is that you can run multiple operating system instances on one server. If you're conservative in the allocation, performance will not suffer. Unless your servers are already completely loaded and overworked (in which case they aren't performing well to begin with), combining two to three servers on one machine should be completely feasible.
But if you can accept degraded service while in failover mode, and most people are in this camp, oftentimes all failover servers can run on one (or a few) physical machines. Depending on the number of machines that need to remain operational and the size of the business, live failover is now possible for everyone, and quite cheaply.
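To make "conservative allocation" and "degraded service" a little more concrete, here is a minimal capacity-planning sketch. It places failover guests onto physical hosts with a simple first-fit pass, keeps a fraction of each host in reserve, and optionally scales every guest down to model degraded service. The Host class, the place_guests function, the 25 percent headroom, and the sample numbers are all assumptions for illustration; scaling CPU and RAM by one factor is a deliberate simplification.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    cpu_cores: float
    ram_gb: float
    guests: list = field(default_factory=list)
    used_cores: float = 0.0
    used_ram_gb: float = 0.0


def place_guests(guests, hosts, headroom=0.25, degraded=1.0):
    """Place (name, cores, ram_gb) guests on hosts; return names that did not fit.

    headroom: fraction of each host left unallocated (the conservative part).
    degraded: scale applied to each guest's needs (1.0 = full production size,
              0.5 = accept roughly half the capacity while in failover mode).
    """
    unplaced = []
    # Largest memory consumers first, so the big guests are not squeezed out.
    for name, cores, ram_gb in sorted(guests, key=lambda g: g[2], reverse=True):
        need_cores = cores * degraded
        need_ram = ram_gb * degraded
        for host in hosts:
            cpu_limit = host.cpu_cores * (1 - headroom)
            ram_limit = host.ram_gb * (1 - headroom)
            if (host.used_cores + need_cores <= cpu_limit
                    and host.used_ram_gb + need_ram <= ram_limit):
                host.guests.append(name)
                host.used_cores += need_cores
                host.used_ram_gb += need_ram
                break
        else:
            # No host had room for this guest.
            unplaced.append(name)
    return unplaced


if __name__ == "__main__":
    production = [("web", 2, 4), ("mail", 2, 4), ("files", 2, 4), ("db", 4, 8)]
    print("full size:", place_guests(production, [Host("dr1", 8, 16)]))
    print("degraded: ", place_guests(production, [Host("dr1", 8, 16)], degraded=0.5))
```

In this made-up inventory, one 8-core, 16 GB machine cannot hold all four guests at full size, but it can hold all of them once half capacity is accepted during failover, which is exactly the trade described above.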
OS-level virtualization in a SAN-based environment is also very helpful for quick failover in the case of dead servers. If a single point of failure, like a non-clustered file server, decides to go out to lunch, operations can cease for a few hours or even longer. If said server is backed up by a virtual OS running on another SAN-attached server, downtime is only the amount of time it takes to move the IP address to the virtual server and share out the file systems. Some fairly advanced SAN fabrics can aid in this process as well.
Virtualizing failover and other types of disaster recovery is beneficial to all. The really large companies can reduce backup server costs tremendously, while the smaller ones now have the ability to implement some failover, because it only takes a few extra machines.
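The switchover just described can be driven by a small watchdog. The sketch below is one possible shape for it, not a description of any particular product: it probes the primary over TCP and, after several consecutive failures, calls a takeover routine once. The function names, the failure threshold, and the polling interval are assumptions, and the takeover itself (moving the IP address and sharing the SAN file systems) is left as a callable you would supply, because those steps depend entirely on the platform.

```python
import socket
import time


def is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def watch_and_fail_over(probe, take_over, max_failures=3, interval=5.0,
                        sleep=time.sleep):
    """Poll probe() until it fails max_failures times in a row, then take over.

    probe:     callable returning True while the primary server is healthy.
    take_over: callable that moves the service to the standby (hypothetical;
               for example, assign the service IP and share the file systems).
    Returns the number of probes made before the takeover was triggered.
    """
    failures = 0
    probes = 0
    while True:
        probes += 1
        if probe():
            failures = 0          # a single good answer resets the count
        else:
            failures += 1
            if failures >= max_failures:
                take_over()
                return probes
        sleep(interval)
```

A call might look like watch_and_fail_over(lambda: is_reachable("fileserver.example.com", 445), take_over=promote_standby), where the host name is a placeholder and promote_standby is a routine you would write for your own environment. Requiring several consecutive failures keeps one dropped packet from triggering a takeover.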