Mitigating the risk of cloud services failure

One of CIOs' biggest concerns about the infrastructure-as-a-service model has been the loss of control over assets and management that enterprises might experience upon moving into a multi-tenant environment. While analysts and early adopters of infrastructure-as-a-service offerings have argued that such apprehension is rooted more in fear than fact, Amazon's recent public data center debacle has given everyone good reason to question the reliability of the public cloud.

The two-day outage may not slow the long-term growth of cloud computing significantly, but it should give IT decision makers pause. Before rushing into any new cloud infrastructure deal, take the following seven steps to mitigate the risk of infrastructure-as-a-service failure.

1. Plan to fail. Develop detailed cloud breakdown scenarios and perform recovery run-throughs. "Put your risk-mitigation strategy firmly in place before moving into the cloud environment," says Phil Fersht, founder of outsourcing analyst firm HfS Research.

Heather McKelvey, vice president of engineering and operations for Mashery, an API management services provider, agrees. "A lot of people think 'get it up and running' and then we'll put in the design for failover," she says. "You can't do that. [Others] assume that a cloud will fail over to another cloud, or one data center to another data center, but there are varying degrees of where problems can happen, and you need to architect and build for all levels of failure in your system, not just the high level."

2. Keep some expertise in house. One of the allures of cloudsourcing is the notion that you no longer have to maintain internal knowledge of the technologies that support as-a-service solutions. However, captive know-how comes in handy when you need to prepare for and react to cloud problems. "I don't see CIOs having much option but to increase in-house knowledge of cloud underpinnings," Fersht says.

If you lack in-house capabilities, ask your provider for help, or consider hiring consultants to create a disaster recovery and business continuity plan. "Even a small investment in third-party risk oversight is worth the investment, if it helps negate a potential disaster in the event of a long outage," Fersht says.

3. Test that plan. Then test it again. "The cloud is the perfect place to test failures in a completely staged environment," says Donald Flood, vice president of engineering for Bizo, a business-to-business advertising network provider and Amazon Web Services customer. "You can easily create a staged environment that mirrors production and test your systems by killing running services and evaluating how your system performs under failure."
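A failure drill of the kind Flood describes can be sketched in a few lines. The example below is only an illustration, not Bizo's tooling: the `Replica`, `StagedService`, and `run_failure_drill` names are hypothetical, and the "staged environment" is simulated in memory rather than running on real cloud instances.

```python
class Replica:
    """A stand-in for one running service instance in a staged environment."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def handle(self, request):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


class StagedService:
    """Routes each request to the first replica that responds."""

    def __init__(self, replicas):
        self.replicas = replicas

    def call(self, request):
        for replica in self.replicas:
            try:
                return replica.handle(request)
            except ConnectionError:
                continue  # this replica is down, try the next one
        raise RuntimeError("all replicas failed")


def run_failure_drill(service, victim_name, request="ping"):
    """Kill the named replica, then report whether the service still answers."""
    for replica in service.replicas:
        if replica.name == victim_name:
            replica.alive = False
    try:
        service.call(request)
        return True
    except RuntimeError:
        return False
```

Running the drill once per replica, and again with several replicas killed at the same time, gives a rough picture of which failures the design absorbs and which ones take the service down.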

4. Create internal backup options. It took about two days for Amazon to locate and repair the problems at its data center in northern Virginia. But as soon as U.S. Tennis Association CIO and Amazon Elastic Compute Cloud (EC2) customer Larry Bonfante began to notice application sluggishness, he and his team migrated the USTA's critical systems to their own server. IT leaders must maintain internal contingency capabilities, Bonfante advises.

5. Reexamine your sourcing strategy. IT leaders have embraced multi-sourcing, but that model can make cloud continuity confusing. "The domino-effect ramifications of an outage are very complex to manage and resolve," says Fersht. For example, as more services get built on top of cloud computing infrastructures, a seemingly isolated outage can have a domino effect, taking down many services or an entire application environment, he adds.

Putting one service integration provider in charge of a multi-sourced arrangement will give you "one throat to choke" in the event of a failure, according to Fersht, but it can also prove problematic. "They are likely to develop an institutional knowledge of your IT processes that would be very tough to transfer in the future if you wanted to maintain a healthy competitive environment," Fersht says. "You need to have your own IT staff get smart about how cloud works, or you really do risk potentially losing control over your own IT environment."

6. Don't be cheap. The ROI of redundancy investments skyrockets in cloud collapse scenarios. Many of the companies affected by Amazon's failure could not, or would not, pay to run parallel systems in the cloud. Major Amazon Web Services customer Netflix, on the other hand, says it experienced no issues because its cloud computing model assumed one of the data centers in Amazon's four regions would go down. The company had "taken full advantage of Amazon Web Services' redundant cloud architecture," a Netflix spokesperson told The New York Times.

Critical data should be replicated across multiple availability zones (Amazon's term for distinct data center locations within a region) and backed up or live replicated across regions; active servers should be distributed geographically, and there should be enough active capacity to shift locations should one data center implode, advises Thorsten Von Eicken, CTO and co-founder of cloud management vendor RightScale. "Of course all this has costs, so each business needs to determine which costs are justified for each service being offered," he adds.
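The "enough active capacity" advice comes down to simple arithmetic: with any one location removed, the remaining locations must still cover peak load. The sketch below is one way to check that. The function name, the zone labels, and the use of requests per second as the capacity unit are all assumptions made for illustration.

```python
def zones_at_risk(zone_capacity, peak_load):
    """Return the zones whose loss would leave too little capacity.

    zone_capacity maps a zone label to the load it can serve
    (hypothetical unit: requests per second); peak_load uses the same unit.
    """
    total = sum(zone_capacity.values())
    return [
        zone
        for zone, capacity in zone_capacity.items()
        if total - capacity < peak_load  # survivors cannot carry the peak
    ]


# Hypothetical deployment: three zones with uneven capacity.
capacity = {"zone-a": 100, "zone-b": 100, "zone-c": 50}
safe_plan = zones_at_risk(capacity, peak_load=140)   # no zone is critical
tight_plan = zones_at_risk(capacity, peak_load=160)  # losing a or b is fatal
```

An empty result means any single zone can fail without dropping traffic. A non-empty result lists the zones where extra capacity would have to be bought, which is the cost-versus-risk decision Von Eicken describes.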

Bizo, for example, runs its services in two availability zones in each of the four Amazon regions it utilizes. When the eastern region went south last week, the company redirected that traffic to the western region.

7. Put your provider on the hook. Make sure your cloud vendors have some skin in the game with a contract that ties outages to service levels. "If they [CIOs] are subcontracting to a third-party cloud provider, ensure they are responsible for these outages and can't [absolve] themselves of responsibility," Fersht says.

Stephanie Overby

CIO (US)