AWS comes clean about recent Sydney outage

Public Cloud provider said both primary and backup power failed

Amazon Web Services has highlighted the issues behind its power-related outage in its Sydney availability zone on Sunday night.

At 4pm Sydney time, the company reported the power issue at its Sydney region datacentres delivering its EC2 and S3 services.

The blackout led to disruption for Sydney citizens and AWS clients as the outage took out major websites such as Foxtel Play, Channel Nine, Domain and Domino’s Pizza.

Partners including Comunet, Bulletproof, RXP Services and Strut Digital were also affected, with many working through the night alongside clients to resolve business-critical issues.

In a recent blog post, AWS explained that every instance is served by two independent power delivery sources: the main utility power and a backup generator diesel rotary uninterruptible power supply (DRUPS).

AWS said that if either source provides power, the instance will maintain availability. The DRUPS, as the secondary source, stores power and starts up if the main utility power is compromised.

However, during the severe weather, the instances that lost power lost access to both primary and secondary power, and the backup generators could not be brought online.

AWS described the power failure as an ‘unusually long voltage sag’, as opposed to ‘a complete outage’, and said that the unexpected nature of the voltage sag caused the set of breakers responsible for isolating the DRUPS from utility power to fail to open fast enough.

“Normally, these breakers would assure that the DRUPS reserve power is used to support the datacenter load during the transition to generator power. Instead, the DRUPS system’s energy reserve quickly drained into the degraded power grid,” the company explained.

“The rapid, unexpected loss of power from DRUPS resulted in DRUPS shutting down, meaning the generators which had started up could not be engaged and connected to the datacenter racks. DRUPS shutting down this rapidly and in this fashion is unusual and required some inspection.”

In remediation, AWS said it will add additional breakers to ensure a quicker break in connections to degraded utility power, allowing the generators to activate before the UPS systems are depleted.

The company added that it will also prioritise fixing the ‘latent bug’ that disabled the automatic recovery systems in customer instances.

AWS said more than 80 per cent of the impacted customer instances and volumes were online and operational by 1 am PDT after power was restored at 11:46 am PDT.

According to Comunet chief executive Mark Ogden, 100 of his clients were affected in total, and issues for all but one were resolved in three hours.

However, this was not the case for all. Strut Digital chief executive, Zack Levy, told ARN that his engineers were still restoring services at 3:30 am.

“We apologise for any inconvenience this event caused. We know how critical our services are to our customers’ businesses. We are never satisfied with operational performance that is anything less than perfect, and we will do everything we can to learn from this event and use it to drive improvement across our services,” AWS said.

AWS channel partners recently told ARN the interruption has proven that businesses should review their architecture model and strategy before jumping on the Cloud bandwagon.

Holly Morgan