It's not the first time: Amazon is working to fix what it calls "power issues" at its cloud data center in Virginia following a severe storm.
Friday’s outage wasn’t nearly as severe as the one that took out Amazon in April 2011. Then, a network update rolled across several data centers, causing widespread outages on the Amazon cloud.
So there are two big questions that need to be answered. First, why did Amazon’s data center fail? Second, why were companies like Instagram so drastically affected by a single data center outage?
So far, Amazon has given this explanation. “Severe thunderstorms caused us to lose primary and backup generator power to an Availability Zone in our east region overnight,” said Amazon spokeswoman Tera Randall on Saturday morning. “We have restored service to most of our impacted customers and continue to work to restore service for our remaining impacted customers.”
Mike Krieger, Instagram’s co-founder, once explained Instagram’s success this way: “The cleanest solution with the fewest moving parts as possible.” Could that too-simple architecture have played a role here?
An enterprise cloud-computing product, Amazon Web Services (AWS) powers businesses in 190 countries worldwide.
Amazon began offering the AWS service in 2006. It lets businesses host apps and websites, back up and store data, and generally run their enterprise IT without having to worry about that side of the business, until something like this happens.
Aaron Levie, CEO of cloud services company Box, explains the risks of Amazon’s infrastructure-as-a-service model:
“At the end of the day, the cloud’s availability will come down to its physical infrastructure being available. It looks like Amazon’s data center in Virginia experienced a power failure, which knocked out a number of its systems there. For the applications built on top of Amazon, sometimes negative consequences from these events can cascade through your infrastructure (e.g. when one service goes down, it then overloads another service that was otherwise fine), and in other cases some apps just don’t have resilience for these events built into their software.”
Rob Saurini, one of TechCrunch’s IT specialists, points out that AWS is cheap and usually reliable, which makes a compelling case for many companies:
“That’s the nature of relying on someone else for your website storage or application hosting. If your host goes down, so do you. Although AWS doesn’t go down too often, it might be prudent to have a backup that’s not based on AWS.
“The main selling point for AWS is that it’s cheap. Wicked cheap. It allows the little guy to compete with the big boys. Even a simple colocated server will cost upwards of $300 USD/month for a good one. AWS lets you have your data in more places at once a la carte, so you don’t have to pay for what you’re not using. It allows you to scale your app/website without worrying about infrastructure.”
The interruption underlined how businesses and consumers are exposed to unforeseen risks as they increasingly embrace life in the cloud. It was also a big blow to what is probably the fastest-growing part of the media business: start-ups on the social Web that attract millions of users seemingly overnight.
Amazon has built a thriving business in cloud computing, with a range of customers including Intercontinental Hotels, Fox Entertainment, Unilever and Spotify, as well as 187 government agencies and hundreds of small start-ups looking for the cheapest possible computing.
“The way companies view it is in terms of reliability generally,” said Michael Chui, a senior fellow at McKinsey & Company. Big customers of Amazon, he said, “have the opportunity to shape the marketplace and make demands that make products better. They will push for improvements.”
They will also have another option. Last Thursday, Google said it would offer computing over the Internet at half the price of Amazon.
The ability to deal with failures has long been a feature of any computing system, but like much else in the cloud, there are no common standards to guide how much protection against disaster is enough. Many start-ups appear not to take advantage of more expensive redundancy features in Amazon, like swapping data between the East Coast and West Coast Amazon facilities.
As one Hacker News commenter put it, “No matter how powerful we become as a species with our technology, we are still at the mercy of the clouds. Pretty cool if you think about it.”
Bigger companies are moving to the cloud as well, but may now treat Amazon Web Services as just a primary provider rather than their only one.
The outage that hit Instagram and other major sites shows that, despite massive hype in the Internet world, cloud computing isn’t necessarily a magic solution for businesses’ data and IT needs.
It is also a reminder that, as everyone transitions from local storage to the cloud, it is always a good idea to keep hard-copy back-ups of everything, because you never know what might take down the cloud.