Amazon cloud outage was triggered by configuration error

29.04.2011

Amazon also said in its explanation of the outage that it will work to ensure that it builds software and services that can survive failures.

Matt Stevens, the CTO of AppNeta, a cloud network performance management company and an Amazon cloud user, praised Amazon's postmortem for its transparency. "As a technical architect, I thought it was actually amazing how deep they went into it," said Stevens, adding that he wished the company had offered more detail about the initial network change that started the problem.

In terms of the overall issue, Stevens said: "How does anybody who runs their own private data center know how it's going to hold up until you have a massive issue?"

Jim Damoulakis, CTO of GlassHouse Technologies, an enterprise storage services provider, called it "a pretty thorough postmortem and I think for the most part they are being transparent about it."

Damoulakis said that while Amazon will take steps to keep the problem from happening again and to make its availability zones more robust, customers will ultimately be responsible for having a good disaster recovery plan.