Amazon reveals simple mistake behind massive AWS cloud outage - Puget Sound Business Journal
On Tuesday morning, members of the Amazon Simple Storage Service (S3) team were debugging an issue that was causing the S3 billing system to progress more slowly than expected. To address it, the team attempted to take a small number of servers offline for one of the subsystems used by the billing process.
“Unfortunately, one of the inputs to the command was entered incorrectly and a larger set of servers was removed than intended,” Amazon said. “The servers that were inadvertently removed supported two other S3 subsystems.”
The mistake had a cascading effect, causing widespread problems across Amazon’s massive network of servers, which forms a huge part of the internet’s infrastructure. After the servers were accidentally taken offline, the affected systems had to be restarted, a process that takes time, according to The Verge, which reported on Amazon’s explanation.
Websites and apps affected by the outage included those of the Securities and Exchange Commission, Business Insider, Quora and Slack.