A well-publicized Netflix outage over Christmas was blamed on the accidental deletion of data at Amazon Web Services (AWS). The deletion was traced to a maintenance process that an AWS developer ran against production Elastic Load Balancing (ELB) data, and the company has described it as "inadvertent."
The first disruption began at about 3:24 p.m. EST on Dec. 24 and continued into Christmas Day.
“This process was run by one of a very small number of developers who have access to this production environment,” according to an AWS report. “Unfortunately, the developer did not realize the mistake at the time. After this data was deleted, the ELB control plane began experiencing high latency and error rates for API calls to manage ELB load balancers.”
AWS staff tried several approaches before finding a way to recover. Service was not restored, and the deleted ELB data was not recovered, until roughly 24 hours after the first disruption, according to Information Management.
Technicians at first believed the problem lay with the failing API calls and only later traced it to the missing ELB data. Customers of Amazon's CloudSearch, EC2 and Elastic Beanstalk cloud offerings in the U.S. East region were affected by the incident.
AWS plans to reuse the data recovery method it worked out on Christmas if a similar fix is ever needed, and will modify its ELB control plane, Information Management said.
At the peak of the event, as many as 6.8 percent of the ELB load balancers running in the Virginia region went down, and many lost service for a "prolonged period."
The timing made matters worse for Netflix: customers could not stream movies over the holiday, traditionally a popular time for watching them. "Netflix customers were outraged," The Inquisitr commented.
This was not AWS's only outage of 2012. In October, AWS services went down because of a bug related to data collection. Earlier, two major outages tied to power failures and storms knocked Amazon cloud customers offline: the first, on June 14, stemmed from problems with generators and electrical switching equipment, according to TMCnet, and the second was caused by severe thunderstorms on the East Coast. Instagram, Pinterest, WhatsYourPrice.com and Netflix users were all affected by those summer crashes.
Edited by Brooke Neuman