Lightning strike takes Amazon’s cloud offline
Some Amazon cloud customers could have to wait two days before services come back online after a weekend lightning strike in Ireland knocked servers out.
According to Amazon, the strike hit a transformer at a utility provider, sparking an explosion and fire that interrupted power. The back-up generators failed to pick up the load, leaving its Elastic Compute Cloud (EC2) down.
The latest update from the company explained that some customers will have to wait another 48 hours for services to be restored following the strike, which also reportedly caused problems for Microsoft’s Business Productivity Online Suite, the predecessor to its new Office 365.
“Normally, upon dropping the utility power provided by the transformer, electrical load would be seamlessly picked up by backup generators,” Amazon said in its service status update.
“The transient electric deviation caused by the explosion was large enough that it propagated to a portion of the phase control system that synchronises the backup generator plant, disabling some of them,” Amazon said.
In what amounts to a stark warning to companies relying on cloud services for critical business, Amazon said it had restored power to the main Availability Zone but was still recovering service on EC2 servers, with capacity issues leaving many companies unable to access databases.
Amazon said it had to update individual servers manually, a process that requires a back-up of all data, and that a capacity shortage was making matters worse.
“Due to the scale of the power disruption, a large number of EBS [elastic block storage] servers lost power and require manual operations before volumes can be restored,” the company said.
“Restoring these volumes requires that we make an extra copy of all data, which has consumed most spare capacity and slowed our recovery process.”
Amazon said it was adding capacity and switching capacity from other regions, but admitted some customers would be without services for up to two days and might still have file issues to resolve before they are back up and running normally.
“While many volumes will be restored over the next several hours, we anticipate that it will take 24-48 hours until the process is completed,” the company said.
“In some cases, EC2 instances or EBS servers lost power before writes to their volumes were completely consistent. Because of this, in some cases we will provide customers with a recovery snapshot instead of restoring their volume so they can validate the health of their volumes before returning them to service.”