Feb 17

Amazon S3 Failure

kshirley | Uncategorized | February 17, 2008 | 2 Comments

Amazon’s S3 system suffered a serious failure yesterday and the details are now becoming clearer. Basically, the volume of authentication requests outstripped their capacity to handle them. There are a couple of things that I find interesting about this case.

The S3 service is basically a storage service that Amazon rents out, and at quite good rates. Everything is stored in their data systems and they provide a software interface (API) that allows you to store and retrieve your files. You pay for the space you use and for the bandwidth used accessing those files.
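To make the pay-for-what-you-use idea concrete, here is a rough sketch of the arithmetic. The function name and the rates in the usage example are made up for illustration; they are not Amazon's actual prices, and a real bill may include charges this sketch ignores.

```python
def estimate_monthly_cost(stored_gb, transferred_gb,
                          storage_rate_per_gb, transfer_rate_per_gb):
    """Estimate a monthly bill for a pay-per-use storage service.

    All four arguments are supplied by the caller. The rates are
    hypothetical placeholders, not real S3 pricing.
    """
    values = (stored_gb, transferred_gb,
              storage_rate_per_gb, transfer_rate_per_gb)
    if min(values) < 0:
        raise ValueError("sizes and rates must be non-negative")

    # Space charge: what you keep stored during the month.
    storage_cost = stored_gb * storage_rate_per_gb
    # Bandwidth charge: what you move in and out during the month.
    transfer_cost = transferred_gb * transfer_rate_per_gb
    return storage_cost + transfer_cost


if __name__ == "__main__":
    # Placeholder figures: 10 GB stored, 5 GB transferred.
    print(estimate_monthly_cost(10, 5, 0.5, 0.25))
```

The point is simply that the bill scales with usage: store or transfer nothing and the estimate is zero.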

I had reviewed it a long time ago with regard to potential backup solutions but found it unsuitable for Shercom’s needs. There are a number of companies that use it to provide download services for their products or for storing catalogue systems. It allows you to get access to major infrastructure but to only pay for what you use.

Placing all your data in someone else’s hands can feel like a major leap of faith. We hear that concern all the time with Shercom’s Online Backups. I have no doubt that this will hurt Amazon’s image but I do encourage people to review their needs carefully and not to dismiss the service purely because of this issue.

Amazon have done a good job in a number of areas with this incident:

- They had the monitoring system in place to at least notice the issues before they became critical.
- They reacted quickly to add extra capacity but were unable to complete it in time.
- They accepted responsibility for the issue.

I raise the last point because so many companies seem to try to bluff their way through incidents and not keep the customer informed. Skype had a similar outage a while back. I didn’t investigate it in detail but I do remember it took quite a while to get clear answers as to what had happened and what was being done to ensure it didn’t happen again.

The Amazon S3 incident was fixed in less than 2.5 hours, which is an eternity when you are trying to run a business-critical application, but it could have been much worse.


2 Responses

  1. Amazon S3 Failure Says:

    [...] Another post I wrote previously about problems with Amazons S3 service. [...]

  2. Keith Shirley Says:

    It seems Amazon has been having a whole new set of problems. The knock-on effect is much greater, however, as more and more businesses are building on Amazon’s platform.

    http://www.centernetworks.com/amazon-s3-down-july-2008
