Tuesday, July 24, 2012

Inevitable cloud outages

A few weeks ago the market was boiling hot with news and analysis of the Amazon EC2 outage. Microsoft Azure was no different: they faced an outage as well. I'd bet that smaller cloud providers face this more often, but because the impact is local we do not see much of it in the media. A number of blogs and analyses tried to figure out whether the cloud can be made more robust or redundant. Some blogs pitched hybrid clouds as a remedy. Others asked for governance improvements (as yes, most of the outages were caused by the human factor).
Let's face the truth: cloud infrastructure is designed to fail!
If we look at cloud architecture designs, the most important aspects are scalability and cost, which I'm afraid do not go hand in hand with redundancy. This does not mean that applications running in the cloud have to be impacted. It's really up to the application developer to take the cloud architecture into account and design the application so that it can cope with a cloud outage. Take Amazon EC2 for instance. It offers a number of mechanisms (multiple regions, availability zones, load balancers, etc.) which, used properly, make for robust applications running on EC2. There's a very nice whitepaper describing how to build fault-tolerant applications on AWS. As we can see, responsibility has shifted from the infrastructure provider to the application developer, and this is the major change that comes with the cloud.
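To make "design the application to cope" a bit more concrete, here is a minimal sketch of one common building block: retrying a failed call with exponential backoff. It is only an illustration, not something taken from the AWS whitepaper. The names `TransientError` and `call_with_retries`, and the default attempt count and delay, are my own assumptions.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, dropped connections)."""


def call_with_retries(operation, attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(); on TransientError wait base_delay * 2**n, then try again.

    `sleep` is injectable so the backoff can be observed without really waiting.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            sleep(base_delay * 2 ** attempt)
```

The point is simply that the application, not the infrastructure, decides how long to keep trying and what to do when it gives up.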
If we take a legacy datacenter application and try to move it into the cloud, we will get a very unpleasant surprise. Legacy applications are mainly designed on the assumption that they run on redundant infrastructure. Of course some of them are clustered in order to withstand a single server failure, but in most cases legacy applications are monoliths that scale only vertically. Cloud applications are different beasts. They're horizontally scalable entities, and this makes a difference. When we add distributed load balancers that spread traffic across different regions, we should be safe when a single availability zone or even a whole region goes down. Of course it comes at a price, but that's a different story ;)
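The failover idea can be sketched in a few lines. This is a toy illustration, assuming a hypothetical `send(endpoint, request)` function that raises `ConnectionError` when an endpoint is unreachable. The endpoint names are made up, and a real setup would normally rely on a managed load balancer or DNS failover rather than application code like this.

```python
class AllEndpointsDown(Exception):
    """Raised when no endpoint in the list could serve the request."""


def route(request, endpoints, send):
    """Try each endpoint in preference order and return the first response.

    `send` is a hypothetical callable send(endpoint, request) that raises
    ConnectionError when the endpoint cannot be reached.
    """
    errors = []
    for endpoint in endpoints:
        try:
            return send(endpoint, request)
        except ConnectionError as exc:
            errors.append((endpoint, exc))  # remember why this one failed
    raise AllEndpointsDown(errors)


# Made-up endpoints: two zones in one region, then a second region.
ENDPOINTS = [
    "app.zone-a.region-1.example.com",
    "app.zone-b.region-1.example.com",
    "app.zone-a.region-2.example.com",
]
```

With this ordering, the loss of a single zone shifts traffic to its neighbour, and the loss of the whole first region shifts it to the second one. Note that this only works if the application is deployed in every location listed, which is exactly where the price comes from.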
With this distinction in mind, I very often describe the cloud as coming in two different flavors: commodity cloud and enterprise-class cloud. The first one is targeted at developers who design their applications with the given cloud architecture in mind. Commodity cloud infrastructure is built in a very specific way and it's designed to fail. Enterprise-class cloud is a bit different. It uses the same consumption and operational model ("as a service, on demand, self service") but it's designed to host legacy applications. Its infrastructure is redundant: maybe less scalable, but more robust for sure.
If you're interested in how commodity clouds are built, there are nice resources on www.referencearchitecture.org, especially the networking part, as in fact it's the network that makes the difference. There's a very good lecture from one of the OpenStack Summits: Discover Diablo Networking Mode

To summarize, commodity clouds are designed to fail, but that does not mean it's something bad. We simply have a shift of responsibility: it is now on the developer to cope with failure. It's like the power supply to our homes. Who has redundant cables from separate power companies coming into their home? Likely no one. It's up to us to secure continuity for our servers at home, hence we buy UPSes. Cloud is a utility. Let's face it :)
