December 15, 2014

The disruptive nature of the cloud - The Fallacy of Availability through Reliability

So, as the title of this post suggests, I want to discuss the disruptive nature of the Cloud. This will be a series of 5 posts in total, you're reading the last post.

Read about the cloud and what it means and you're bound to read that the introduction of the Cloud, the real Cloud, the one that meets all criteria for being a Cloud, has been disruptive, but has it?.

I like the NIST definition of Cloud, it is quite comprehensive and less vendor-biased than Gartner's defintion.

The Cloud has been disruptive when it comes to the IT industry, especially the hosting market. But it has also been a force in how we handle IT within the enterprise. There are a few important aspects of IT in the enterprise that we consider to have changed due to the Cloud:

  • Moving from in-house IT management services to off-site IT management services. 
  • Moving from CAPEX (Capital Expenses) based IT investments to OPEX (Operating Expenses). 
  • Moving from on-premise (business) applications to off-premise (hosted) applications. 
  • Moving from a centralized IT to a democratized IT 

  • I'm sure you can think of other movements as well in your IT environment, but these are typically considered to be not only happening, but also to be disruptive within the enterprise.
    These are in fact not really changes happening due to the Cloud, the Cloud merely gave these movements a boost and fast-tracked the changes in IT. The point in case, as you can read in above articles, is that the cloud hasn't been that disruptive at all. It's more or less all the same all over again.

    But there's an actual disruptive nature to the Cloud, it's got everything to do with the many 9s you read about in brochures of Cloud providers. The amount of numbers of 9 relates to the stability or rather the availability of of something. It's the percentage of up-time of something, preferably a service, but in many cases it's just about a server.
    The disruptive part is in that traditionally the availability of a service or even a server is depending on the stability of the service or server for that matter. And what is traditionally the case is that the availability of a service, an application, is actually depending on the stability of the infrastructure on which the application is running. The more reliable the infrastructure, the more available the application and therefore the service is.
    As enterprises controlled the infrastructure, even in a hosted environment, applications were developed that relied on how the infrastructure was realized.
    And that's where the disruptive part comes into play, because in the cloud, the enterprise no longer controls the infrastructure, and by no means one can depend on its reliability.

    The bottom line here is that traditionally applications are designed-for-success. The application is relying on the fact that the infrastructure can be depended on and that in essence, the infrastructure will not fail. In the cloud this is not the case, applications need to be designed-for-failure.

    Back to why this is the case. It's quite simple. Clouds are consisting of massive amounts of infrastructure. This is because the Cloud provider wants to achieve an economy of scale such that it can offer infrastructure and consequently services on that infrastructure for as many customers as possible for as low a price as possible. In order to do this, it makes more sense to use cheap hardware, so costs are low instead of expensive hardware. Because, well 100,000 servers of US$ 2,500 each is more expensive than those same 100,000 servers costing each US$800. The price per server for a customer can be lower in the latter case... you get the economics.
    But as things go with cheap stuff, it breaks down all the time. But spare parts are cheap as well. The Cloud provider just needs to make sure that the fixed or new server is up an running again in no time. Oh, and when you run out of spare parts, you just buy other cheap kit.
    With virtualization coming into play, a piece of hardware going belly up means hardly anything as the virtual server can be back up on another piece of cheap kit within literally minutes or even seconds.
    Enterprises buy expensive hardware because they want reliable hardware as they're typically too small to get actual economies of scale so the whole Cloud paradigm doesn't work.

    Here's the pretty part, when you control the infrastructure, you can determine what part of the availability you want to handle within the infrastructure and what's in the application. You have access to the complete OSI stack, so you decide where what is handled.
    Now forget about being in control of the complete stack, you're only controlling layer 7, the application layer. Or when you're lucky, which you won't, you have something to say about the 6th layer, the presentation layer. All of a sudden, all the availability requirements will have to be handled in layer 7. That's right, in the application. Because all you know is that 99.999% of the time (probably more likely is 99.8% of the time) the infrastructure will do what you want it to do, but when it doesn't. You have no clue, you just know that it'll be back up in seconds. Probably not knowing what it was doing before it crashed. In fact, Cloud providers will not tell you how they reach the many 9s from their brochures, you can sue them when they don't come true on their promises.

    What's so disruptive about this? Well, ever since businesses started using computers, applications and more importantly the design and programming models could stay the same going from one computing model to another. Always the paradigm has been design-for-success. Granted there was error handling, but this was handling errors that could be predicted. Wrong user input, key-uniqueness violations, race-conditions. But with the Cloud, all of a sudden, the paradigm must be design-for-failure. Assume it won't work and make the application robust enough to handle failure, handling errors that cannot be predicted. Unavailability of resources, session losses at inconvenient times, state of the infrastructure changing all the time without the application being aware of it.
    See? The issue here is that applications can't be migrated from one computing model to another by just adapting the model when you go to the Cloud. All of a sudden, the application model will most likely have to change significantly. And with that, your breed of developers need to change as well. They have to re-learn how to program enterprise grade software. And no, just creating stateless horizontally scaling applications doesn't cut it. Because unless you design for failure, your application won't be horizontally scaling across the board. It will always have a bottleneck at the ESB (see my previous post on why the ESB doesn't scale, by design) or the database. In fact, the scaling capabilities of an application has nothing to do with the Cloud, it's a traditional problem that surfaced with the rapid increase of computing utilization. More users, more transactions, more requests, more of everything required different kinds of scaling than the traditional vertical (faster CPU, more memory, more bandwidth) kind. This is not related to the Cloud at all. In fact, the scaling solutions we all know is also heavily relying on the reliability of the infrastructure.

    Concluding, the Cloud is disruptive. It's disruptive in that it needs redesign of the applications that are migrating from the on-premise to the Cloud.

    Okay, there's another thing that's different from traditional computing models, and I hinted on that already. You have no clue as how things are done at the Cloud provider, you have no say about it, and the Cloud provider will never tell you. And that's something that a lot of enterprises have to get used at. You have to trust your computing resources vendor at face-value that you get what you're paying for, and you pay for the result and not for how it's done. And that's disruptive for a lot of managers and architects, especially because these are typically the control-freak kind of people.

    NB: The Cloud needs economies of scale, most enterprise's IT environments are not big enough to reach these economies of scales, thus an on-premise private Cloud makes no sense from a cost perspective. This is not to say that your own Cloud is a no-go.

    No comments:

    Post a Comment

    Note: Only a member of this blog may post a comment.