December 15, 2014

The disruptive nature of the cloud - The Fallacy of Availability through Reliability

So, as the title of this post suggests, I want to discuss the disruptive nature of the Cloud. This is a series of 5 posts in total; you're reading the last one.

Read about the Cloud and what it means and you're bound to read that the introduction of the Cloud, the real Cloud, the one that meets all criteria for being a Cloud, has been disruptive. But has it?

I like the NIST definition of Cloud; it is quite comprehensive and less vendor-biased than Gartner's definition.

The Cloud has been disruptive when it comes to the IT industry, especially the hosting market. But it has also been a force in how we handle IT within the enterprise. There are a few important aspects of IT in the enterprise that we consider to have changed due to the Cloud:

  • Moving from in-house IT management services to off-site IT management services. 
  • Moving from CAPEX (Capital Expenses) based IT investments to OPEX (Operating Expenses). 
  • Moving from on-premise (business) applications to off-premise (hosted) applications. 
  • Moving from a centralized IT to a democratized IT.

    I'm sure you can think of other movements in your IT environment as well, but these are typically considered to be not only happening, but also to be disruptive within the enterprise.
    These are in fact not really changes caused by the Cloud; the Cloud merely gave these movements a boost and fast-tracked the changes in IT. The case in point, as you can read in the articles above, is that the Cloud hasn't been that disruptive at all. It's more or less the same thing all over again.

    But there's an actually disruptive side to the Cloud, and it has everything to do with the many 9s you read about in the brochures of Cloud providers. The number of 9s relates to the stability, or rather the availability, of something. It's the percentage of uptime of something, preferably a service, but in many cases just a server.
    The disruptive part is that traditionally the availability of a service, or even a server, depends on the stability of that service or server. More precisely, the availability of a service, an application, traditionally depends on the stability of the infrastructure on which the application is running. The more reliable the infrastructure, the more available the application, and therefore the service, is.
    As enterprises controlled the infrastructure, even in a hosted environment, applications were developed that relied on how the infrastructure was realized.
    And that's where the disruptive part comes into play, because in the Cloud the enterprise no longer controls the infrastructure and can by no means depend on its reliability.

    The bottom line here is that traditionally applications are designed-for-success. The application relies on the fact that the infrastructure can be depended on and that, in essence, the infrastructure will not fail. In the Cloud this is not the case: applications need to be designed-for-failure.

    Back to why this is the case. It's quite simple. Clouds consist of massive amounts of infrastructure. This is because the Cloud provider wants to achieve an economy of scale such that it can offer infrastructure, and consequently services on that infrastructure, to as many customers as possible for as low a price as possible. To do this, it makes more sense to use cheap hardware instead of expensive hardware, so costs stay low. Because, well, 100,000 servers at US$2,500 each are more expensive than those same 100,000 servers at US$800 each. The price per server for a customer can be lower in the latter case... you get the economics.
    But as things go with cheap stuff, it breaks down all the time. Then again, spare parts are cheap as well. The Cloud provider just needs to make sure that the fixed or new server is up and running again in no time. Oh, and when you run out of spare parts, you just buy other cheap kit.
    With virtualization coming into play, a piece of hardware going belly up means hardly anything as the virtual server can be back up on another piece of cheap kit within literally minutes or even seconds.
    Enterprises buy expensive hardware because they want reliable hardware: they're typically too small to get actual economies of scale, so the whole Cloud paradigm doesn't work for them.
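To make the economics above concrete, here is a minimal sketch that only redoes the arithmetic with the figures quoted in the text (100,000 servers at US$2,500 versus US$800 each). The function and constant names are made up for illustration, and the "spare servers" line is simply the price difference divided by the cheap unit price, not a statement about how any provider actually budgets.

```python
# Illustrative fleet-cost arithmetic using the figures quoted in the text.
FLEET_SIZE = 100_000
EXPENSIVE_UNIT_PRICE = 2_500  # US$ per server
CHEAP_UNIT_PRICE = 800        # US$ per server


def fleet_cost(unit_price: int, servers: int = FLEET_SIZE) -> int:
    """Total hardware outlay in US$ for a fleet of identical servers."""
    return unit_price * servers


expensive_total = fleet_cost(EXPENSIVE_UNIT_PRICE)
cheap_total = fleet_cost(CHEAP_UNIT_PRICE)
difference = expensive_total - cheap_total

# How many extra cheap servers the difference would pay for (hypothetical use).
spare_servers = difference // CHEAP_UNIT_PRICE

print(f"expensive fleet: US${expensive_total:,}")
print(f"cheap fleet:     US${cheap_total:,}")
print(f"difference:      US${difference:,}")
print(f"cheap spares the difference buys: {spare_servers:,}")
```

The difference alone would pay for more than twice the original fleet in cheap replacement kit, which is why "it breaks down all the time" is an acceptable trade at that scale.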

    Here's the pretty part: when you control the infrastructure, you can determine what part of the availability you want to handle within the infrastructure and what part in the application. You have access to the complete OSI stack, so you decide what is handled where.
    Now forget about being in control of the complete stack: you're only controlling layer 7, the application layer. Or when you're lucky, which you won't be, you have something to say about the 6th layer, the presentation layer. All of a sudden, all the availability requirements have to be handled in layer 7. That's right, in the application. Because all you know is that 99.999% of the time (more likely 99.8% of the time) the infrastructure will do what you want it to do. But when it doesn't, you have no clue why; you just know that it'll be back up in seconds, probably not knowing what it was doing before it crashed. In fact, Cloud providers will not tell you how they reach the many 9s from their brochures; you can sue them when they don't deliver on their promises.
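To get a feel for the gap between the two percentages mentioned above, the following sketch converts an availability percentage into the downtime it allows per year. It assumes a 365-day year and treats availability as a plain fraction of wall-clock time; real SLAs usually define the measurement window and exclusions differently, so take the function name and the numbers as an illustration only.

```python
# Convert an availability percentage into allowed downtime per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Minutes per year a service may be down at the given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


for percent in (99.999, 99.8):
    minutes = downtime_minutes_per_year(percent)
    print(f"{percent}% available -> {minutes:.1f} min/year ({minutes / 60:.1f} hours)")
```

Under these assumptions, 99.999% leaves roughly five minutes of downtime a year, while 99.8% leaves about seventeen and a half hours. That difference is what the application now has to absorb by itself.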

    What's so disruptive about this? Well, ever since businesses started using computers, applications, and more importantly the design and programming models, could stay the same going from one computing model to another. The paradigm has always been design-for-success. Granted, there was error handling, but this was handling errors that could be predicted: wrong user input, key-uniqueness violations, race conditions. But with the Cloud, all of a sudden, the paradigm must be design-for-failure. Assume it won't work and make the application robust enough to handle failure, handling errors that cannot be predicted: unavailability of resources, session losses at inconvenient times, the state of the infrastructure changing all the time without the application being aware of it.
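One common building block of design-for-failure is to treat every call to a remote resource as something that may fail and to retry it with increasing delays. The sketch below is only one possible illustration of that idea, not a complete strategy: `TransientError`, `call_with_retry` and `make_flaky` are hypothetical names invented here, and the flaky operation merely simulates a resource that is unavailable for its first few calls.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures that may succeed when retried."""


def call_with_retry(operation, attempts=4, base_delay=0.01, sleep=time.sleep):
    """Call operation(), retrying on TransientError with exponential backoff.

    The delay doubles after each failed attempt and gets random jitter so
    that many clients do not retry in lockstep. The last failure is re-raised.
    """
    for attempt in range(attempts):
        try:
            return operation()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: let the caller decide what to do
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))


def make_flaky(failures_before_success):
    """Build a fake operation that fails a fixed number of times, then works."""
    state = {"calls": 0}

    def operation():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise TransientError("resource unavailable")
        return "ok"

    return operation


# The simulated resource is unavailable twice, then answers normally.
result = call_with_retry(make_flaky(2))
print(result)
```

Retrying only covers short outages. An application designed for failure also has to decide what to do when the retries run out, for example by degrading gracefully or by queueing the work for later.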
    See? The issue here is that applications can't be migrated from one computing model to another by just adapting the model when you go to the Cloud. All of a sudden, the application model will most likely have to change significantly. And with that, your breed of developers needs to change as well. They have to re-learn how to program enterprise-grade software. And no, just creating stateless, horizontally scaling applications doesn't cut it. Because unless you design for failure, your application won't scale horizontally across the board. It will always have a bottleneck at the ESB (see my previous post on why the ESB doesn't scale, by design) or the database. In fact, the scaling capabilities of an application have nothing to do with the Cloud; it's a traditional problem that surfaced with the rapid increase of computing utilization. More users, more transactions, more requests, more of everything required different kinds of scaling than the traditional vertical (faster CPU, more memory, more bandwidth) kind. This is not related to the Cloud at all. In fact, the scaling solutions we all know also rely heavily on the reliability of the infrastructure.

    Concluding, the Cloud is disruptive. It's disruptive in that it requires a redesign of the applications that are migrating from on-premise to the Cloud.

    Okay, there's another thing that's different from traditional computing models, and I hinted at that already. You have no clue as to how things are done at the Cloud provider, you have no say about it, and the Cloud provider will never tell you. And that's something a lot of enterprises have to get used to. You have to trust your computing resources vendor at face value that you get what you're paying for, and you pay for the result and not for how it's done. And that's disruptive for a lot of managers and architects, especially because these are typically the control-freak kind of people.

    NB: The Cloud needs economies of scale. Most enterprises' IT environments are not big enough to reach these economies of scale, thus an on-premise private Cloud makes no sense from a cost perspective. This is not to say that your own Cloud is a no-go.

    December 1, 2014

    When Scrum needs to mature in the enterprise, architecture comes to the rescue.

    This post is the consequence of a comment on my previous post that startled me, as it stated that in Scrum there's no place for the architect. Scrum would have no future were it not for the architect.

    First of all, I am not a fan of Scrum, mainly because it sounds too much like an acronym and I hate acronyms. The other reason why I don't like Scrum is that it entices too many religious zealots to start a jihad against all non-believers. And then there's the third reason: too many so-called Scrum practitioners use it as an excuse for not going for longevity and quality of their deliverables.

    So, with that off my chest, let me tell you that I really love what Scrum stands for. That whole thing about delivering something useful, as soon as possible, to whoever pays you for building it. Allowing the customer to change his mind constantly about what is important and what is not. And the very notion that sometimes you do something and then have to redo it differently, and that this is part of the deal, is excellent.
    Scrum is awesome and it addresses a lot of very significant aspects of old-school project management. Especially when it comes to long analysis phases, and even longer design phases and excruciatingly long development phases, Scrum has introduced a lot of benefits.

    One of the main reasons why Scrum is such a success from an adoption perspective is the fact that it has been introduced in the limited scope of development project teams. And then in most cases in those teams that had to face a lot of changes in requirements and priorities, namely the teams building front-end, user-facing systems. Systems that directly address the needs of an end user.

    As the common enemy called "customer" united all developers, Scrum thrived.

    The rebellious attitude introduced with Scrum appeals to many developers, and I don't mean any disrespect to developers. I am a developer and often I am my own customer, or in Scrum terminology, I am my own product owner and I suffer from changing priorities and requirements all the time. It's part of usable software.

    In many organizations these days, development projects are done with the Scrum manifesto in one hand and a sincere lack of documentation in the other. Here already lies a big problem, namely the fact that in these organizations people, even Scrum zealots, still talk about projects. In and by itself it is impossible to do projects using Scrum. The very notion of projects is preposterous, but that'll be covered in another post sometime.
    These enterprises have introduced Scrum in their software development projects and those organizations that are a bit more mature have introduced DevOps, which is the second stage in Scrum-olution if you ask me.

    Havoc is wreaked when a Scrum team is developing software, functionality that depends on the deliverable of another Scrum team. How are sprints aligned? How are interfaces co-developed? How are release dates, even in continuous delivery situations, coordinated? Neither Scrum nor DevOps can answer these questions. The reason for this is that the scope of Scrum was never intended to go beyond the activities of the Scrum team. Inter-scrum-communication is, well, non-existent.
    Of course it exists, because it happens all the time. Initiatives like meta-scrums, meta-sprints and meta-other-Scrum-buzzwords are popping up all over the place.
    The fact is that once you look beyond the scope of sprints and epics and what the Product Owner is looking at from a backlog perspective, there's not a lot that you can do without the quality and talents of an architect. It is always the architect that delivers the bigger picture.

    Scrum is, I believe, something that stems from the sport of rugby. And if not, then still, for the purpose of this post, it does. And where the Scrum teams play the game, it is the architects that define the playing field as well as the rules by which to play. Mind you, Scrum does not equal developers' anarchy. It allows the team the freedom, if suitable, to apply a pragmatic approach to realizing the product owner's products. But the freedom is within the limitations set forth by the architect in terms of policies, principles and standards. Because of the architect, features integrate and systems can communicate. But more importantly, it is architecture that allows for cohesion and consistency between different applications and services, thus ensuring, for example, compliance with laws and regulations. Especially in environments that are predominantly based around SOA (WS-* and REST both), it is imperative that policies and standards are defined and adhered to in a consistent and cohesive manner in order to be and stay compliant.
    This becomes very apparent when principles like:

    • "One Version of the Truth"
    • "Re-use of business logic"
    • "Role based Access Control"

    are to be followed. Without an architecture framework, it becomes extremely expensive from a governance point of view for an organization to enforce or even ensure adherence.

    Do you need an architect in a Scrum team? Probably not, as long as you've got one or two members in your team that are taking care of the relevant design work, whether or not it is implicit in the Sprint's deliverables. But in the bigger picture, there is a definite need for an architect, especially when you want not only your development teams to be agile and capable of adjusting to the whims of the product owner, but also your whole business to be agile and capable of adjusting to market demands.

    Thus, unless you want to keep Scrum as Waterfall's baby brother forever and ever, you need to mature your agility, and that is only possible by applying architecture and involving architects. Not so much within Scrum teams, but across these teams.