Arc-E-Tect: Blame prevention through devops

This post is a follow-up to my previous post, 'The question that takes away all blame'.
Blameless post-mortems, or blameless RCAs (root cause analyses), are supposed to be the new normal in devops organisations. All too often, however, we first look for the team, and sometimes even the person, to blame, and then tell them to ‘fix it’.

You might’ve noticed that I wrote devops in all lowercase in this post's title. I did that on purpose.

Devops sans capital 'd'

You typically see DevOps rather than devops, but I think that DevOps implies that it’s about Development and Operations engineers working together. In my opinion, devops is about combining the responsibility of the development team and that of the operations team into the responsibility of a single team. This makes devops a matter of responsibility and less of an organisational concern.

The organisational aspect would then take the form of ‘Product Teams’, each responsible for a product, with a Product Owner who is accountable for that product.
But that is something for another day. This article is about blameless post-mortems and root cause analysis seen through devops-tinted glasses.
Blameless post-mortems come more naturally in environments of shared responsibility: environments where the same people are responsible for both the quality of the product and its usage, and where devops is considered the combined responsibility of development and operations within a single team.

Silos vs Accountability

I have observed in a number of organisations that one of the main reasons for considering a move towards devops is shared responsibility: the idea that silos prevent responsibility from being shared. That is a misconception, though. Silos don't prevent shared responsibility, although culturally they will probably inhibit it. The real problem is the lack of accountability in a siloed organisation. Or rather the opposite: too many people are accountable for different, often conflicting, objectives.

In a siloed organisation, each silo is primarily responsible, and even accountable, for its own output, its immediate contribution to the product delivery process, but not for the full delivery process itself. This means that when the outcome of the process is unwanted (for example, it caused an incident), either one of the silos’ outputs caused the problem (so who is to blame?), or there is no single silo to blame, in which case nobody can be held accountable.

This can go so far that a sales team is responsible for selling a product, and a signed contract is considered a success. The development team is responsible for changing the product, and the release of a change into production is considered a success. The operations team is responsible for 'running' the product, and is considered successful when there are no incidents. That is the situation for SaaS vendors. For more traditional software companies, i.e. those that require a product to be implemented at the customer site, the operations team is part of the customer's organisation, or at least the operations accountability typically lies with the customer. Things get more complicated there, because there will likely also be an implementation team that is successful when the product is implemented according to the contract that was sold.
The success of the product is thus defined as selling, changing, operating or implementing it. With different people accountable for each of these successes, conflicts are inevitable. So when a problem occurs anywhere in the delivery, each of the accountable people will argue that they are not to blame, because they were successful. And indeed, sales and development were successful, while operations and implementation were handed something that prevented them from being successful. The contract was signed on the promise that the missing features would be available by the time the implementation project was completed. The future releases are feature complete and their functionality is fully tested. But the product is unmanageable, not performant and definitely not secure. And it can't be implemented, because the integrations with other systems won't be available until the end of the project.

Siloed organisations are structured around tasks, competencies and expertise. By centralising capabilities, those capabilities can be shared across products. Siloed organisations are built on shared service centres, and the reason behind these structures is cost reduction through optimised utilisation. I'm not a fan; see: 'Perish or Survive, or being Efficient vs being Effective'.

Output vs Outcome

It’s the difference between output and outcome that often drives ‘blaming’ in a post-mortem.

In siloed organisations, each silo focuses on output: its own output. The silos are in many cases the result of centralising the responsibility for specific aspects of the delivery process, without accountability for the full process. Specialists are responsible for doing their 'thing' as efficiently as possible.

Responsibility is often mistaken for accountability, so these task-optimised teams of experts are held accountable for what they deliver, which is not the outcome of the process but the output of their effort. Because they are the experts, they perform their task for different products, i.e. they participate in various delivery processes, and they are held accountable for the total number of tasks they completed.

The problem, therefore, is that the silos operate truly independently of each other. Each silo services several product delivery processes, because servicing only one would leave the silo idle for a significant amount of time. Since the silos exist to optimise resource utilisation, idle time is undesirable; many non-Lean'ers consider idle time to be wasted time. In Lean, however, wasted time is time spent on something that is not immediately needed for the delivery of a product.

Every silo will work hard to meet its numbers, to meet its targets. And when the target is the number of tasks performed instead of the number of products delivered, we're doing a lot and contributing nothing.

In the context of blameless post-mortems, where blaming should be prevented, we need to make sure that responsibility is shared for the outcome of the (product delivery) process, and that accountability is set to manage that outcome. This means that development and operations responsibilities are both defined in terms of their contribution to the outcome of the process. That is something devops shines at.

Thanks once again for reading my blog. Please don't be reluctant to Tweet about it, put a link on Facebook or recommend this blog to your network on LinkedIn. Heck, send the link to my blog to all your WhatsApp friends and everybody in your contact list. But if you really want to show your appreciation, drop a comment with your opinion on the topic, your experiences or anything else that is relevant.

Arc-E-Tect

The text very explicitly communicates my own personal views, experiences and practices. Any similarities with the views, experiences and practices of any of my previous or current clients, customers or employers are strictly coincidental. This post is therefore my own, and I am the sole author of it and am the sole copyright holder of it.

March 28, 2019
