This is an article about washing dishes.
This is also an article about DevOps, but mostly it’s about washing dishes.
Before any of that, however, we’re going to talk about a helpful concept called the U-Curve.
Introducing the U-Curve
If you have ever read Donald G. Reinertsen’s seminal book The Principles of Product Development Flow, you’ll already know what a U-Curve is, and you might have already figured out where we’re going to take this whole dish washing analogy. If you work in software and you haven’t read that book, put it high on your reading list.
So, what is a U-Curve? It’s a shape that appears when you combine the costs of not doing something (and having more of those things queue up as you wait) with the cost of actually doing that work. It’s used to find the optimal batch size for work items.
If you think all that is a mouthful, you’d be right! A real-world analogy and a few pictures should make it clearer.
The U-Curve for Washing Dishes by Hand
Let’s think about the cost of washing dirty dishes the old-fashioned way (and I promise you: this will soon all relate to software!).
To completely wash even one dish, we have to:
- Interrupt whatever we were doing and choose to wash a dish instead
- Get some hot water, dish soap, and a towel
- Scrub the dish with water and soap
- Rinse the dish clean
- Dry the dish with a towel
This set of tasks represents the transaction cost for cleaning dishes by hand. Whether you wash only one dish or every dish in your kitchen, the same tasks are necessary. The main difference is the number of dishes you choose to clean at a time, which is called the batch size.
Since it would take many minutes to wash a single dish the moment it gets dirty, it’s obviously inefficient to follow that routine. Instead, we usually let a few of them pile up in the sink. Sometime later, we will wash our pile of dirty dishes.
The above graph illustrates the cost of washing the dishes in the sink (the transaction cost, in red) and the holding cost (the cost of dishes piling up, in grey). The overall cost curve (in purple) is created by adding the transaction cost and holding cost values together, and it forms a “U” shape. The optimal batch size for washing dishes by hand is therefore where the “U” is at its lowest point.
The vertical grey strips on the graph above show the two extremes: on the left, washing a single dish at a time; on the right, only washing them rarely (say, once a week).
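If you prefer numbers to pictures, here is a minimal sketch of the same idea. It assumes a deliberately simple model that is not taken from the graph itself: a fixed setup effort that is shared across the batch, and a holding cost that grows linearly with the pile. The function names and the minute values are made up for illustration.

```python
# Toy U-Curve model. The cost functions and all numbers are illustrative
# assumptions, not measurements.

def transaction_cost_per_dish(batch_size, setup_minutes=10.0):
    """Fixed setup effort (hot water, soap, towel) spread across the batch."""
    return setup_minutes / batch_size


def holding_cost_per_dish(batch_size, mess_minutes_per_dish=0.1):
    """Cost of dishes waiting in the sink, assumed to grow with the pile."""
    return mess_minutes_per_dish * batch_size


def total_cost_per_dish(batch_size, setup_minutes=10.0, mess_minutes_per_dish=0.1):
    """The U-Curve: transaction cost plus holding cost."""
    return (transaction_cost_per_dish(batch_size, setup_minutes)
            + holding_cost_per_dish(batch_size, mess_minutes_per_dish))


def optimal_batch_size(max_batch=50, **costs):
    """Try every batch size up to max_batch and return the cheapest one."""
    return min(range(1, max_batch + 1),
               key=lambda n: total_cost_per_dish(n, **costs))


for n in (1, 5, 10, 20, 50):
    print(n, round(total_cost_per_dish(n), 2))
print("optimal batch size:", optimal_batch_size())
```

With these assumed numbers the cheapest batch is 10 dishes. Washing one dish at a time or waiting for 50 to pile up both cost more per dish, which is the “U” shape in miniature.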
Optimizing Manual Dish Washing
If dishes are washed one at a time then the transaction cost is very high, since there’s a lot of preparation and work required for each washing.
If we wait until the end of the week to do anything, the transaction cost will drop substantially since we won’t spend an entire week washing dishes. However, a lot of dishes will pile up, and that will lead to a high holding cost (few clean dishes are available, the kitchen looks messy, there’s less free space on the counter for food preparation, etc.).
Humans are actually pretty good at unconsciously optimizing for overall cost in a system like this. They innately build routines around washing dishes that balance the transaction cost of washing them with the negative effects of letting a pile of dirty dishes sit in the kitchen sink.
The best that a human with a dish towel and soap in their hands can hope for, however, is to operate at the lowest point of the existing U-Curve. In other words, they can’t radically reduce their overall costs without fundamentally changing the system.
Manual Dish Washing and…um…Software
I promised that I’d tie this analogy to software and DevOps. Here’s how each dish concept translates to technology:
- Dirty dishes: new product features and fixes that have been coded but are not yet released
- Clean dishes: new product features and fixes that have been released and are in the hands of users
- Transaction cost: the process used to build a software release and deploy it to production
- Holding cost: the cost of not getting feedback from users on new features, maintaining unreleased (and potentially not very valuable) code, not knowing whether features will truly work in production, etc.
In the software world, manual dish washing is like releasing products using human effort and minimal automation. Triggering builds by hand, deploying infrastructure changes by logging directly onto servers and typing out commands, manually compiling ChangeLog entries, and so on: these are the software equivalents of “washing dishes manually.”
Why Queues Are Really, Really Evil
A large pile of dirty dishes represents a growing queue. In any system – be it computerized or human – queues are the enemies of healthy flow. Queues rapidly increase delivery delays and are early indicators of future performance problems. They also inhibit feedback loops.
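One way to put a number on that delay is Little's Law, which says that in a stable system the average time an item waits equals the average number of items in the queue divided by the average rate at which they are completed. The sketch below applies it to unreleased changes. The function name and the figures are hypothetical.

```python
def average_wait_days(items_in_queue, throughput_per_day):
    """Little's Law: average wait = average queue length / average throughput."""
    if throughput_per_day <= 0:
        raise ValueError("throughput must be positive")
    return items_in_queue / throughput_per_day


# Hypothetical team that releases 2 changes per day on average.
for queued_changes in (4, 40, 400):
    days = average_wait_days(queued_changes, throughput_per_day=2)
    print(f"{queued_changes} queued changes -> about {days:g} days to reach users")
```

At a fixed release rate, a queue that is ten times longer means each change waits ten times longer before anyone can give feedback on it.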
Unfortunately, long queues also cause unproductive behaviors in humans. The longer a queue grows, the harder it becomes to reason about, and the more negative sentiment it can create. “I can’t keep track of all the changes we have to test in this huge release!” cry the QA engineers tasked with testing a new version of their product that has spent years in development before going to market.
Long queues cause humans to gradually fear the transactions required to reduce their length, which leads to the transactions being deferred, which leads to an ever-increasing queue. In dishwashing terms, an enormous pile of dirty dishes causes people to avoid spending the requisite hour or two to clean them all.
Long queues (or big piles of dishes) also tend to increase the ceremonial importance of transactions. As the fear of big transactions increases, we tend to build routines and traditions to accommodate the risk, time, and effort incurred by allowing those queues to grow large. We begin to serve the needs of the transaction rather than the needs of our users.
For example, just as people who habitually let lots of dishes pile up might nominate a weekend morning as their “weekly dish washing time,” a team with complicated manual release processes can easily end up creating a deliberately infrequent release schedule with even more human-based process and rulebooks, perpetuating their current situation and encoding it into their team’s culture.
With a fully manual process, humans tend to push further right on the U-Curve, running transactions less frequently, and leading to bigger and bigger batch sizes. When you hear phrases like, “Don’t start the big testing phase for another day because there’s one more bug fix we’d like to make first…,” then you are witnessing this phenomenon in action.
Introducing Automation: The Classic Dishwasher
Dishwashers are a helpful invention, but they’re not the ultimate optimization for our dirty dish woes. All that a dishwasher appliance does is make it more likely that the humans using it will consistently choose a batch size near the optimal point at the bottom of the U-Curve.
How does using a dishwasher make that happen?
- Fear of the transaction is drastically reduced. Unlike the dread caused by massive piles of dirty dishes, it takes a human the same effort to run a full dishwasher as to run one with a single dirty dish in it. This leads to it being run more frequently, which makes the batch size smaller, which reduces the overall cost.
- Transaction cost is obvious. Nobody is likely to run a dishwasher with one dish in it, nor are they likely to overfill it. It’s simply most convenient to run it when it’s mostly full.
- Opportunity cost drops. A full cycle takes a long time (several hours), but humans are free to do other things while the dishwasher runs. This eliminates the opportunity cost associated with manual dish washing.
- Batch size is limited. You can only fit so many dishes into a dishwasher, and then you are forced to run it.
- Risk of breakages is reduced. You are more likely to drop and break dishes when washing by hand. A dishwasher greatly reduces that risk.
However, transactional ceremony is still a problem, even with a dishwasher. Personal routines get built around dishwashers that further cement their place in our lives and prevent any further optimizations. For many people, “starting the dishwasher after dinner” becomes an established technique that they’ll use for many years without thinking about it.
Dishwashers and Software
In software terms, introducing a dishwasher into a kitchen is like automating a few parts of a release process that were manual and prone to operator error, but keeping the same basic steps as before.
Release frequency increases enough that the batch size becomes optimal, but the lines and curves on the graph stay the same. In other words, it’s like automating enough to make releases less scary, but without truly moving the optimization needle.
That’s where DevOps comes in.
Presenting: The DevOps Dishwasher
There’s a dishwashing utopia we can reach, where:
- Dishes never pile up
- Dishes are cleaned as soon as they are dirtied
- We never worry about the effort required to clean dishes
- We don’t need to optimize the number of dishes to be cleaned at a time
- Dishwashing becomes an invisible and automatic process that frees us to focus on other things
We can achieve this grand kitchen vision with a smart use of technology, and by focusing that technology on reliably delivering the thing that we value the most: a steady stream of clean dishes.
To do it, we build a new robotic dishwashing machine that operates very differently from traditional ones: it always has hot water and soap at the ready, it grabs dirty dishes as soon as they appear in the sink, it quickly cleans and dries them, and finally, it puts them away in their correct place on the shelf.
If lots of dirty dishes appear all at once (bear with me while we go further into fantasy land…), a small army of our new robotic dishwashing machines magically appears, and they each grab a dish and clean the whole pile simultaneously. Dirty dishes never build up, and clean dishes are back on the shelves almost immediately.
By investing in this technology to change how dishes are cleaned, our transaction cost curve drops significantly. That radically changes the shape of our dishwashing U-Curve: its lowest (optimal) point becomes even lower, and the batch size needed to reach that point becomes smaller.
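To see why a cheaper transaction pulls the optimum toward smaller batches, here is a rough sketch. It assumes a simple cost model chosen for illustration: a per-item cost of `setup_cost / n + holding_rate * n` for a batch of `n` items. Setting the derivative to zero gives an optimal batch of `sqrt(setup_cost / holding_rate)` and a minimum cost of `2 * sqrt(setup_cost * holding_rate)`. The names and numbers are hypothetical.

```python
import math


def optimal_batch(setup_cost, holding_rate):
    """Batch size that minimizes setup_cost / n + holding_rate * n."""
    return math.sqrt(setup_cost / holding_rate)


def minimum_cost(setup_cost, holding_rate):
    """Per-item cost at the bottom of the U-Curve for this model."""
    return 2 * math.sqrt(setup_cost * holding_rate)


HOLDING_RATE = 0.1  # assumed cost of each extra item waiting in the queue

# Assumed setup costs: washing by hand vs. an always-ready robotic dishwasher.
for label, setup_cost in (("by hand", 10.0), ("robotic dishwasher", 0.1)):
    batch = optimal_batch(setup_cost, HOLDING_RATE)
    cost = minimum_cost(setup_cost, HOLDING_RATE)
    print(f"{label}: optimal batch ~{batch:.1f}, cost per item ~{cost:.2f}")
```

In this model, cutting the setup cost by a factor of 100 shrinks both the optimal batch size and the minimum cost by a factor of 10. The holding cost did not change at all; only the transaction got cheaper.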
Building a DevOps Dishwasher
What is the software equivalent of this wonderful dishwashing nirvana? It’s made up of a number of pieces: DevOps, CI/CD, pipelines, trunk-based development, value stream mapping, infrastructure-as-code, and more. It’s the set of things required to get changes made in your product codebase and into the hands of your users as soon as they are ready:
- A team that delivers small chunks of work very frequently and avoids long-lived feature branches
- A reliable, scalable Continuous Integration (CI) system that builds and tests any new code change automatically
- Automated tests that are fast, reliable, and have sufficient coverage
- A Continuous Delivery (CD) system that automatically gets new code running in production (perhaps using blue/green or canary rollouts)
- Infrastructure that is elastic, programmable, and reliable
- Teams of developers and operators who collaborate effectively and agree on how to keep the whole system – and the product it deploys – healthy and responsive
Regardless of the terms you may use to describe all of the above (let’s just group them all under the term “DevOps” for now), they are the software equivalent of our amazing robotic dishwasher. Used properly, they lead to tiny batch sizes, negligible queues, and a vastly reduced overall cost.
Deploying tiny changes frequently is also an excellent way to reduce the risk of breakages in production. How many bugs can hide inside 10 lines of code? How about 10,000? The probability of problems occurring increases dramatically as batch sizes grow, so adopting a high-frequency, small-batch deployment approach is a far safer model than trying to release many months of changes in one go.
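A back-of-the-envelope sketch makes the 10-versus-10,000 comparison concrete. It assumes, purely for illustration, that every changed line has the same small, independent chance of introducing a defect. Real code is not that tidy, and the rate below is invented.

```python
def probability_of_at_least_one_defect(lines_changed, defect_rate_per_line=0.001):
    """Chance that a release of this size contains one or more defects,
    assuming each line is independently defective at the given rate."""
    return 1 - (1 - defect_rate_per_line) ** lines_changed


for lines in (10, 100, 1_000, 10_000):
    risk = probability_of_at_least_one_defect(lines)
    print(f"{lines:>6} lines changed -> {risk:.1%} chance of at least one defect")
```

Under these assumptions a 10-line release is very likely to be clean, while a 10,000-line release is almost certain to contain at least one defect. When something does break after the small release, there are also only 10 lines to search.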
Lastly, small batch changes and high-frequency deployments also happen to be a very accurate predictor of high-performing teams (as anyone who has read the outstanding book Accelerate will already know).
Go Forth and Build Those Dishwashers
Now that you’re armed with an understanding of the U-Curve and the economic relationships between batch size, transaction cost, holding cost, and risk, you might even start to see U-Curves appear in other areas of your life and work.
And maybe the next time you are in a team meeting deciding when to manually release a ton of features that have been queued up for weeks, you’ll imagine big piles of dirty dishes in a sink and start inventing amazing ways to wash them.
Many thanks to Ty Paulhus for his custom graphics for this article.
Brian Kelly is Head of Conjur Engineering at CyberArk, where he focuses on creating products that add much-needed security and access management to the landscape of DevOps tools and cloud systems. Brian is passionate about building teams, cybersecurity, and DevOps. Find him on Twitter at @brikelly.