A look at our longtime, trusted friend in the spectrum of automation
"Automate all the things!" "You should automate that." "Automate the toil away." These are a few of the things I've heard regularly when it comes to dealing with the growing amount of toil we experience while operating software. But what does automation mean in practice? And what exactly is automation in 2020? It's a spectrum!
So, what is the spectrum? In this five-part blog post series, I will cover different parts of the DevOps Spectrum of Automation. In this inaugural post, I want to focus on scripts, the form of automation most developers are familiar with.
A few months ago, I chatted with Matt Stratton from Red Hat, and one of his early automation stories stuck with me. The year was 1999, right before Y2K. At the last minute, he got the idea to write a script that would automate the shutdown process for the whole organization. He saved it onto a floppy disk and removed the script from his computer. He remembers "being very paranoid about that disk because [he] was like, this is like the big red button of this company." At 8 pm on New Year's Eve, it was time for his little script to shine. He ran it, and by 8:15 pm everything was shut down and they could go home.
That was not how they expected the night to go. Most of his coworkers, and Matt himself, had expected to be busy all night. Thanks to his script, they weren't. Looking back on it, Matt reflects that "it sounds really funny 20 years later to be like, duh, why would you not do that? But it was revolutionary for the team to consider… we could write a script to do that." At the time, automation just wasn't part of how IT thought about operations tooling.
The first interactive shells were created in the 1960s, but it took until the late 1990s for automation scripts to become a regular part of many sysadmins' toolbelts. Writing scripts has now been around for decades, and developers throw around the word "script" a lot: "Oh, just write a script for it!" So, what is a script?
That's a very broad question. At its most basic, a script automates a set of tasks that a human operator could otherwise execute one by one.
For this blog post, let's focus on the scripts we use while operating software: think of pipeline, installation, and shell scripts. Other kinds of scripts are used in development too, most commonly client-side JavaScript, which powers a whole other world of front-end development by embedding code in HTML. Those are out of scope here.
Scripts should embrace the Unix philosophy of small tools that each do one thing well, rather than one tool that does everything plus the kitchen sink. It's better to write small, specialized scripts than one giant script that handles a bunch of different tasks. Smaller scripts are easier to follow and maintain, and they leave less room for misunderstandings and mistakes by the people running them.
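For instance, a script in that spirit might do exactly one thing: report whether disk usage on a filesystem is above a threshold, and signal the answer through its exit code so other tools can decide what to do next. The following is a minimal Python sketch; the script's purpose and its `path`/`threshold` arguments are hypothetical, chosen only to illustrate the idea, while `shutil.disk_usage` and `argparse` are standard library.

```python
import argparse
import shutil
from typing import List, Optional


def usage_percent(path: str) -> float:
    """Percentage of the total space on `path`'s filesystem that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return usage.used / usage.total * 100


def main(argv: Optional[List[str]] = None) -> int:
    """Print usage for one path; return 1 if it exceeds the threshold, else 0."""
    parser = argparse.ArgumentParser(
        description="Exit non-zero if disk usage is above a threshold.")
    parser.add_argument("path", help="any path on the filesystem to check")
    parser.add_argument("threshold", type=float,
                        help="maximum allowed usage, in percent")
    args = parser.parse_args(argv)

    pct = usage_percent(args.path)
    print(f"{args.path}: {pct:.1f}% used")
    # The exit code is the script's whole interface to other tools.
    return 1 if pct > args.threshold else 0
```

Ending the file with the usual `if __name__ == "__main__": sys.exit(main())` entry point turns it into a command that can be chained with `&&` or `||` in a shell. Deciding what to do about a full disk is deliberately left to some other small tool.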
Scripts also need to be easy to follow, including how their decision trees work, because sometimes bad automation is worse than no automation. This is an idea we will revisit often in this series. Why is it true of scripts? A script that handles failures poorly can leave things in a half-changed state, and a script that doesn't check the state of the resources it operates on before running can act on assumptions that no longer hold. Either way, recovering from the failure gets harder. Scripts have to be adaptable.
During an incident, running a poorly understood script with these issues can cause more harm than doing nothing at all. And with more automation in place, a script may run automatically during an incident, so the human operator who gets paged may not understand the current state of the system. Not knowing what automation is doing in the middle of an incident can be costly.
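To make those failure modes concrete, here is a minimal Python sketch of a script that checks state before each step and, if a later step fails, undoes what it already changed instead of leaving things half-done. The `Step` structure and `run` function are hypothetical names for illustration, not part of any particular tool; real steps would call your own systems rather than the toy dictionary used in the demo. The log it fills in is a small nod to the on-call problem: whoever gets paged can see what the script actually did.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Step:
    """One hypothetical unit of work, with a state check and an undo action."""
    name: str
    is_done: Callable[[], bool]  # inspect current state before acting
    apply: Callable[[], None]    # make the change
    undo: Callable[[], None]     # reverse the change if a later step fails


def run(steps: List[Step], log: Optional[List[str]] = None) -> List[str]:
    """Apply steps in order, skipping any already in the desired state.

    If a step raises, undo the steps this run applied, newest first,
    then re-raise so the caller still sees the failure.
    """
    log = [] if log is None else log
    applied: List[Step] = []
    for step in steps:
        if step.is_done():
            log.append(f"skip {step.name}: already done")
            continue
        try:
            step.apply()
        except Exception:
            log.append(f"fail {step.name}: rolling back")
            for done in reversed(applied):
                done.undo()
                log.append(f"undo {done.name}")
            raise
        applied.append(step)
        log.append(f"apply {step.name}")
    return log


if __name__ == "__main__":
    # Toy demo: the second step fails, so the first one is rolled back.
    state = {"user": False, "dns": False}

    def fail() -> None:
        raise RuntimeError("simulated outage")

    steps = [
        Step("create user", lambda: state["user"],
             lambda: state.update(user=True), lambda: state.update(user=False)),
        Step("update dns", lambda: state["dns"], fail,
             lambda: state.update(dns=False)),
    ]
    log: List[str] = []
    try:
        run(steps, log)
    except RuntimeError:
        pass
    print(log)    # ['apply create user', 'fail update dns: rolling back', 'undo create user']
    print(state)  # {'user': False, 'dns': False}
```

Because each step checks state first, running the script twice is safe: the second run skips everything that is already done. Real rollbacks are rarely this clean, so treat this as a shape to aim for rather than a guarantee.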
Understanding what a script does is not enough, though. Scripts also need to be checked into a source control management tool, like git, and versioned. If we want scripts to be maintainable and easy to share within teams, this should be a requirement. How many times have you heard of a script that was only stored on a random teammate's computer?
Centralized repositories let people other than the script's author learn about the script and contribute to it. If a script needs changes based on insights from an incident or a routine maintenance task, we need a way to propose those changes, create a new version of the script, and distribute it to the rest of the team.
If you need more than a sequence of commands and don't need Bash utilities, Bash might not be the answer. In 2020, there are many scripting languages to choose from, and it is important to find the right language for the job instead of trying to do everything in one language. Some teams might choose Python for some scripts and Bash for others. Maybe Perl is the right choice instead.
The more important message is to use the right tool for the automation your team is doing, and not to force a square peg into a round hole.
While scripts are more than manual steps you follow, like a manual runbook or checklist, a script is still often triggered manually. Compared with automation that relies on external services, machine learning, or artificial intelligence, a script's decision tree is easier to understand. Scripts are not a black box with hidden-away code: you can trace back through them to see how they arrived at a result.
Image from my Failover Conf talk on "Human-in-the-Loop DevOps"
This is why, on the DevOps Spectrum of Automation, scripts sit between runbooks and orchestration, which covers much more than a single task and is harder to trace back through. (Watch out for the next post in this series on orchestration.) While we wouldn't use scripts to orchestrate, they have their place for appropriately sized tasks that can be safely automated.
Lastly, I want to end with the idea of do-nothing scripts. A do-nothing script encodes the instructions for each step of a procedure in code: each step is a function. When you run a do-nothing script, it walks the user through the steps one at a time. These scripts do not automate the steps themselves; instead, they guide the user in what to do. It is a form of gradual automation because it adds a thin layer of automation on top of a very manual process.
Why would I bring up do-nothing scripts? They are not the traditional type of script that many of us are used to. Since we are trying to look at traditional end-to-end automation differently in this series, I think it is an interesting idea to explore. On the spectrum, do-nothing scripts sit between runbooks and scripts. Later in the series, we will talk about where the idea of human-in-the-loop fits into the spectrum, and do-nothing scripts are a great example of it.
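As an illustration, here is a minimal do-nothing script in Python for a hypothetical API key rotation procedure. The procedure, the step names, and the secret-store path are made up for the example. Each step is a function that prints an instruction and waits for the operator to confirm, and a shared `ctx` dictionary carries values, like the new key's ID, from one step to the next. The `ask` parameter defaults to the built-in `input`, which also makes the script easy to exercise without a human at the keyboard.

```python
from typing import Callable, Dict, List

Ask = Callable[[str], str]   # e.g. the built-in input()
Context = Dict[str, str]     # values collected while walking through the steps


def create_new_key(ctx: Context, ask: Ask) -> None:
    print(f"1. In your provider's console, create a new API key for {ctx['service']}.")
    ctx["key_id"] = ask("   Paste the new key's ID and press Enter: ").strip()


def store_key(ctx: Context, ask: Ask) -> None:
    print(f"2. Save key {ctx['key_id']} in the secret store under '{ctx['service']}/api-key'.")
    ask("   Press Enter when done: ")


def restart_service(ctx: Context, ask: Ask) -> None:
    print(f"3. Restart {ctx['service']} so it picks up the new key, and check that it is healthy.")
    ask("   Press Enter when done: ")


def revoke_old_key(ctx: Context, ask: Ask) -> None:
    print(f"4. Revoke the previous key for {ctx['service']}.")
    ask("   Press Enter when done: ")


# The procedure, in order. Each function can later be replaced with real
# automation without changing the others.
STEPS: List[Callable[[Context, Ask], None]] = [
    create_new_key,
    store_key,
    restart_service,
    revoke_old_key,
]


def main(service: str, ask: Ask = input) -> Context:
    """Walk an operator through every step, then return what was collected."""
    ctx: Context = {"service": service}
    for step in STEPS:
        step(ctx, ask)
    print(f"Key rotation for {service} complete.")
    return ctx
```

Calling `main("billing")` walks an operator through all four steps. This structure is what makes the automation gradual: once a step is well understood, its function can be rewritten to do the work itself, while the remaining steps stay a guided checklist.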
Does orchestration belong on the spectrum? Is it really automation? Watch out for the next part of this series on orchestration.
I would love to hear your thoughts on these questions! Tweet at me at @taylor_atx. We'll explore these ideas in the next part of the series.
P.S. Shout out to J. Paul Reed for giving talks that started the wheels in my brain on this topic.