A more human-centric approach to automation is the key to simplifying operations
This article was originally published on Forbes.com.
As of 2019, Deloitte found that 58% of the organizations it surveyed had already started using some form of technology to automate business processes. Automation is often touted as a complete solution for managing today's complex software stacks. However, automation can be difficult because it requires working around a variety of third-party systems and APIs, which are brittle and complex. A further problem is that full automation, which aims to take the human out of the picture, requires a complete, nuanced understanding of a system and all of its potential outcomes, paradoxically resulting in heightened system complexity. When innovation is occurring at a high velocity, this understanding becomes a constantly moving target.
Instead of attempting to automate absolutely everything, teams should carefully identify which parts of their systems and processes are appropriate to automate. The human operator is an inextricable element of a functioning system, which means a more human-centric approach to automation is the key to simplifying operations across a variety of use cases, including DevOps and other engineering operations.
Although "automate everything" is a popular ideology in modern DevOps and site reliability engineering (SRE), human judgment remains a key factor in every stack's management, from daily operations to incidents. Engineering teams today are faced with a real-world disparity between the theoretical promise of automation and the futility of implementing it wholesale in rapidly changing environments. Full automation is appealing because of its simplicity. However, automation alone is neither flexible nor expansive enough to keep up with a dynamically evolving technology stack.
According to a McKinsey study on the technical potential for automation in the U.S., the hardest activities to automate are those that apply human expertise to decision-making, planning, or creative work. These activities have an automation potential of only 18%. While automation excels at well-defined activities, it cannot adapt to change, coordinate with different systems, or account for unexpected factors. You can't redirect end-to-end automation or tell it to focus on something specific once it has started. We should let the machines do what they are good at (repeatable, predictable, mechanizable tasks) and let humans use their judgment at key decision points.
The goal of human-in-the-loop automation is to find a balance between human involvement and end-to-end automation. It makes it possible to identify goals, purposes, and risks that automation alone cannot easily identify. Human judgment can step in where the logic to automate would be so complex, and so closely tied to the specifics of a fast-evolving product, that it would need to be updated almost as frequently as the product itself to be useful. Letting the human command and control when to apply each piece of automation therefore makes automation easier to build out and more reliable overall. Humans continue to play an important role in the systems they manage, so it is important to keep them fully involved at significant decision points in the process, which increases flexibility and stability while repetitive tasks are automated. In short: humans, not machines, should control the decision-making process.
The fusion of automation and human interaction is what brings value to a business. For example, in the DevOps use case, it improves quality of life for on-call engineers when they're dealing with incident management. During the investigation process, augmenting incident response with a capability like interactive runbooks allows engineers to review the various levers that can be pulled, exposing the different controls that are available to them. Built with human-in-the-loop automation, interactive runbooks incorporate past experiences and lessons learned from incidents to recommend actions to take immediately following an incident alert. In this way, automation not only serves as a guide during incidents, but also eliminates the toil of repeatable, mundane tasks and common remediation steps such as a service restart or a code commit. It surfaces useful data and insights more easily and improves the on-call experience by reducing cognitive overload and accelerating resolution times. Other examples of DevOps tasks that are ripe for automation include software installation, version control, provisioning servers, and user experience tests. Human-in-the-loop automation is also relevant for other use cases like data collection, infrastructure inspections, and analytics.
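To make the idea of an interactive runbook concrete, here is a minimal Python sketch of human-in-the-loop automation. It is an illustration under stated assumptions, not a description of any particular product: the names `Step`, `run_runbook`, and `console_approve`, and the two example steps, are all hypothetical. Routine steps run automatically, while steps flagged as needing judgment wait for a human decision.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    """One runbook step (hypothetical structure, for illustration only)."""
    name: str
    action: Callable[[], str]      # the automated work; returns a short result
    needs_approval: bool = False   # True when a human should decide


def run_runbook(steps: List[Step], approve: Callable[[Step], bool]) -> List[str]:
    """Run steps in order, asking `approve` before any step that needs it."""
    log = []
    for step in steps:
        if step.needs_approval and not approve(step):
            # The human declined, so the automation does not act.
            log.append(f"skipped: {step.name}")
            continue
        log.append(f"ran: {step.name} -> {step.action()}")
    return log


def console_approve(step: Step) -> bool:
    """Ask the on-call engineer at the terminal (one possible approval channel)."""
    return input(f"Run '{step.name}'? [y/N] ").strip().lower() == "y"


# Placeholder actions; a real runbook would call actual tooling here.
example_steps = [
    Step("collect recent error logs", lambda: "logs collected"),
    Step("restart service", lambda: "service restarted", needs_approval=True),
]

# Non-interactive demo: a stand-in approver that declines every risky step.
for line in run_runbook(example_steps, approve=lambda step: False):
    print(line)
```

Passing `console_approve` instead of the stand-in lambda would put the on-call engineer at the decision point. Keeping the approval function separate from the steps is a design choice of this sketch: the repeatable work stays automated, and the channel through which a human decides can change without touching the steps themselves.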
For DevOps and engineering operations, human-in-the-loop automation is all about making it easier to build and maintain applications. Moving forward, it's important to maintain the dynamic, human understanding of a system or service’s components, history, and context. Automation can significantly accelerate product release timelines, bringing them down from years to months or weeks, and helps teams achieve consistency, accuracy, and reliability when building and delivering products.
Businesses are constantly looking for ways to improve efficiency with automation. Companies have mountains of data at their fingertips, and as they modernize their digital infrastructure and rapidly adopt emerging, data-centric applications and technologies, automation is critical to their future success. Yet while automation can augment human expertise and reduce time spent on repeatable tasks, it can't replace human judgment. As long as software continues to evolve, there will always be new areas of uncertainty that require the human operator.