Thoughts on Automation Since Failover Conf
Back in April, when a fully remote conference still felt novel, our community engineer, Taylor Barnett, gave a talk at Gremlin’s Failover Conf. The topic: Human-in-the-loop automation in DevOps. As a team, we were hesitant to argue a position that seems to contradict the popular “automate everything” mantra that gets thrown around so often in the DevOps/SRE world, but the concept is one of Transposit’s core tenets: the human operator is an inextricable element of a working system. We believe that ignoring the human only makes automation harder and more fragile. But as she prepared her talk, we still braced for a storm of detractors.
Instead, a steady flow of people reached out to share their support for the idea based on their own real-world experiences. Attendees mentioned techniques like auto-scaling and other forms of end-to-end automation that can break down in incidents. It turned out that building automation around the human operator wasn’t a “concession” on the way to the Shangri-La of total automation; it was actually a mechanism for introducing more automation to the stack, but in a lower-risk, more sustainable way than adding piecemeal scripts to runbooks on Confluence or the like.
When our CTO, Tina Huang, chatted a couple of weeks ago with Niall Murphy, co-author of the Google SRE books, the role of humans in a working system once again bubbled to the surface. And so, as we dig deeper into the Spectrum of Automation and prepare for a "Quick Bite" at Datadog's Dash conference in August, it feels like a good time to revisit that initial talk.
What do you think about Human-in-the-loop automation? How have you planned around the human operator in your automation strategy? Let us know @Transposit.