How to automate the "every time there's an incident" tasks and jump straight to investigation and classification
Incident management is one of the biggest hurdles facing modern engineering teams. Legacy IT service management (ITSM) solutions are ill-equipped to handle modern speed and turnaround expectations. Oftentimes, incident reports can come through so many different channels or incident monitoring platforms that it is difficult to set automation parameters to handle them. Teams often also lack the resources necessary to automate processes and build in-house platforms tailored to their specific needs and infrastructure.
Transposit provides a modern approach to incident management, with connected workflows that integrate and automate incident management across people, platforms, processes, and APIs. Transposit helps teams accelerate response by surfacing knowledge and context, bringing the right people and teams together, and reducing toil through human-in-the-loop automation.
We break incident management down into five steps.
The first step, intake, can by itself take 15 to 30 minutes per incident, leaving DevOps teams struggling to keep up with their manual daily operations while responding to incidents in a timely manner. Transposit integrates the tasks that need to take place every time there’s an incident, using automated runbooks and more than 200 pre-built connectors to the tools across an organization’s cloud stack.
When an incident is reported, a number of things need to happen before work on resolving it can begin, and these steps are often entirely manual. In a traditional DevOps environment, things start with an alert of some kind, either raised by an observability platform such as New Relic or Datadog, or reported by customer service or by customers directly.
When this happens, it is generally then up to the incident manager to get the process started. This usually starts with the incident manager manually creating and assigning a ticket in Jira or Zendesk or any number of other similar platforms. The incident manager then will have to pull up the incident process in a wiki page, create and manually add all necessary stakeholders to a Slack channel, and create a Zoom bridge and initiate a Zoom call to get all of the stakeholders onto the same page and to begin delegating tasks. This is a tedious process that eats away at time.
Transposit has the capability to turn that whole lift into an instant, automated process. Each incident has its own unique, nuanced needs in terms of response and remediation, and therefore still requires humans in the loop at various stages. But the intake phase is full of repetitive protocol that follows the same actions each time — creating a Jira ticket, logging a PagerDuty incident, inviting stakeholders to a Slack channel, scheduling a Zoom meeting, and/or any number of other organization-specific tasks.
By making use of Transposit’s automated runbooks and their pre-built connectors to more than 200 productivity tools, the entire process takes place the instant an incident is detected. Before anyone notices that an incident has been reported, the automated runbook handles the intake.
The process of setting up the automation begins by creating and setting up an incident runbook in the Transposit platform.
Set up runbook triggers
To fully automate incident intake, we begin by adding triggers to the runbook so that it executes automatically when the criteria we set are met. Triggering signals arriving from multiple monitoring and observability tools and channels can be handled with Transposit webhooks, which integrate with tools like Datadog, PagerDuty, and BigPanda to instantly kick off the runbook and set the intake process and its corresponding workflow in motion.
Note that runbooks can also be triggered by the creation of an activity type, by an activity update (for example, a severity being set or changed), or by a change in a runbook's state (in progress, closed, or error).
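To make the idea of trigger criteria concrete, here is a minimal sketch of how alerts from different channels might be normalized into one event shape and checked against a severity rule. It is an illustration only, not Transposit's API: the payload field names (alert_title, priority, subject, urgency), the IncidentEvent type, and the handle_webhook function are all hypothetical, and real webhook payloads differ per tool.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class IncidentEvent:
    """Common shape that every incoming alert is normalized into (hypothetical)."""
    source: str
    title: str
    severity: str


# Hypothetical payload field names; real tools each send their own format.
def from_monitoring_alert(payload: Dict[str, Any]) -> IncidentEvent:
    return IncidentEvent("monitoring", payload["alert_title"], payload["priority"].lower())


def from_support_ticket(payload: Dict[str, Any]) -> IncidentEvent:
    return IncidentEvent("support", payload["subject"], payload.get("urgency", "unknown").lower())


# One normalizer per intake channel.
NORMALIZERS: Dict[str, Callable[[Dict[str, Any]], IncidentEvent]] = {
    "monitoring": from_monitoring_alert,
    "support": from_support_ticket,
}

# Trigger criteria: only these severities start the intake runbook.
TRIGGER_SEVERITIES = {"critical", "high"}


def handle_webhook(source: str, payload: Dict[str, Any]) -> Optional[IncidentEvent]:
    """Return the normalized event if it should start the runbook, else None."""
    normalizer = NORMALIZERS.get(source)
    if normalizer is None:
        return None  # unknown channel: ignore rather than guess at its format
    event = normalizer(payload)
    return event if event.severity in TRIGGER_SEVERITIES else None
```

Keeping normalization separate from the trigger rule means a new channel only needs a new normalizer, while the criteria stay in one place.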
Add automated actions to "When runbook starts" section
Now we can add each task that needs to be automated to the runbook section labeled When runbook starts. Every action in this section executes automatically when the runbook starts. Actions (like creating a Slack channel) or Conditions (like creating a waterfall task based on a previous action’s completion) can be created and customized with just a few clicks. In no time, the automated intake process is in place and ready to streamline the management of the incident. Learn more about creating and running runbooks here.
Every automated action is recorded in the timeline, so your team has a full audit trail of what has happened.
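The pattern described in this section (ordered actions, conditions that depend on an earlier action's completion, and a timeline that records every step) can be sketched in a few lines. This is a simplified model under stated assumptions, not how Transposit implements runbooks: the Action and Runbook classes, the requires field, and the stub actions with placeholder identifiers are all hypothetical, and no real Jira or Slack calls are made.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Context = Dict[str, str]


@dataclass
class Action:
    name: str
    run: Callable[[Context], str]      # returns a short result for the timeline
    requires: Optional[str] = None     # condition: action that must complete first


@dataclass
class Runbook:
    actions: List[Action]
    timeline: List[str] = field(default_factory=list)

    def start(self, context: Context) -> None:
        """Run every action in order, recording each outcome in the timeline."""
        completed = set()
        for action in self.actions:
            if action.requires is not None and action.requires not in completed:
                self.timeline.append(f"skipped {action.name}: waiting on {action.requires}")
                continue
            try:
                result = action.run(context)
            except Exception as exc:
                self.timeline.append(f"failed {action.name}: {exc}")
                continue
            completed.add(action.name)
            self.timeline.append(f"completed {action.name}: {result}")


# Stub actions standing in for real integrations (placeholder values only).
def create_ticket(ctx: Context) -> str:
    ctx["ticket"] = "TICKET-1"
    return ctx["ticket"]


def create_channel(ctx: Context) -> str:
    ctx["channel"] = f"#incident-{ctx['ticket'].lower()}"
    return ctx["channel"]


def invite_stakeholders(ctx: Context) -> str:
    return f"invited on-call to {ctx['channel']}"


runbook = Runbook([
    Action("create_ticket", create_ticket),
    Action("create_channel", create_channel, requires="create_ticket"),
    Action("invite_stakeholders", invite_stakeholders, requires="create_channel"),
])
runbook.start({"title": "API latency"})
```

After the run, runbook.timeline holds one entry per action in execution order. If an action fails, the actions that depend on it are recorded as skipped rather than silently dropped, so the audit trail stays complete.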
Intake is just the first step in Transposit’s 5-stage Incident Management Solution, which is followed by Classification, Engagement, Remediation, and Report, Record, and Learn. Continue on to learn how to automate incident classification.