From Monitoring to Automation: How to Act on Your Data

Most ITOps teams have data collection and alerting in place, but with organizations pushing for more automation, acting on that data is the next big step.

Benjamin Leyland, Principal Solutions Architect, ScienceLogic

Automation is trending. From self-driving cars to drone delivery, technology is rapidly developing to replace humans. In the ITOps space, the rise of AIOps has brought significant focus on automation and how it can help enterprise IT departments and MSPs reduce costs.

When your CIO or CTO asks how you’re automating your ITOps practice, what do you say? There are many possible answers depending on the nature of your business. It could be transforming your helpdesk using robotic process automation to interpret and act on inbound requests from users. Or it might be orchestration and provisioning using common tools like Chef, Puppet, and Ansible. However, if you’re using a tool like the SL1 platform to collect performance, configuration, and event data from your IT infrastructure and applications, you can drive the following types of automation using that data lake:

  • Automating incident response, including fault triage and automated remediation
  • Integrating with ecosystem tools to streamline user workflows
  • Transforming and contextualizing your data to make it easier to understand

Incident Response

What do your Tier 1 engineers do when they get a new incident ticket? Do they even get an incident automatically, or do they have to create it themselves? Automations related to incident response are intended to eliminate the repetitive tasks those engineers perform, including:

  • Creating incidents when faults are detected in IT infrastructure and applications
  • Populating incidents with relevant information, like the affected device and its attributes
  • Running diagnostic commands to help determine root cause
  • Determining and executing remediation steps

Your monitoring data should be ready to drive these types of automations. The core competency of SL1 for many years has been its flexible data collection engine. SL1 can collect from a wide variety of sources using a wide variety of methods, and it can be extended quickly using the PowerPack toolset. With comprehensive monitoring in place, you will have a set of events that are ready to trigger incident automation.

But it doesn’t end with the event data. An event can be used to create an incident, but all the data that went into detecting that event is key to driving further automation. From an event in your monitoring system, automation can look up the affected device or monitored component. That might lead to an asset record, related devices, and access credentials for connecting to the device. By monitoring the device, you have amassed all the information you need to automatically respond to the incident. We’ll dive deeper into the event-driven automation features of SL1 in future blog posts.
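As a rough illustration of that lookup chain, the sketch below enriches a raw event into an incident payload. The in-memory `DEVICES` and `ASSETS` tables, the field names, and the `build_incident` helper are all hypothetical stand-ins and not SL1 APIs; a real automation would query the monitoring platform and the ticketing tool instead.

```python
from dataclasses import dataclass, field

# Hypothetical in-memory stand-ins for data the monitoring platform already holds.
DEVICES = {
    "dev-101": {"name": "core-sw-01", "ip": "10.0.0.1", "related": ["dev-102"]},
    "dev-102": {"name": "edge-rtr-01", "ip": "10.0.0.2", "related": []},
}
ASSETS = {"dev-101": {"serial": "SN-0001", "location": "DC1 rack 12"}}


@dataclass
class Incident:
    summary: str
    device: dict
    asset: dict = field(default_factory=dict)
    related_devices: list = field(default_factory=list)


def build_incident(event: dict) -> Incident:
    """Enrich a raw event with context collected while monitoring the device."""
    device_id = event["device_id"]
    device = DEVICES[device_id]
    return Incident(
        summary=f"{event['severity'].upper()}: {event['message']} on {device['name']}",
        device={"id": device_id, "name": device["name"], "ip": device["ip"]},
        asset=ASSETS.get(device_id, {}),  # an asset record may not exist
        related_devices=[DEVICES[r]["name"] for r in device["related"] if r in DEVICES],
    )


incident = build_incident(
    {"device_id": "dev-101", "severity": "major", "message": "Interface down"}
)
print(incident.summary)  # MAJOR: Interface down on core-sw-01
```

The point of the sketch is that the engineer receives a ticket that already names the device, its asset record, and its neighbors, rather than an event ID to look up by hand.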

Integrating Ecosystem Tools

As much as we would like one tool to rule them all, most ITOps environments have multiple toolsets. A given user will gravitate towards their favorite tool, whether it’s a change manager who works in ServiceNow or a security engineer who swears by Splunk. Automation to integrate your tools should eliminate the following:

  • Data being transferred manually between toolsets
  • Users accessing multiple tools to accomplish a single task
  • Different users having different insights into the same data

Integrating your ITOps ecosystem should ensure, for example, that the engineer responding to an incident in PagerDuty and the analyst diving deep into historical SL1 data have the same context about a given event.

The tool you use and how you implement an integration often depend on the quantity of data and its direction of travel. For example, SL1 has multiple features that can be leveraged depending on the nature of the integration:

  • The SL1 Run Book Automation capability is well-suited to sending data outbound from SL1 when events occur, e.g., for an incident integration or to trigger an external process when a monitoring change occurs.
  • The SL1 Integration Service can be used to build bi-directional integrations, e.g., for data exchange with a CMDB like ServiceNow.
  • An integration might require a significant amount of data to be transferred from SL1 to an external tool, e.g., for data analytics. The SL1 Publisher feature has been designed for this use case, where data is constantly streamed rather than pulled from SL1.
  • If an integration is uni-directional inbound to ScienceLogic, e.g., from an orchestration or provisioning tool, the GraphQL and REST APIs would typically be used to ingest that data.

Data Transformation

If you’re an SL1 user, you’re probably taking advantage of data transformation and contextualization automation already. You might not even think of it as a type of automation at all. This type of automation is focused on providing a user who is looking at data the right context to do their job quickly. In other words, this automation replaces the manual task of searching, compiling, and combining data.

Some examples include:

  • Aggregating data from multiple sources into a representation of a service that you provide to your customers
  • Using AI and ML techniques to highlight anomalies or other interesting trends in data
  • Applying a set of rules to group related events together, e.g., those related by a topology

Leveraging these types of data transformation tools can help other automated processes succeed. For example, if you have defined a business service that is updated automatically by SL1 as its composition changes, that service definition can be propagated to other ecosystem tools like ServiceNow. Likewise, by implementing rules-based event grouping and ML-driven anomaly detection, you can reduce noise and make your incident response automation more effective.
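To make topology-based grouping concrete, here is a minimal sketch of one possible rule: events are grouped when their devices are linked, directly or through a chain of other alerting devices. The event and link structures and the `group_events_by_topology` function are hypothetical, chosen for illustration; this is not how SL1 implements event correlation.

```python
from collections import defaultdict


def group_events_by_topology(events, links):
    """Group event IDs whose devices are connected through other alerting devices."""
    alerting = {e["device"] for e in events}
    parent = {d: d for d in alerting}  # union-find forest over alerting devices

    def find(d):
        while parent[d] != d:
            parent[d] = parent[parent[d]]  # path halving keeps trees shallow
            d = parent[d]
        return d

    # Merge only across links where both ends currently have events.
    for a, b in links:
        if a in alerting and b in alerting:
            parent[find(a)] = find(b)

    groups = defaultdict(list)
    for e in events:
        groups[find(e["device"])].append(e["id"])
    return sorted(sorted(ids) for ids in groups.values())


events = [
    {"id": 1, "device": "rtr-1"},
    {"id": 2, "device": "sw-1"},
    {"id": 3, "device": "srv-1"},
    {"id": 4, "device": "srv-9"},
]
links = [("rtr-1", "sw-1"), ("sw-1", "srv-1"), ("sw-2", "srv-9")]

print(group_events_by_topology(events, links))  # [[1, 2, 3], [4]]
```

Here four events collapse into two groups, so downstream incident automation would open two tickets instead of four. Event 4 stays separate because its only neighbor, `sw-2`, has no active event.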

What should you automate first? As with self-driving cars and delivery drones, the goal is to replace humans who are performing a repetitive task. Find out what repetitive task your humans are doing most and automate that.

Find out how service provider NetDesign acted on its data with great success. Read Forrester’s Total Economic Impact™ of ScienceLogic SL1 for NetDesign.

 
