The Dirty Data Problem: Why Modernizing Infrastructure Monitoring is Pivotal to AIOps Success
Jeff Dean at Google Brain once said that the most sophisticated AI algorithms succumb to the quality of the dataset they rely on. That’s a fancy way of saying: “Garbage in, garbage out.” And if your organization is struggling with the effects of dirty data—inaccurate analytics, sub-optimal automations, and persistent problems with IT operations management—chances are you’ve got visibility gaps in your infrastructure that have you operating with a CMDB filled with inaccurate, incomplete, or obsolete information.
In the latest ScienceLogic webinar, “The Dirty Data Problem: Why Modernizing Infra Monitoring is Pivotal to AIOps,” ScienceLogic product marketing manager Eric Hian-Cheong and VP of product marketing Leslie Minnix-Wolfe discuss why modernizing your IT infrastructure monitoring environment is pivotal to cleaning your dirty data—and foundational to your AIOps strategy. After all, you can’t do AIOps if you’ve got bad data.
Clean, Reliable Data Is Vital to IT Operations
The importance of good data to IT operations monitoring and management has become clear over the last two years. In that time, driven by the circumstances of the COVID-19 pandemic, we’ve seen organizations around the world pursue digital transformation under the mantra “digitize to survive.” On average, digital transformation timelines ran three years ahead of expectations.
In fact, Microsoft CEO Satya Nadella said his company had seen two years’ worth of digital transformation activity in a two-month period last year. What does that mean, and what are organizations looking to address with digital transformation? There are three priorities for most enterprises engaged in or considering digital transformation:
- Embrace cloud computing
  - Adopt private, public, and hybrid cloud
  - Accelerate the move of workloads to the cloud
- Focus on customer and employee experiences
  - Support the remote workforce
  - Enable more self-service applications and services
- Supplement skilled IT resources
  - Maximize employee efficacy and efficiency
  - Maximize retention and satisfaction
  - Augment capabilities through automation
These priorities often run parallel to the need to maintain investments in legacy infrastructure, placing additional strain on IT operations teams who are expected to keep pace with their organizations’ transformations and keep it all running at peak efficiency. That’s because more components and complexity increase the amount of data an enterprise’s IT infrastructure creates, and that increases the risk of working with dirty data. To meet the challenge, ITOps teams are turning to AIOps platforms like SL1 from ScienceLogic.
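To make "dirty data" concrete, here is a minimal Python sketch of how a team might flag incomplete or obsolete CMDB records. It is an illustration only, not how SL1 works: the `CmdbRecord` fields, the `classify` function, and the 30-day staleness threshold are all assumptions made for this example. Detecting inaccurate records is out of scope here, since that requires comparing the CMDB against what is actually discovered on the network.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class CmdbRecord:
    # Hypothetical record shape for illustration; real CMDB schemas differ.
    name: str
    owner: Optional[str]
    ip_address: Optional[str]
    last_seen: Optional[datetime]


def classify(record: CmdbRecord, now: datetime,
             max_age: timedelta = timedelta(days=30)) -> List[str]:
    """Return the dirty-data labels for one record; an empty list means clean."""
    problems = []
    # Incomplete: a required field is missing or empty.
    if not record.owner or not record.ip_address:
        problems.append("incomplete")
    # Obsolete: never observed, or not observed within max_age (assumed 30 days).
    if record.last_seen is None or now - record.last_seen > max_age:
        problems.append("obsolete")
    return problems


# Made-up sample data, evaluated against a fixed reference date.
now = datetime(2022, 1, 31)
records = [
    CmdbRecord("web-01", "platform-team", "10.0.0.5", datetime(2022, 1, 30)),
    CmdbRecord("db-02", None, "10.0.0.9", datetime(2022, 1, 29)),
    CmdbRecord("old-switch", "network-team", "10.0.1.1", datetime(2021, 6, 1)),
]
report = {r.name: classify(r, now) for r in records}
for name, problems in report.items():
    print(name, problems or "clean")
```

Even a simple check like this shows why more components mean more risk: every new asset is another record that can go stale or lose an owner.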
How ScienceLogic Does It
ScienceLogic gives IT operations real-time visibility across the entire IT estate, collecting and feeding data from all inputs into a real-time operational data lake, and giving that data context relative to events, performance, configuration, utilization, and relationships, making it more meaningful and actionable. That allows ITOps to consistently apply machine learning and analytics to understand the health, availability, and reliability of the estate, and to automate root cause analysis and other workflow automations like incident management, enrichment, and repair.
SL1 also enables ITOps teams to consolidate and modernize their toolset, which simplifies their view of operations. By resolving problems caused by dirty data and shifting to a more service-centric posture, teams can create innovative new services based on the needs of customers and users. Ultimately, this allows organizations to invest in more sophisticated IT workflow automations to address efficiency and skills gap problems.
Three Case Studies
Greater efficiency, reliability, and performance through automation sound great, but where’s the proof? ScienceLogic is fortunate to be the trusted source of AIOps for many organizations that have gone down the digital transformation road. And these are not garden-variety organizations facing simple challenges; they have faced some of the most daunting circumstances you might expect to encounter in IT operations.
Case Study #1, Dell/Pfizer: Pfizer was one of the key pharmaceutical companies engaged in Operation Warp Speed to develop an effective COVID-19 vaccine. They engaged with Dell Technologies for technology transformation a few years before the pandemic and were still in the process when the pandemic set in. In early 2020, Pfizer and Dell found themselves with the unprecedented challenge of completing Pfizer’s IT transformation while relying on that same infrastructure to support high-priority R&D and accommodate the collaboration needs of a distributed, global workforce.
And they had to do it while Dell was also shifting to a remote working model.
When Pfizer originally set out to modernize pre-COVID, their primary goals were to achieve a 50% IT operations cost reduction through consolidation and automation, and to increase technical agility to better respond to crises by creating secure, collaborative access to high-volume resources between internal staff, university research centers, and other public and private institutions.
Leveraging the SL1 platform, Dell and Pfizer established a foundation of clean data that was used to populate their CMDB and create a single, operational data lake. Then that data was used to simplify and automate processes, including over 70,000 hours of repetitive, manual tasks. And they eliminated an entire portfolio of legacy infrastructure monitoring tools by consolidating onto SL1. From there Pfizer pivoted completely to a remote workforce while standing up a brand-new secure data center in Singapore. That transformed IT estate allowed Pfizer to conduct over 40,000 clinical trials in record time.
Case Study #2, Capgemini IT: Capgemini IT operates 20 global data centers and cloud platforms, more than 35 sites, 350 offices, more than 300,000 endpoints, and more than 400 corporate applications. They have more than half a million assets, 10% of which are mission-critical. They chose SL1 as their AIOps platform for three reasons: to reduce operational costs, deliver an exceptional user experience, and respond rapidly to business needs.
Capgemini IT faced huge infrastructure visibility gaps because of siloed monitoring. In fact, they quickly discovered they could only see about 30% of their total IT estate. That meant Capgemini IT was contending with a lack of complete, accurate, and timely data (in other words, dirty data), making it impossible to manage their IT operations. They were running few automations, and the ones they were running were unreliable. Relatedly, Capgemini IT was struggling with a high mean time to repair (MTTR) for roughly 50,000 incidents per year, including 40-45 major incidents per month. They were constantly in reaction mode.
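The two metrics in this story, estate visibility and MTTR, are simple ratios, and it can help to see them written out. The following Python sketch is illustrative only: the function names `visibility` and `mttr_hours`, the asset names, and the incident timestamps are assumptions made up for the example, not Capgemini IT data or an SL1 API.

```python
from datetime import datetime
from typing import Iterable, Set, Tuple


def visibility(monitored: Set[str], discovered: Set[str]) -> float:
    """Fraction of the discovered estate that is actually monitored."""
    if not discovered:
        return 0.0
    return len(monitored & discovered) / len(discovered)


def mttr_hours(incidents: Iterable[Tuple[datetime, datetime]]) -> float:
    """Mean time to repair, in hours, over (opened, resolved) timestamp pairs."""
    durations = [
        (resolved - opened).total_seconds() / 3600
        for opened, resolved in incidents
    ]
    if not durations:
        raise ValueError("no incidents to average")
    return sum(durations) / len(durations)


# Hypothetical estate: 10 discovered assets, only 3 of them monitored.
discovered = {f"asset-{i}" for i in range(10)}
monitored = {"asset-0", "asset-1", "asset-2"}
coverage = visibility(monitored, discovered)

# Hypothetical incidents lasting 2, 4, and 6 hours.
incidents = [
    (datetime(2022, 1, 3, 9, 0), datetime(2022, 1, 3, 11, 0)),
    (datetime(2022, 1, 4, 14, 0), datetime(2022, 1, 4, 18, 0)),
    (datetime(2022, 1, 5, 8, 0), datetime(2022, 1, 5, 14, 0)),
]
mttr = mttr_hours(incidents)
print(f"visibility: {coverage:.0%}, MTTR: {mttr:.1f} h")
```

Note that both numbers are only as trustworthy as their inputs: if the discovered set itself is incomplete, the visibility figure overstates coverage, which is the dirty data problem in miniature.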
Adopting SL1 for AIOps, Capgemini IT quickly established 100% visibility across their entire, distributed infrastructure, including mission-critical assets. That allowed them to establish a real-time CMDB populated with reliable data and use that data to create automations, including ticketing, troubleshooting, and remediations. Incidents are now down more than 3x, and MTTR rates for those remaining are a fraction of what they were. Many expensive and inefficient tools have been eliminated, and they’ve integrated ScienceLogic with their ServiceNow service desk, setting the stage for ongoing improvements.
Case Study #3, U.S. Department of Veterans Affairs: The VA is one of the largest U.S. federal agencies, serving approximately 9 million veterans and their dependents with a variety of services, including operating the country’s largest healthcare network. The COVID-19 pandemic put a huge burden on the VA and its modernization efforts, forcing the agency to accelerate its plans to provide its members with more reliable services.
A surge in telehealth and other remote services put tremendous strain on an already beleaguered system. Before the pandemic, VA telehealth would typically handle 25,000 visits per month, but in April of 2020 demand rose to more than 40,000 visits per day. The VA’s goals were simple: speed delivery and increase reliability of remote services. Achieving them would not be simple, at least not without the right technical foundation. Compounding the challenge, the agency’s remote workforce shot up from 60,000 to 170,000 when the lockdown went into effect.
The VA adopted SL1 and quickly gained 100% service and infrastructure visibility, including Azure and AWS clouds, and VMware components across both IT and DevOps organizations. That allowed them to address the problem of an inaccurate CMDB, and provided support for the agency’s remote workforce. With SL1 the VA was also able to reduce tool dependency, while integrating with tools like ServiceNow, Dynatrace, and AppDynamics to support all the different services and applications that their members rely on.
For more details, register now for the webinar, “The Dirty Data Problem: Why Modernizing Infra Monitoring is Pivotal to AIOps.” And, as always, if you have any questions about how ScienceLogic and the SL1 platform can support your digital transformation, please get in touch.