AIOps and the Financial Services Industry: Part Two
ScienceLogic’s Brian Amaro, Sr. Director of Global Solutions, and CMO Murali Nemani continue their conversation with Kalpesh Sharma, Director and Principal Enterprise Architect at Capgemini.
During the course of the two-part conversation, which includes examples from the field, Brian, Murali, and Kalpesh address challenges common to the industry and how AIOps helps forward-thinking organizations overcome them. Here are highlights from the conversation. Watch the video for deeper insights.
In this session, we’re going to dig into how ScienceLogic and Capgemini are helping financial services customers address their pain points and the role of AIOps in their execution models.
Brian, give us your take on how we’re engaging with customers and how that’s of value to them.
A Service-Centric Approach to AIOps
If there’s one thing that really screams from the rooftops, “Hey! AIOps!” it starts with building out an authoritative data set and being able to establish an accurate data lake in your CMDB. If you don’t have accurate data, it just doesn’t make sense. But you also must establish a plan for tool consolidation to eliminate the confusion and costs associated with siloed tools so that you can break down data silos and infuse collaboration around a common data set.
It is imperative to prioritize and structure the data with context and become situationally aware of the entire environment. A plane doesn’t fly itself. You need a pilot, and that pilot needs to be situationally aware of the plane’s environment—even with all that automation.
Then there’s the concept of a service-centric approach to AIOps. Help us understand that.
What is the solution?
The complexity of running applications across hyperconverged and hybrid cloud ecosystems, mixed with legacy infrastructure, is forcing CIOs into migration and AIOps strategies without a complete understanding of their data. Identifying the cost to maintain IT for both legacy and future ephemeral, hybrid ecosystems; identifying capacity for peak and slow utilization; and, of course, maintaining future availability and reducing outages are all priorities for a CIO.
Unfortunately, most CIOs that we’ve spoken with are looking for that easy button. I’ve heard several times, “Brian, can you just tell me how you did it at Kellogg?” But it doesn’t work that way. IT has too many overlapping silos in each organization, each focusing on specific device problems rather than overall service health. Unfortunately, visibility into overall service health does not exist for most organizations.
We hear, “It’s the network team’s fault,” and the network team says, “It’s not our fault.” There’s always that back and forth. Any CIO in the world can relate. The result of this bickering is inconsistency in identifying service health, increased mean time to remediate or resolve an issue, decreased operational efficiency, and root cause analysis that is seldom completed effectively.
Instead, we hear that “root cause analysis” is merely a rebooted server or restarted service. That isn’t analysis. A problem occurred, but did you address it correctly? Most of the time, the problem ends up being labeled as noise and thrown into the noise bucket. That’s the absolute wrong thing to do. Security risks increase because the underlying problems go unaddressed and the data is not aligned properly.
Root cause analysis is a best guess in most organizations. Very seldom does anybody know the capacity or inventory of a dependent application service or even a location. And I can give a couple of examples from financial institutions we’ve worked with.
We asked one CIO, “Can you tell us exactly what your inventory is, or what data center infrastructure is responsible for each one of your locations?” He chuckled and said, “Can you tell me how many organizations I have first?”
It isn’t just one CIO in one domain; it’s a problem we see at companies all around the world. And now we’re seeing companies “throwing cloud” at their problems. But the cloud becomes an added expense rather than a cash-flow-positive migration because there is no visibility into how much you actually need.
You need an understanding of the estate, what’s in it, and the organizations involved in it. Otherwise, you’re just scratching the surface. How are you and Kalpesh advising and navigating this conundrum?
ScienceLogic and Capgemini help CIOs strategize a service-centric approach to AIOps by establishing visibility as the key to every AIOps foundational strategy. The architectural framework for AIOps adoption consists of the following elements:
- Data Collection – With complete visibility and data ingestion from every configuration item in the enterprise;
- Data Transformation – To ensure all data is confirmed, cleaned, enriched, and rendered in a single, consistent data model;
- Data Storage – All data is stored in a single, operational data lake;
- Analytics – Advanced machine learning employed to extract maximal value from trustworthy data;
- Self-Healing – Automation and analytics combined to give context to data, and to understand and remediate routine events based on root-cause analysis;
- Full Observability – To ensure all relationships and conditions are in view, understood, and acted on quickly and appropriately; and,
- Single Pane of Glass – To simplify the IT operations toolkit, streamline IT operations processes, and reduce cost and complexity.
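The first three elements (collection, transformation, and storage in a single data model) can be illustrated with a minimal sketch. This is not ScienceLogic's or Capgemini's implementation; the `ConfigItem` model, the field names, and the two source tools are all hypothetical, chosen only to show how records from siloed tools might be normalized into one consistent store.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigItem:
    """Hypothetical common data model for one configuration item."""
    hostname: str
    service: str
    source: str


def normalize(raw: dict, source: str) -> ConfigItem:
    """Map a raw record from one tool into the common model.

    The candidate field names ("hostname", "host", "name", ...) are
    assumptions standing in for whatever each tool actually emits.
    """
    hostname = (raw.get("hostname") or raw.get("host") or raw.get("name") or "")
    hostname = hostname.strip().lower()
    if not hostname:
        raise ValueError(f"record from {source} has no hostname: {raw!r}")
    service = (raw.get("service") or raw.get("app") or "unassigned").strip().lower()
    return ConfigItem(hostname=hostname, service=service, source=source)


def build_data_lake(feeds: dict[str, list[dict]]) -> dict[str, ConfigItem]:
    """Collect, transform, and store: one record per hostname, first source wins."""
    lake: dict[str, ConfigItem] = {}
    for source, records in feeds.items():
        for raw in records:
            item = normalize(raw, source)
            lake.setdefault(item.hostname, item)
    return lake


# Illustrative input: two siloed tools describing overlapping devices.
feeds = {
    "network_tool": [{"host": "CORE-SW-01 ", "app": "Payments"}],
    "server_tool": [
        {"hostname": "core-sw-01", "service": "payments"},
        {"name": "db-07"},
    ],
}
lake = build_data_lake(feeds)
```

Here the same switch reported by two tools collapses into a single record, and a device with no service mapping is kept but marked `unassigned` so the gap stays visible rather than silently dropped.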
Establishing a solid foundation provides knowledge, and knowledge allows your people to do more and to think proactively. Establishing a practice around anomaly detection and behavioral correlation to find root cause faster drives improvements in mean time to repair.
We’ve seen this a lot in insurance organizations, which must adapt and respond to tragedies and disasters. They have to be quick on their feet, and they rely on IT, so we have to eliminate that toil and manual repetitive work. We do that by enabling automated workflows like triaging, ping checks, DNS checks, traceroutes, restarting a service, restarting a server, and clearing log files.
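As a rough illustration of what an automated triage workflow might look like, here is a minimal sketch using only the Python standard library. It is an assumption-laden example, not the SL1 platform's automation: the function names, the default port, and the report structure are hypothetical, and a TCP connection test stands in for a true ICMP ping.

```python
import socket
from typing import Callable, Dict


def dns_check(hostname: str) -> bool:
    """Return True if the hostname resolves to at least one address."""
    try:
        return len(socket.getaddrinfo(hostname, None)) > 0
    except socket.gaierror:
        return False


def tcp_check(hostname: str, port: int = 443, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection can be opened (a stand-in for a ping check)."""
    try:
        with socket.create_connection((hostname, port), timeout=timeout):
            return True
    except OSError:
        return False


def triage(target: str, checks: Dict[str, Callable[[str], bool]]) -> dict:
    """Run each check against the target and summarize the results."""
    results = {name: check(target) for name, check in checks.items()}
    failed = [name for name, ok in results.items() if not ok]
    return {
        "target": target,
        "results": results,
        "failed": failed,
        # Escalate to a person only when at least one check failed.
        "needs_human": bool(failed),
    }


# Hypothetical default check set; real workflows would add traceroutes,
# service restarts, log cleanup, and so on.
DEFAULT_CHECKS = {"dns": dns_check, "tcp_443": tcp_check}
```

Calling `triage("example.com", DEFAULT_CHECKS)` would run both checks and return a report that could be attached to a ticket, so an engineer starts with the routine diagnostics already done.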
That’s where machine learning comes in. If you don’t have an authoritative dataset and common data model, and instead rely on manual data silos that you can’t apply context to, you can have the best algorithms in the world, and they’re not going to do jack.
When you start by establishing integrity in your data set and then build service maps, you can automate event and ticket enrichment to suppress or completely eliminate the noise that plagues every operation. And in a proper, maturity-led crawl-walk-run approach, the final part is true anomaly detection and behavioral correlation. Using machine learning, you can know when something is anomalous and not expected behavior, despite being within the thresholds of operational constraints.
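The idea of a reading that is anomalous while still inside its operational threshold can be shown with a small sketch. A simple z-score against recent history is used here purely as a stand-in; it is not the machine learning the speakers describe, and the threshold, baseline values, and `z_limit` are illustrative assumptions.

```python
from statistics import mean, stdev
from typing import List


def is_anomalous(history: List[float], value: float, z_limit: float = 3.0) -> bool:
    """Flag a value that deviates strongly from its own recent baseline."""
    if len(history) < 2:
        return False  # not enough data to establish a baseline
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_limit


# Hypothetical static alert threshold and recent CPU readings (percent).
STATIC_CPU_THRESHOLD = 90.0
baseline = [20.0, 22.0, 21.0, 19.0, 20.0, 21.0, 22.0, 20.0]
reading = 65.0

breaches_threshold = reading > STATIC_CPU_THRESHOLD
anomalous = is_anomalous(baseline, reading)
```

A reading of 65% never trips the 90% static threshold, yet it is far outside what this device normally does, which is exactly the kind of signal a threshold-only tool would miss.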
So many of our customers want that anomaly detection; they want predictive capabilities, but there’s so much you have to put in place in order to get to that point. All the things you just outlined are the foundation for applying sophisticated machine learning algorithms and AI.
Kalpesh, Capgemini thinks in terms of turnkey systems and solutions. While Brian is talking about particular capabilities and toolsets that are catalysts for that, tell us about how you’re approaching these problems and what solutions you’re taking to your customers.
ScienceLogic is a great tool for doing all the things Brian explained. Capgemini takes into consideration an enterprise’s needs and provides an end-to-end solution. What does that mean? We have created a working framework and working architecture, covering all aspects of data collection, data transformation, data storage, single pane of glass, AIOps analytics through ScienceLogic, self-healing, and full observability—all the pieces needed to address the pain points we’re hearing from CIOs.
We have taken all those inputs and created eight accelerators covering all those points. When we talk about data collection or data transformation, we are not just talking about monitoring. It is about data. It is also logging data. It is also data across your application platform and infrastructure, on-prem and cloud, across every system and every piece of your enterprise.
These are tools, and people need to be able to operate and work on them, so we have a strong pool of trained architects and engineers who are knowledgeable about and experienced with implementing AIOps-based solutions. We have been working on these problems for a good amount of time, and we have successfully implemented AIOps in multiple financial services institutions.
All About Results
The final—and probably the most important—question that everyone wants to know is, “What outcomes can I really expect?” Brian, based on your experience, what are the outcomes that customers can expect, and what are we seeing?
This is the fun part. This is the part I enjoy the most, and where my passion shows. When we see a successful implementation of the ScienceLogic SL1 platform, we see tremendous value and an enhanced user experience. It’s satisfying to know that your solution did something like that, but there’s more.
We see Capgemini utilizing our platform to accelerate the delivery of what they can do for enterprise customers, reducing the complexity in onboarding new customers. There isn’t a huge manual process anymore, and that saves time and money. Most organizations quickly realize a number of key benefits to operating under a new AIOps model, including:
- Enhanced end-user experience by understanding the impact of an application or service on the company, and by isolating health and risk degradations that influence availability;
- Accelerated delivery of business features by reducing the complexity in onboarding new customers or devices, and automatically aligning to respective business services to inherit established algorithms and behavioral correlations;
- Reduced operational cost by eliminating manual, repetitive work and increasing operational efficiency so your people can do high-level, high-value tasks, better using their skills and increasing employee satisfaction;
- Increased reliability, resiliency, and stability of applications and underlying systems by establishing a practice of responding proactively to symptoms in the environment rather than only reacting when something is broken.
Kalpesh, I’m going to put you on the hot seat, and ask you the hard question: what are the quantifiable ways that one can measure the impact?
The question we always hear is, “Tell me how AIOps will help me with the challenges in my environment.” And the good news is, we have examples from many different successful implementations of AIOps in financial institutions around the world. These quantifiable results include:
- Up to 40% reduction in mean-time-to-detect and mean-time-to-repair;
- Up to 99.99% service level objective for critical services;
- Less than 5% change failure rate;
- Up to 90% noise reduction; and,
- 30% manual effort reduction.
Of course, these numbers may not be repeatable across all deployments, but they are based on results from environments where AIOps has been successfully implemented.
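For readers who want to track these measures in their own environment, the arithmetic behind them is straightforward. The sketch below is one hedged way to compute them; the function names are hypothetical, and the sample values in the usage note are made up for illustration and are not results from any deployment.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def mean_time(intervals: List[Tuple[datetime, datetime]]) -> timedelta:
    """Average duration of (start, end) pairs, e.g. detected-to-repaired."""
    if not intervals:
        raise ValueError("at least one interval is required")
    total = sum((end - start for start, end in intervals), timedelta())
    return total / len(intervals)


def percent_reduction(before: float, after: float) -> float:
    """Percentage drop from a baseline, e.g. alerts per day before and after."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (before - after) / before


def change_failure_rate(failed_changes: int, total_changes: int) -> float:
    """Share of changes that caused a failure, as a percentage."""
    if total_changes <= 0:
        raise ValueError("total_changes must be positive")
    return 100.0 * failed_changes / total_changes
```

For example, going from 1,000 alerts a day to 100 gives `percent_reduction(1000, 100)` of 90.0, and 4 failed changes out of 100 gives a `change_failure_rate(4, 100)` of 4.0. Measuring the baseline before the rollout is what makes the comparison meaningful.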
A lot of these metrics may seem aspirational, but they’re actually being achieved. As you said, it all depends on the exact environment, but the framework you’ve laid out, both in terms of the approach and the particular solution elements, is the kind of structural thinking you need if you want those outcomes. And it’s not just technology. It’s process, it’s people, it’s training, and it’s all those other things.
Gentlemen, I really want to thank you. This has been incredibly enlightening. I hope it’s been of value to our customers and we look forward to doing much more in AIOps.