AWS re:Invent 2022 Recap: What Happened In Vegas …

Three key takeaways from re:Invent from our perspective:

Number one, data. It’s all about data. There’s so much data coming into the system, and Amazon has created a lot of new services and processes to help with that. This is something we feel strongly about too: context is key. I hear about context all the time. Customers want to be able to sift through that data and apply machine learning and AI to draw more business insights from it.

Number two, automation. Everybody wants to do more with less. We’re seeing that with the partner ecosystem. We’re seeing that with the AWS services. Anytime you can automate a process, anytime you can drive efficiencies, your customers are going to benefit. And we hear that a lot when talking to our customers.

Number three, ecosystem, ecosystem, ecosystem. It’s all about partnering. It’s working together. There’s so much complexity out there, but there are also so many good things to do, with all the ISV communities working together. We heard that in Ruba Borno’s (Vice President, Worldwide Channels & Alliances at AWS) keynote. We heard that in Adam Selipsky’s (CEO of Amazon Web Services) keynote. We know that we’ve got to work together to deliver excellent outcomes and customer value, and that’s something we’re here to do together as a team.

During AWS re:Invent, ScienceLogic’s CPO Michael Nappi and Zebrium’s CEO Ajay Singh sat down with theCUBE’s Savannah Peterson and John Furrier to discuss ScienceLogic’s recent acquisition of the machine-learning analytics provider Zebrium.

Savannah Peterson:

Zebrium was recently acquired by ScienceLogic. Mike, can you tell us a little bit about that and what it means for the company?

Michael Nappi:

ScienceLogic, as you may know, has been in the monitoring space for almost 20 years. And what we’ve seen is a shift from monitoring infrastructure to monitoring these increasingly complex modern cloud native applications. This is part of a journey that we’ve been on at ScienceLogic to really modernize how enterprises of all sizes manage their IT estate. They are now managing workloads that are increasingly in the public cloud, outside the four walls of the enterprise, workloads that are increasingly complex: they’re microservices based, they’re container based. And the rate of change, just because of things like CI/CD and agile development, has also increased the complexity in the typical IT environment. All these things have conspired to make the traditional tools and processes of managing IT and IT applications much more difficult. They just don’t scale. One of the things that we’ve recently seen is this shift in sort of moving to cloud native applications.

Today it only incorporates about roughly 25% of the typical IT portfolio, but most of the projections we’ve seen indicate that that’s going to invert in about three years. 75% of applications will be what I call cloud native. And this really requires different technologies to understand what’s going on with those applications. Zebrium interested us when we were looking at partners at the beginning of this year, as they have a super innovative approach to understanding really what’s going on with any cloud native application. And they separate the complexity out of the equation, and they use machine learning to tremendous effect to rapidly understand the root cause of an application failure. We’re thrilled to have Zebrium now part of the ScienceLogic family.

Savannah Peterson:

Ajay, Zebrium saves people a lot of time, obviously. I’ve worked with developers and seen that struggle when things break, shortening that time to recovery and understanding is so critical. Can you tell us a little bit about what’s under the hood and how the ML works to make that happen?

Ajay Singh:

The goal is to figure out, not just that something went wrong, but what went wrong. And we took, based on a couple of decades of experience from my co-founders—some general learnings about the nature of software. And when software breaks, what tends to happen, you tend to see unusual things happen and they lead to bad things happening. Very simple. It turns out to be-

Savannah Peterson:

Yes. Mutations lead to bad things happening, generally speaking.

Ajay Singh:

Exactly. What Zebrium’s really good at is identifying those rare things accurately and then figuring out how they connect or correlate to the bad things, the errors, the warnings, the alerts. The machine learning has many stages to it, but at its heart it’s classifying the event catalog of any application stack, figuring out what’s rare. And when things start to break, it’s telling you this cluster of events is both unusual and unlikely to be random, and it’s very likely the root cause report for the problem you’re trying to solve. We then added some nice enhancements such as correlation with knowledge bases in the public internet. If someone’s ever solved that problem before, we’re able to find a match and pull that back into our platform. But at the heart, it was a technology that can find rare events and find the connections with other events.
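
Editor’s note: the idea Ajay describes (find rare events, then see how they connect to errors) can be illustrated with a very small sketch. This is not Zebrium’s algorithm, which the interview describes only at a high level; the function names, the regex-based event typing, the line-count window, and the sample log lines are all assumptions made up for illustration.

```python
import re
from collections import Counter

def event_type(line: str) -> str:
    """Collapse numbers and hex ids so similar log lines share one type (assumed heuristic)."""
    return re.sub(r"\b(0x[0-9a-f]+|\d+)\b", "<*>", line.lower())

def is_error(line: str) -> bool:
    """Treat lines mentioning error/warning/fatal as the 'bad things' (assumed heuristic)."""
    return re.search(r"\b(error|warn(ing)?|fatal)\b", line, re.IGNORECASE) is not None

def rare_events_near_errors(baseline, incident, window=3, max_seen=0):
    """Pair each rare, non-error incident line with the first error that follows it.

    baseline, incident: lists of log lines in time order (hypothetical inputs).
    A line is 'rare' if its event type appears at most `max_seen` times in baseline.
    Only errors within `window` lines after the rare line are considered.
    """
    counts = Counter(event_type(l) for l in baseline)
    pairs = []
    for i, line in enumerate(incident):
        if is_error(line) or counts[event_type(line)] > max_seen:
            continue  # skip the errors themselves and anything seen often before
        for later in incident[i + 1 : i + 1 + window]:
            if is_error(later):
                pairs.append((line, later))
                break
    return pairs

if __name__ == "__main__":
    baseline = [
        "request 101 served in 12 ms",
        "request 102 served in 9 ms",
        "cache refresh ok",
    ]
    incident = [
        "request 103 served in 11 ms",
        "config reload: pool size set to 0",
        "request 104 served in 10 ms",
        "ERROR no connections available",
    ]
    for rare, err in rare_events_near_errors(baseline, incident):
        print(f"rare: {rare!r} -> followed by: {err!r}")
```

A production system would need far more than this, for example statistical tests for whether a cluster of rare events is unlikely to be random, which the sketch does not attempt.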

John Furrier:

And this is the theme of re:Invent this year: data, the role of data solving end-to-end complexities, one, you mentioned that. Two, I think, Mike, your point about developers and the CI/CD pipeline is where DevOps is. That is what IT now is. If you take digital transformation to its conclusion or its path and continue it, IT is DevOps. The developers are doing the IT in their coding, hence the shift to autonomous IT. Now, the other functions that used to sit in the IT department (not anymore, or they still do, but they’ll go away) are security and data teams. You’re starting to see the formation of new replacements to IT as a function to support the developers who are building the applications that will be the company. Do you agree with that statement?

Michael Nappi:

Yes. Collectively, independent of whether it’s traditional IT or it’s DevOps, the enterprise as a whole needs to understand how the infrastructure is deployed, the health of that infrastructure, and more importantly, the applications that are hosted in the infrastructure. How are they doing? What’s the health? And what we are seeing and what we’re trying to facilitate at ScienceLogic is really to change the lens of IT from being low-level compute, storage, and networking to looking at everything through a services lens, looking at the services being delivered by IT back to the business. And Zebrium really complements that mission that we’ve been on, because in a lot of cases they can provide that kind of real-time view of service health in the IT estate.

John Furrier:

And automation is beautiful there too, because as you get into some of the scale, Ajay, understanding how to do this fast is a key component?

Ajay Singh:

Yes. Scale, you’ve pinpointed one of the dimensions that makes AI really important when it comes to troubleshooting. Humans just can’t scale as fast as data, nor can they keep up with the complexity of modern applications. And the third element that we feel is really important is the velocity with which people are now rolling out changes. People develop new features within hours and push them out to production. And in a world like that, the human just has no ability or time to understand what’s normal and what’s bad, or to update their alert rules. And you need a machine or an AI technology to go help you with that. And that’s basically what we’re about.

Savannah Peterson:

This is where AIOps comes in perfectly.

Michael Nappi:

John started to allude to it earlier, but having the insight on what’s going on, we believe is only half of the equation. Right? Once you understand what’s going on, you naturally want to take action to remediate it or optimize it. And we believe automation should not be an exercise that’s left to the reader as a lot of traditional platforms have done. Instead, we have a very robust, low-code / no-code, automation built into our platform that allows you to act in context with what you’re seeing right then and there with the service.

John Furrier:

You’ve got to track stuff at scale, and you’ve got to understand what the impact is from a systems perspective, but there’s consequences to understanding what goes wrong. As you look at that, what’s the challenge for customers to do that? Because that seems to be the hard part as they lift and shift to the cloud, run their apps on the cloud, now they’ve got to go take it to the next level, which is more developer velocity, faster productivity, and secure. How are companies forming around that? Are they there yet, or are they halfway there? Where are they in the progression?

Michael Nappi:

I think whether it’s an IT use case or a security use case, you can’t manage what you don’t know about. Visibility, discoverability, and understanding what’s going on—those are the really hard problems to solve. And traditionally we’ve approached that by harvesting data off of all these machines and devices in the infrastructure. But as we’ve seen with Zebrium and with related machine-learning technologies, there’s multiple ways of gaining insight as to what’s going on. Once you have the insight, be it an IT issue like a service outage or a security vulnerability, then you can take action. And the idea is you want to make that action as seamless as possible. But I think to answer your question, John, enterprises are still kind of getting their heads around, how can we break down all the silos that have built up over the last decade or two internally and get visibility across the estate that really matters? And I think that’s the real challenge.

Savannah Peterson:

What’s next for you? How are you going to help people solve problems faster?

Ajay Singh:

One of the attractions of ScienceLogic for the Zebrium team, aside from the team and the culture, was that the product portfolio was so complementary. As Mike mentioned, you need visibility, you need mapping from low-level building blocks to business services. At the other end of the spectrum, once you know something’s wrong, you need to be able to take action automatically. And again, ScienceLogic has a very strong set of product capabilities and automated actions. What we bring to the table is the middle layer, which goes from visibility to understanding what went wrong and figuring out the root cause. To us, it was really exciting to be a very nice tuck-in into this broader platform where we help complete the story.

ScienceLogic Editorial Team