Head of IT Operations at Liberty
This story originally appeared on Upshot.
Users expect a seamless experience when accessing their accounts online, which means IT professionals work around the clock to meet those expectations. With the help of ScienceLogic, Liberty has implemented a proactive business monitoring process that has resulted in significant cost savings, reduced downtime and improved customer service.
Users expect technology to be fast and easy, which means safeguards must be put in place to provide a seamless customer experience. In a perfect world, users are unaware of the complexity behind the scenes that delivers a secure and flawless experience. So how do large IT operations hide the complex mesh of clouds, data and applications to create this illusion of simplicity? It all starts with streamlining your resources and making smart partnerships and investments.
I am the head of IT Operations at Liberty, a South African-based financial services company. We put a great deal of focus on the stability, performance and availability of the production systems our customers rely on. As IT operations go, we are not a massive team, but Liberty as an organization has about 5,000 office-based staff. We also have brokers, agents and franchises that sell our products across the country. In the IT space, we have a couple of hundred people in the infrastructure domain.
Want a seamless user experience? It all starts with smart IT partnerships
IT Operations is split between five domains, one of which is enterprise monitoring. Enterprise monitoring encompasses configuration information, infrastructure monitoring, network performance monitoring, application performance monitoring and deep dive diagnostics at the code level.
In 2007, we started a journey to centralise the monitoring function. At the time, our monitoring discipline required heavy investment in products, and those tools in turn needed people to manage and maintain them. Needless to say, getting coverage and visibility from a monitoring point of view was difficult. We did not have a huge budget, so we had to invest over several years, and it took us about six or seven years to reach a level of maturity that was, at best, mediocre.
Eventually our existing monitoring function became unsustainable, with maintenance costs rising every year. These included licensing fees to get the support we needed from multiple providers, as well as ongoing investment in training to grow our capability. To keep going, we would have had to invest significantly in expanding both our licence footprint and our skill set.
Investing in Capability
At that point, we knew we desperately needed a ‘managed services’ provider that could give us everything we required, so we started looking at the South African market. Our biggest drivers were cost and capability. About a year after we began evaluating what was out there, we came across AppCentrix. Based in South Africa, they were able to supply everything we needed, including addressing our concerns about cost and sustainability.
In 2014, we officially partnered with AppCentrix and bought the capability instead of buying tools. We now have a strong business case that will save us a lot of money over the coming years. It also strengthens our value proposition: we get the capability right up front, so we see an immediate return on what we spend on it. Plus, the skills to deliver that capability come with the ‘managed services’ provisioning.
At the center of all this was AppCentrix's ability to match our monitoring model through the tool they provide, ScienceLogic. In a short space of time, we regained the monitoring capability we already had and then surpassed it. ScienceLogic has been a solid step in the right direction for our infrastructure monitoring capability. It has also exceeded expectations in our service space, with an event management discipline that now enables us to be proactive about monitoring.
In addition to the cost savings, the overall number of incidents has declined by 98 percent over the past four years. While monitoring is not the sole reason for this improvement, it is a major factor.
Enterprise Monitoring Model
The model we now use is tiered and covers every level of enterprise monitoring, across the following six levels:
- Configuration Management Database
  - At the base is our universal configuration database. It holds configuration detail at the lowest level available and is where we store all our configuration information. A further benefit we have gained is interdependency mapping, which shows us the impact that changes or incidents can have on our systems and so helps with event correlation.
- Infrastructure Monitoring
  - This level covers our servers and the applications or services running on those boxes, along with the communication between them. All of that, and more, falls under infrastructure.
- Network Performance Monitoring
  - This level covers the physical wires that connect everything together, including bandwidth utilisation, top talkers and application performance across the wide area network. It encompasses everything to do with network performance.
- Application Performance Monitoring
  - We split this into two areas. The first is application-centric monitoring, which shows from an architecture perspective how users interact with an application and so helps with problem isolation. The second lets us map business functions, giving us insight into how business processes are being performed by our user community.
- Deep Dive Diagnostics
  - This level allows for code-level interrogation. We can dive into the code, isolate inefficiencies in how it is built and work out with the application teams how to address them.
- Visual Dashboards
  - All of this is capped off with our visual layer, which includes our ScienceLogic dashboards and user interfaces.
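To make the interdependency mapping at the configuration database level more concrete, here is a minimal Python sketch of how a dependency map can drive impact analysis and simple event correlation. It assumes a hypothetical in-memory map of configuration items: the CI names, `DEPENDENTS`, `impacted_by` and `correlate` are all invented for illustration and are not ScienceLogic's or Liberty's actual data model or API.

```python
from collections import deque

# Hypothetical interdependency map: each configuration item (CI) lists the
# CIs that depend on it directly. All names are invented for illustration.
DEPENDENTS = {
    "san-storage-01": ["db-server-01"],
    "db-server-01": ["policy-db"],
    "policy-db": ["policy-service"],
    "policy-service": ["customer-portal", "broker-portal"],
    "customer-portal": [],
    "broker-portal": [],
}


def impacted_by(failed_ci: str, dependents: dict[str, list[str]]) -> list[str]:
    """Return every CI that depends, directly or transitively, on failed_ci.

    Uses a breadth-first traversal, so nearer dependents are listed first.
    """
    seen = {failed_ci}
    order = []
    queue = deque([failed_ci])
    while queue:
        ci = queue.popleft()
        for dep in dependents.get(ci, []):
            if dep not in seen:
                seen.add(dep)
                order.append(dep)
                queue.append(dep)
    return order


def correlate(events: list[str], dependents: dict[str, list[str]]) -> dict[str, list[str]]:
    """Group alerting CIs under root-cause candidates that explain them.

    A CI counts as a root candidate if no other alerting CI sits upstream of it;
    the alerting CIs downstream of it are grouped under it as symptoms.
    (Assumes the map has no cycles.)
    """
    alerting = set(events)
    roots = {}
    for ci in events:
        has_alerting_upstream = any(
            ci in impacted_by(other, dependents) for other in alerting if other != ci
        )
        if not has_alerting_upstream:
            roots[ci] = [e for e in impacted_by(ci, dependents) if e in alerting]
    return roots


if __name__ == "__main__":
    # A database server alert plus two downstream alerts collapse into one root cause.
    print(correlate(["db-server-01", "policy-service", "customer-portal"], DEPENDENTS))
```

With these sample data, alerts on `db-server-01`, `policy-service` and `customer-portal` collapse into a single root-cause candidate, `db-server-01`, with the other two listed as its symptoms. A real configuration database would hold far richer relationships, but the same walk along dependency edges is what lets a change or incident be traced to everything it might affect.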
The National Operations Center
We have also set up a National Operations Center that operates around the clock and handles all of our monitoring. The center is equipped with a video wall of 23 screens powered by ScienceLogic's dashboards, and it is an enormous part of our event management process. The entire room is tied to our system, and its color changes based on triggered events. We use the operations center as a central technology dashboard, which is part of what ScienceLogic has enabled.
Evidence of Change
The following success metrics cover Severity 1 and Severity 2 occurrences from 2013 through 2016. I think you'll agree that they're hugely impressive and more or less speak for themselves.
- Incident numbers fell from 156 to just 27
- Business hours of downtime dropped dramatically, from 1,055 to 140
- Incidents dropped from 46 per year to one
- Network downtime fell from 456 hours to five, an improvement of a staggering 98.9 percent
Taken together, that's a massive 97.6 days of processing time returned to the business. We even set a record, going 290 days without a Severity 1 incident. Those are valuable hours that we retained instead of losing them to IT-related issues, and I feel comfortable saying that the monitoring ability ScienceLogic gave us accounts for part of that.
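The percentage reductions implied by the 2013 and 2016 figures listed above can be checked with a few lines of Python. The four before/after pairs come straight from the list; the dictionary labels and the `percent_reduction` helper are shorthand introduced here for illustration.

```python
# Before (2013) and after (2016) figures from the list above.
# The labels are shorthand for the four bullet points.
METRICS = {
    "incidents": (156, 27),
    "business hours of downtime": (1055, 140),
    "incidents per year": (46, 1),
    "network downtime (hours)": (456, 5),
}


def percent_reduction(before: float, after: float) -> float:
    """Percentage drop from `before` to `after`."""
    return (before - after) / before * 100


if __name__ == "__main__":
    for name, (before, after) in METRICS.items():
        print(f"{name}: {before} -> {after} ({percent_reduction(before, after):.1f}% lower)")
```

This prints reductions of 82.7, 86.7, 97.8 and 98.9 percent respectively. The last matches the 98.9 percent quoted for network downtime, and the drop from 46 incidents a year to one is consistent with the roughly 98 percent overall decline mentioned earlier.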
Monitoring is a discipline and should be acknowledged as such. Driving it requires a blend of generalist and specialist skills: the people working in this space need to understand all of the platforms we run and be multi-skilled. I have been extremely selective in how I built my team, hiring people who understand the monitoring space. Generalists with in-depth knowledge of multiple disciplines are hard to come by and become invaluable to organisations. Those are the individuals who make up our monitoring team, and they are a significant part of why we are so successful. Armed with the monitoring capability from ScienceLogic, our team can act in an instant when an event is triggered, saving us valuable time and precious resources.
For example, resources such as power and water (which we use for cooling) are scarce and somewhat unreliable in South Africa. We continuously take proactive steps, one of which was to move our data center into micro Pods, which are self-contained ecosystems. Given the environment we work in, visibility through monitoring is absolutely essential. If we didn't have coverage in our facilities and visibility of our power and water supply, we would be flying blind. While everything is automated, the clock starts ticking as soon as we lose a resource. On top of that, everything we have to do is under significant cost pressure.
Liberty's Strategy 2020 target in the IT Operations space is to double customer satisfaction at two-thirds of the cost. Partnering with a managed services provider that uses tools such as ScienceLogic has significantly reduced our costs and increased our capability and visibility.