Sprint, Crawl, Walk, Run: The Veterans Affairs’ Journey to AIOps
The Covid-19 pandemic forced a lot of organizations into IT changes they were not ready for. The most obvious reason for on-the-fly IT transformation was the need to support a massive and nearly overnight shift from on-premises production to a work-from-home model. According to the National Bureau of Economic Research, only about 5% of employees worked remotely before the pandemic, but by some estimates that number shot up to nearly two-thirds after governors across the country began instituting emergency measures to slow the spread of the disease.
The sudden strain on bandwidth was compounded by the need to manage new applications, devices, collaboration tools, and other resources. IT operations needed help and, out of desperate necessity, many enterprises green-lit unplanned modernization projects to meet demand.
The United States Department of Veterans Affairs (VA) was one such organization, but the VA is no ordinary enterprise. And its IT transformation was no ordinary journey.
Big and Complex
Dave Catanoso, Director, Enterprise Cloud Solutions Office at the VA, describes it as, “The second-largest federal agency, second only to the Department of Defense. If budget were revenue, we are a Fortune nine organization.”
From a technology perspective, the VA is as complex as it is big. Catanoso’s team is part of the VA’s Office of Information Technology, which has 8000 full-time employees and an additional 8000 contractors.
“At any given time, we’ve got about 200 development projects in play, and manage over 1.5 million data elements,” he says. “We’ve got a very vast IT infrastructure.”
All those people and all that technology have a mission to support a workforce of more than 480,000 people who serve the needs of more than 19 million U.S. service veterans and their families. That includes operating the largest healthcare organization in the country, with more than 1200 individual healthcare facilities, as well as 56 regional benefits offices and 155 national cemeteries.
A Mandate to Serve
“Our mandate is to provide the best possible care to all our veterans, and IT modernization is a critical part of that process,” Catanoso says. “Our enterprise cloud solutions office plays a key role in delivering the VA enterprise cloud.”
Managing the IT estate of an organization of that magnitude would be a challenge under the best of circumstances. But in the spring of 2020, things changed radically for the VA.
“We went from a mostly on-premises workforce, with maybe 60,000 [of 480,000] people working remotely, to 170,000 at the peak of the pandemic almost overnight,” he says. And because the healthcare mission had to continue, reliance on IT skyrocketed.
“Prior to the pandemic, we were doing approximately 25,000 telehealth visits a month. Today, we support well over 45,000 visits in a day,” Catanoso says.
Sprint, Crawl, Walk, Run
Catanoso says that, normally, the scope of the VA’s IT modernization would have been a years-long process. His team had to get it done in a matter of weeks, even while managing their own shift to a working model consisting of blended on-premises and remote teams. The task involved scaling up the on-premises VA Video Connect system while simultaneously standing up a matching environment in the cloud to ensure sufficient capacity. Instead of the usual “crawl, walk, run” approach, the VA had to start its transformational journey at a full sprint.
“Most consultants will tell you, when you start your cloud migration journey, start off small, do a couple of small applications” to build confidence and momentum, Catanoso says. “We went the exact opposite direction, moving one of our largest, most mission-critical systems to the cloud first, because we had compelling operational and business reasons to do that.”
And that was just the beginning. From that initial migration, Catanoso says the VA has already moved roughly 20% of a growing catalog of approximately 100 applications to the cloud, with as many as 50 to be completed by the end of 2024. The workhorse Veterans Benefits Management System (VBMS) was one of the first.
Billions and Billions of Documents
“VBMS is a web-based application for paperless claims processing. It’s the primary application of our benefits department. It is almost ten years old and has a tremendous amount of custom code. It supports roughly 4000 simultaneous users, and it manages millions and millions of documents, so you can imagine moving something like that is not easy,” Catanoso says. “We had to migrate 800 million documents from the external hosting provider up to the Amazon cloud. Today, it’s tracking approximately 2.2 billion documents, and it has to be available 24×7 for users across the U.S., and from Puerto Rico to the Philippines.”
Of course, all these changes created monitoring challenges that the VA’s legacy tools couldn’t handle, so they turned to the ScienceLogic SL1 platform (SL1). With SL1, the VA could see across a sprawling, hybrid environment in the midst of a major transformation and still keep pace with enterprise performance, enabling them to meet their goals of improving reliability, scalability, and performance, all while reducing costs.
Catanoso also says he wanted to make sure all the applications and services critical to serving 19 million veterans each day were available to employees and members. When problems arise, he says he wants the fastest possible time to resolution, including faster root-cause analysis, and he wants the steps involved in incident response and maintenance automated as much as possible.
No More Coffee Breaks
Catanoso says gains in productivity and operational efficiency have allowed him to reallocate IT staff in ways that support the VA’s mission more directly, rather than spending time on repetitive tasks. Furthermore, he says the performance of the systems and applications used by employees has improved dramatically.
In one colorful example of the productivity improvements the VA achieved by using SL1 to diagnose and fix performance issues, Catanoso describes how some large documents were slow to download before AIOps.
“Pre-migration we had download speeds of two to three minutes, and even more for 100-megabit documents. But we were able to optimize that to get it down to sub-one-minute download speeds for those same size documents, dramatically increasing the performance of the claims adjusters that use that system,” he says. In fact, the speed was so much faster, “They needed to retrain them not to take a coffee break when the document was downloading,” Catanoso laughs.
Measure Twice, Cut Once
“Our enterprise IT infrastructure team basically pulled off what I would consider a technical miracle given how fast we were able to migrate those documents over and get it up and running without losing any data, and without any major downtime,” he adds.
With the VA’s initial cloud migration sprint over, Catanoso says they are back to the “crawl, walk, run” approach, tackling the essentials of IT transformation with SL1 monitoring and managing behind it all, including:
- Tools consolidation into a single platform;
- Complete enterprise discovery; and
- Real-time CMDB population and synchronization.
“Now we’re starting our pivot, at the crawl phase, integrating ScienceLogic with ServiceNow, and our APM tools, and getting a data lake built,” he says. “Once we get into walk and run, we’ll tackle advanced event correlation, response automation, incident detection, and prediction. We want to take our time and get it right upfront. Measure twice, cut once. We’ve got a vast amount of data to collect, and systems to monitor.”
And ScienceLogic will be there to support the VA, just as the VA is there to support our nation’s veterans.