After CrowdStrike outage, what will become of automatic IT updates?

July 31, 2024

The crisis highlighted tech organizations’ penchant for blindly trusting vendor software updates. It’s the latest symptom of a race toward IT automation.

Businesses around the world screeched to a halt on July 19 after millions of Windows computers crashed, flashing the dreaded blue screen of death. The culprit: a software update in security vendor CrowdStrike’s platform, which led to hours of disruption for some — and days for others.

The IT outage grounded airlines across the country and crashed banking apps, showing the tangible effects a bad piece of code can have in any IT-dependent operation.

In the aftermath of the CrowdStrike outage, analysts stressed to CIOs and other tech leaders the need for closer scrutiny over automatic software updates. Part of the problem is a steady push toward broader IT automation and the industry’s disproportionate reliance on centralized vendor updates.

“There’s been an almost crazed drive towards automation of SaaS over the last five years,” said Phil Fersht, CEO and chief analyst at HFS Research.

“This is a big, big wake up call to the whole IT industry’s overreliance on a blind trust that everything’s just going to be upgraded,” said Fersht. “Fairly small code issues can cause massive ramifications as we’ve just seen.”

The faulty update that triggered CrowdStrike’s global outage was live for little more than one hour, but automated updates amplified its reach. Fortune 500 companies are grappling with financial losses from the outage that exceed $5.4 billion, according to one estimate.

“I think it’s just been a little bit of complacency,” Fersht said. “Too much trust in big tech that, as long as we buy Microsoft, for example, everything’s going to be fine.”

Automation frustration

The CrowdStrike outage showed the consequences of a critical software failure landing in widely adopted solutions by way of automatic updates.  

Automating IT updates slowly grew in popularity, driven by the availability of package-manager utilities in Unix and later Linux, said Charles Betz, research director at Forrester. Centralized patch management soon emerged for laptop fleet management, and then Microsoft’s move to cloud-based solutions like Microsoft 365 provided a watershed moment.

“I think the convenience has been seductive,” Betz said. 

Without proper quality assurance mechanisms, blind trust in automated vendor updates can lead to widespread issues in critical systems.

“Automation replicates a result quickly and consistently without regard for the virtue of that result,” said John Annand, research director at Info-Tech Research, in an email. “A bad change propagates just as fast as a good change.”

IT downtime, no matter the root cause, can impact operations and fuel customer frustration. It also comes with a high sticker price: downtime costs U.S. businesses more than $400 billion per year, according to Splunk data.

It’s up to technology leaders to stave off the effects of future faulty software updates by putting safeguards in place, according to Annand. As IT organizations look to AI and automation, he said, the outage highlighted the need for internal checks and balances.

Preventing the next big one

Analysts have flagged the importance of risk mitigation techniques such as canary deployment, in which an update is first rolled out to a small group of systems under controlled conditions before any broader deployment.
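To make the idea concrete, here is a minimal sketch of a canary-style rollout in Python. It is an illustration only, not a description of how CrowdStrike or any other vendor deploys updates. The function name `staged_rollout`, the stage fractions, and the `apply_update` and `is_healthy` callbacks are all hypothetical.

```python
from typing import Callable, Sequence


def staged_rollout(
    hosts: Sequence[str],
    apply_update: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    stage_fractions: Sequence[float] = (0.01, 0.10, 1.0),  # assumed, increasing
    max_failure_rate: float = 0.0,
) -> dict:
    """Update hosts in growing stages and halt if a stage looks unhealthy."""
    updated: list[str] = []
    done = 0  # number of hosts already updated
    for fraction in stage_fractions:
        # Cumulative number of hosts that should be updated after this stage.
        target = max(1, int(len(hosts) * fraction))
        batch = list(hosts[done:target])
        if not batch:
            continue
        for host in batch:
            apply_update(host)
        updated.extend(batch)
        done = target
        # Check the health of the hosts just updated before going wider.
        failed = [h for h in batch if not is_healthy(h)]
        if len(failed) / len(batch) > max_failure_rate:
            return {"halted": True, "updated": updated, "failed": failed}
    return {"halted": False, "updated": updated, "failed": []}


# Hypothetical fleet of 100 machines receiving a faulty update.
fleet = [f"host-{i}" for i in range(100)]
bad = staged_rollout(fleet, apply_update=lambda h: None, is_healthy=lambda h: False)
good = staged_rollout(fleet, apply_update=lambda h: None, is_healthy=lambda h: True)
```

In this sketch the faulty update stops after the first stage, so only one of the 100 machines is affected, while the healthy update proceeds through every stage and reaches the whole fleet.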

Following the outage, CrowdStrike announced it is taking steps to restore customer confidence, such as adding validation testing and releasing new updates through a staggered deployment strategy.

“Quality assurance and regression testing are critical,” said Jen Kling, Microsoft global partnership director at TEKsystems. “You cannot just blindly trust whatever updates are being pushed out.”

Executives must consider whether critical systems and applications should take updates as soon as they are released, or whether updates should be delayed until their operational safety can be confirmed.

“When you do your business continuity and your disaster recovery planning, you have to think about how quickly you accept those” updates, said Kling. “There were a lot of companies that immediately accepted what was pushed out.”
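One way to express that decision is as an explicit acceptance policy. The Python sketch below assumes three system tiers with illustrative waiting periods and requires internal validation before critical systems take an update. The tier names, the durations, and the function `should_install` are hypothetical and are not drawn from any vendor's product or from the analysts quoted here.

```python
from datetime import datetime, timedelta

# Assumed waiting periods per tier; real values depend on each organization's
# risk tolerance and on how urgent a given security update is.
WAIT_PERIODS = {
    "critical": timedelta(hours=72),
    "standard": timedelta(hours=24),
    "low": timedelta(0),
}


def should_install(
    tier: str,
    released_at: datetime,
    now: datetime,
    validated_internally: bool = False,
) -> bool:
    """Return True once an update has waited long enough for its tier."""
    waited_long_enough = now - released_at >= WAIT_PERIODS[tier]
    if tier == "critical":
        # Critical systems also need a passing internal test run.
        return waited_long_enough and validated_internally
    return waited_long_enough


released = datetime(2024, 7, 19, 4, 0)
two_hours_later = released + timedelta(hours=2)
three_days_later = released + timedelta(hours=72)

accept_low_now = should_install("low", released, two_hours_later)
accept_critical_now = should_install("critical", released, two_hours_later)
accept_critical_later = should_install(
    "critical", released, three_days_later, validated_internally=True
)
```

Under these assumed settings, a low-tier machine accepts the update immediately, while a critical system holds off until the waiting period has passed and internal testing has succeeded. The trade-off is that delaying security updates leaves systems exposed to known threats for longer.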

In the post-CrowdStrike world, a shift in perspective is already underway.

“There’s going to be, and already is, a greater level of stringency around testing upgrades,” Fersht said. “You can do digital twin models, more and better use of synthetic data, better testing of these things before they occur.”
