On Friday, July 19th, 2024, IT teams across businesses faced massive disruptions from a global outage involving CrowdStrike and Microsoft Windows. The outage stemmed from a faulty update that triggered a logic error in the CrowdStrike Falcon sensor (version 7.11 and above), causing Windows systems to crash.
What Caused the Outage?
The issue was traced to a flawed sensor configuration update (Channel File 291) for CrowdStrike Falcon. The update was intended to improve how the sensor evaluates named pipe execution on Windows. Instead, it introduced a logic error that crashed the Falcon sensor and, with it, the Windows host, producing the blue screen of death (BSOD).
Which Critical Services Were Affected?
Device Impact:
- Affected Devices: Approximately 8.5 million Windows devices, less than 1% of the global Windows install base. Despite the small percentage, critical operations were severely impacted.
Affected Services:
- Airlines and Airports: Thousands of flights grounded globally, affecting airlines like Delta, United, and American Airlines, and airports such as Toronto Pearson and Amsterdam Schiphol.
- Public Transit: Disruptions in cities like Chicago, New York City, and Washington, D.C.
- Healthcare and Emergency Services: Hospital appointment systems were disrupted, as were 911 services in states such as Alaska and Indiana.
- Financial Services: Online banking systems and payment platforms experienced outages, delaying paychecks.
- Media and Broadcasting: Broadcasters such as Sky News were temporarily taken off the air.
This outage highlights how vulnerable critical infrastructure is to software flaws and why robust crisis management strategies matter.
Key Takeaways from the CrowdStrike Outage for IT Teams
Despite remediation efforts from Microsoft and CrowdStrike, recovery continued until Monday, July 22nd for many businesses, including major airlines and hospital systems.
The disruption underscored the necessity for CIOs to have robust outage response plans. Jon Amato, a senior director analyst at Gartner, described this as a major stress test for IT support teams.
Actionable Steps for CIOs:
- Reassess Preparedness: Evaluate the company's readiness for major outages.
- Test Crisis Scenarios: Conduct simulations to prepare for potential IT crises.
- Develop Business Continuity Plans: Create strategies to maintain operations during disruptions.
Financial and Trust Impacts: By one estimate, large organizations lose approximately $400 billion annually to IT failures and unplanned downtime, and outages can significantly erode customer trust.
Future Preparedness: As businesses recover, CIOs should put these lessons into practice by running regular simulation exercises and keeping their response strategies current for future IT crises.
How Can Businesses Be Better Prepared for IT Outages Using HCL BigFix?
HCLSoftware’s commitment to innovation and excellence in advanced cybersecurity can give businesses the confidence and security they need to recover quickly from disruptions like this one.
Our AI Digital+ endpoint management platform, HCL BigFix, gives IT Operations and Security teams complete visibility and control. HCL BigFix automates endpoint security through discovery, management and real-time remediation of all endpoints, whether on-premises, virtual or in the cloud, regardless of operating system, location or connectivity. Comprehensive, trusted endpoint management is a critical part of emergency preparedness and business continuity. With HCL BigFix, businesses can rapidly identify and remediate affected systems at scale.
Conclusion
Incidents like the CrowdStrike outage illustrate that any business can face significant IT disruptions. Cybersecurity is an essential part of business continuity and must be guarded accordingly. System administrators need to be able to trust their software update and patching mechanisms to keep their environments secure. HCL BigFix ensures your business remains resilient and operational, even during IT crises.
Start a Conversation with Us
We’re here to help you find the right solutions and support you in achieving your business goals.