What a crazy day. I was looking forward to grabbing dinner with a friend from college, only for him to send me a panicked message saying that his plane wouldn’t be flying today. I was so confused because the weather in Boston was fine, and I couldn’t fathom why they had rescheduled the flight for the next day. That was until I did a quick Google search.
Computer systems across many industries received a faulty software update that crashed machines and brought operations to a halt. Many have labeled today's event "the world's largest IT outage." Here's the TL;DR of what happened:
CrowdStrike, a cybersecurity firm, released a buggy update for its Falcon platform. Pushing updates is not out of the norm in itself: security software is updated routinely to keep up with new cyberattacks. However, this morning's update showed us that small mistakes can have huge consequences. It caused Windows machines running the software to crash with the "blue screen of death" and, in many cases, get stuck restarting. Because CrowdStrike's software is a key defense for so many organizations, the update led to disruptions across aviation, finance, healthcare, and the public sector. Among these, the outages in the airline industry showed the world how a single faulty update can bring global operations to a standstill.
Even though I wasn't stranded at the airport, this whole fiasco got me thinking about how fragile the systems we rely on every day are. Technology is deeply woven into every aspect of our lives, and yet the magic of it all is much more delicate than we realize. Today's IT outage reminds us that disruption is never far away, and that it is dangerous when there is no backup plan or "safety net" to fall back on.
What was supposed to be a routine update went awry because of an error in the very software meant to keep systems secure. I dare not imagine the disaster that could follow a deliberate attack aimed at compromising security. There are vulnerabilities in systems we take for granted, and security and the responsible use of technology are more of a priority than ever.
So, what can we do to prevent things like this from happening again? I’m sure CISOs around the world are deliberating this question.
I’m no CISO, but these are some of my passing ideas:
- Have a Contingency Plan: I think today's disruption stems from complacency with existing systems. When the systems went down, there was no fallback, no equivalent of a backup generator, that kicked in as an emergency measure. Running a secondary security system 24/7 as insurance may be extremely expensive, but there should still be a way to roll back to a previous, working version while errors are resolved behind the scenes. That would protect both security and operations during the chaos. It may not be fail-safe, but it could soften the blow we witnessed today.
- Testing and Phased Release: As a disclaimer, I'm not confident about the ethics of a phased release for security software, since it could leave some customers without the latest protections for longer. Still, medicine tests new treatments in phased clinical trials, and I think releasing software updates in phases, to a small group of machines first, would be a great way to limit the risk of widespread system failures. I'm not sure what CrowdStrike's release process looks like, but I believe its engineers could have quickly identified the errors in the update if more thorough testing protocols had been followed.
- Training and Communication: When systems are compromised, the people working on the front lines are our last line of defense, so we should make sure they have the knowledge and skills to communicate the situation and take measures to limit losses.
Preventive actions are essential, but knowing how to navigate a crisis is just as important. Today, we got a wake-up call. Staying vigilant and avoiding complacency will be key to preventing similar crises in the future.