These days, passengers have to worry about more than one kind of crash when they travel. Over the past two months, two major airlines, Delta and Southwest, have suffered huge computer system crashes that caused flight delays, frustrated passengers and costly revenue losses.
The two computer crashes had different causes (one was a power problem and the other a router problem). However, it seems to me that Southwest and Delta are both guilty of the same thing: they weren't ready.
While some initially blamed the local power utility for the Delta computer crash that left tens of thousands of customers stranded earlier this month, the airline's representatives ultimately admitted that a failed power control module cut off power to the main computer network. And although backup systems were in place, some critical ones didn't kick in, which left the network unstable and brought operations to a halt.
Some experts speculated that years of acquisitions had left Delta with a patchwork of computing systems that may or may not integrate smoothly with each other. However, Delta spokeswoman Susan Hayes rejected that theory, confirming that Delta has always relied on a single computing system. No, this is simply a question of unpreparedness.
There is an added problem: many of Delta's computer systems are simply old. Antiquated technology can't perform at the level of newer software and hardware, which could explain why some of those systems didn't start back up; they may have been too old for the backup system to handle.
Now let's talk about Southwest. According to the company, the crash was the result of a router failure. Did the router have a backup system in place? Yes, but the router experienced only a "partial failure," so the backup system was never alerted to start up.
The first thing that bothers me about this is the explanation. I understand that the backup system doesn't get alerted unless the router fails completely. But why? Why isn't the backup set up to take over whenever anything goes wrong with the router, not only when it dies outright? Wouldn't that make sense?
The other thing that bothers me is the reaction of Southwest CEO Gary Kelly, who likened the delay-causing router failure to a "once-in-a-thousand-year flood" because the router only partially failed, something he said the company could never have seen coming. Kelly also said that the partial failure "isn't a drill you can run." I don't understand why this is a drill you can't run. If you can test a complete shutdown of the router, why can't you test what happens when only part of it fails? Also, isn't the point of a backup system or plan to be ready for the unexpected? I doubt they would regard a critical system failure in an airplane with the same sort of "well, it happens" attitude.
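For readers who want to see how a partial failure can slip past a backup, here is a minimal Python sketch contrasting two failover rules. It is not a description of Southwest's actual network; `RouterHealth`, `binary_failover`, `threshold_failover` and the 5% packet-loss threshold are hypothetical names and values chosen for illustration. The first rule switches to the backup only when the primary router is completely down; the second also switches when the router is still up but badly degraded. The sketch also shows that a partial failure is something you can simulate in a test.

```python
from dataclasses import dataclass


@dataclass
class RouterHealth:
    """Hypothetical health snapshot for a primary router."""
    is_up: bool         # does the device respond at all?
    packet_loss: float  # fraction of probe packets lost, 0.0 to 1.0


def binary_failover(health: RouterHealth) -> bool:
    """Switch to the backup only on a complete failure (device reports down)."""
    return not health.is_up


def threshold_failover(health: RouterHealth, max_loss: float = 0.05) -> bool:
    """Switch on a complete failure OR when degradation crosses a threshold.

    The 5% default is an illustrative assumption, not a recommended value.
    """
    return (not health.is_up) or health.packet_loss > max_loss


# Simulated scenarios, including a "partial failure": the router still
# answers, but drops 40% of the traffic passing through it.
healthy = RouterHealth(is_up=True, packet_loss=0.01)
partial = RouterHealth(is_up=True, packet_loss=0.40)
dead = RouterHealth(is_up=False, packet_loss=1.0)

for name, h in [("healthy", healthy), ("partial", partial), ("dead", dead)]:
    # Prints: scenario, binary rule decision, threshold rule decision
    print(name, binary_failover(h), threshold_failover(h))
```

Running it prints `healthy False False`, `partial False True` and `dead True True`: the up/down rule never notices the degraded router, while the threshold rule does. In practice the hard part is picking the threshold, since a rule that is too sensitive can trigger needless failovers of its own.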
Could both of these airlines have taken steps to prevent this from happening? It's possible that if they had run a complete test of their backup systems, they would have known ahead of time which ones would start up correctly and which ones would cause trouble. However, I'm not unsympathetic to the plight of those tasked with running an airline's computer systems. The problem with the airline industry is that it never stops. It doesn't seem like IT personnel get the luxury of "off hours" when they can perform elaborate security and backup checks on their systems. People have to fly all day, every day.
But doesn't something have to be done here? Computer systems are getting more and more complicated, and airlines carry more than three billion passengers a year. Can we just expect things to stay the same and watch more computer crashes happen? Will these airlines pony up the cash to replace legacy systems? Will they do that at the cost of forgoing upgrades to their planes? I don't know about you, but when it comes to flying, I'd prefer a computer crash to the traditional kind.
Software engineers, developers, testers and other experts: What should Southwest and Delta have done? What should they do now? I'm interested in your thoughts, so let me know in the comments or via email.