Mars Climate Orbiter – $327.6 Million
Launched in December 1998, this small spacecraft was intended to circle Mars,
collecting data about what passes for weather on that planet. It arrived there
some 286 days into its mission and was never heard from again. Was it a victim
of the mysterious powers of the red planet? No, its loss was entirely down to a
mix-up between imperial and metric measurements. The ground software that
calculated the impulse of the orbiter’s thruster firings reported its results in
pound-force seconds (lbf-s) instead of the specified metric unit, newton-seconds
(N-s). Since one pound-force second is roughly 4.45 newton-seconds, those figures
were out by that factor, and as a result the spacecraft’s trajectory took it
significantly closer to Mars than was healthy. Instead of entering a stable
orbit, it ploughed straight into the upper atmosphere and was promptly vaporised
after its 310 million mile journey. The cost of developing the probe and an
associated lander, and of operating the mission, was $327.6 million, with no
appreciable scientific return on that investment.
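A minimal Python sketch of the failure mode shows how a bare number produced in lbf-s but consumed as N-s understates the impulse by a factor of about 4.45, and how recording the unit at the point of entry avoids the mix-up. This is a hypothetical illustration, not the mission’s actual ground software; the names `Impulse` and `naive_reading` and the sample value are invented.

```python
from dataclasses import dataclass

LBF_S_TO_N_S = 4.448222  # one pound-force second expressed in newton-seconds


@dataclass(frozen=True)
class Impulse:
    """An impulse that is always stored internally in newton-seconds (N-s)."""

    newton_seconds: float

    @classmethod
    def from_lbf_s(cls, value: float) -> "Impulse":
        # Convert at the boundary, so downstream code only ever sees N-s
        return cls(value * LBF_S_TO_N_S)

    @classmethod
    def from_n_s(cls, value: float) -> "Impulse":
        return cls(value)


def naive_reading(raw_value: float) -> float:
    # The bug pattern: a bare float produced in lbf-s is consumed as if it were N-s
    return raw_value


raw = 10.0  # hypothetical thruster firing, reported by the producer in lbf-s
wrong = naive_reading(raw)                      # read as 10.0 N-s
right = Impulse.from_lbf_s(raw).newton_seconds  # roughly 44.48 N-s
print(f"impulse understated by a factor of {right / wrong:.2f}")  # 4.45
```

Carrying the unit with the value, rather than passing bare floats between tools, means a mismatch has to be converted explicitly instead of slipping through silently.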
BlackBerry Blackout – Incalculable Cost
Research In Motion (RIM) had carved itself
a very enviable slice of the smartphone market, especially among business users
who liked the immediacy of its ‘push’ communications system.
That was until its subscriber network went down for four days in October 2011
due to a software issue. What became very apparent during the outage was that
RIM didn’t understand what the problem was, or how to fix it, which severely
undermined its credibility.
The outage started in BlackBerry’s data centre in Slough but soon spread across
Europe, the Middle East and Africa, eventually making its way to Latin America,
the US and Canada. In the end, three-quarters of RIM’s 70 million users couldn’t
communicate using the BlackBerry messaging service or receive e-mail.
After days of complete silence, RIM
eventually offered the excuse of a ‘core switch failure’, which did little to
calm the anger of its customers. It’s never really elaborated on that, and in many
respects this was truly a communications failure that wasn’t hardware or
software based, but one purely between the company and its paying customers. In
business, credibility is everything, and this outage dented RIM in a way that
can’t easily be fixed.
By way of recompense, RIM offered all those
affected $100 worth of free applications on their phones, but the damage the
outage did to a company that was already struggling dwarfs this expenditure.
Mercedes M-Class – 187,000 Cars Recalled
Cruise control isn’t a feature that many in
this country use, but in the United States it’s a major selling point for
anyone who travels long distances by car. Accordingly, Mercedes-Benz fitted it to
its 2000-2004 M-Class SUVs, but with a slight software twist.
It’s the general norm with these systems that touching the brake or accelerator
disengages the cruise control, but due to a bug in the M-Class this didn’t
happen.
In an emergency, drivers discovered that while they were trying to stop the car,
the cruise control system was still trying to maintain the same speed – a
conflict that could only be resolved by applying excessive braking force.
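The disengage rule can be sketched as a tiny state machine. This hypothetical Python model (the classes `CruiseControl` and `BuggyCruiseControl` and their fields are invented for illustration; real controllers are far more involved) contrasts the expected rule, where touching either pedal hands control back to the driver, with the failure mode, in which pedal input is ignored.

```python
from dataclasses import dataclass


@dataclass
class CruiseControl:
    """Hypothetical cruise-control model showing the expected behaviour."""

    engaged: bool = False
    set_speed_kmh: float = 0.0

    def engage(self, current_speed_kmh: float) -> None:
        self.engaged = True
        self.set_speed_kmh = current_speed_kmh

    def on_pedal(self, brake_pressed: bool, accelerator_pressed: bool) -> None:
        # Expected rule: touching either pedal hands control back to the driver
        if brake_pressed or accelerator_pressed:
            self.engaged = False

    def target_speed(self, driver_request_kmh: float) -> float:
        # While engaged the system holds the set speed; otherwise the driver decides
        return self.set_speed_kmh if self.engaged else driver_request_kmh


class BuggyCruiseControl(CruiseControl):
    def on_pedal(self, brake_pressed: bool, accelerator_pressed: bool) -> None:
        # Failure mode: pedal input is ignored, so the system keeps
        # trying to hold the set speed while the driver is braking
        pass


for cc in (CruiseControl(), BuggyCruiseControl()):
    cc.engage(current_speed_kmh=110.0)
    cc.on_pedal(brake_pressed=True, accelerator_pressed=False)
    # The driver is braking hard, i.e. asking for 0 km/h
    print(type(cc).__name__, cc.engaged, cc.target_speed(driver_request_kmh=0.0))
```

The loop prints `CruiseControl False 0.0` and then `BuggyCruiseControl True 110.0`: in the buggy model the controller is still targeting 110 km/h while the driver brakes, which is exactly the tug-of-war described above.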
Realising that it would be responsible for
any injuries or fatalities that this fault contributed to, Mercedes-Benz issued a
recall that encompassed 137,000 vehicles in the USA and another 50,000 in
Germany, the exact cost of which the company has never revealed.
Parole Software Glitch – 450 Dangerous Criminals Freed
As with any modern society, California
doesn’t have infinite jail space to house inmates, so it’s keen to return to
society those it feels are the least threat as quickly as possible. That was
the logic behind a parole system it developed, which would help it identify who
to release and when. Unfortunately, it had some pretty serious bugs and set
about recommending the release of all manner of nefarious people who would
normally be having a prolonged stay at the big house.
Included in its recommendations, which the state then followed, were 450
criminals classed as violent and dangerous, and a further 1,000 who were
incarcerated for drug-related and other less serious crimes.
These events came off the back of a Supreme
Court instruction to California’s prisons to reduce their inmate numbers by 33,000
over two years. The system wasn’t supposed to include offenders with violent
records or gang associations, or sex offenders, but it decided they deserved a
break too.
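Exclusion rules like these are easy to state and easy to get wrong. A hypothetical Python sketch applies the hard exclusions before any scoring, so that no low risk score can override them. The `Inmate` fields, the risk score and the 0.3 threshold are assumptions for illustration, not details of California’s actual system.

```python
from dataclasses import dataclass


@dataclass
class Inmate:
    inmate_id: str
    violent_record: bool = False
    gang_associated: bool = False
    sex_offender: bool = False
    risk_score: float = 1.0  # hypothetical scale: 0.0 (lowest risk) to 1.0


def eligible_for_release(inmate: Inmate, risk_threshold: float) -> bool:
    # Hard exclusions come first, so no risk score can override them
    if inmate.violent_record or inmate.gang_associated or inmate.sex_offender:
        return False
    return inmate.risk_score <= risk_threshold


def release_candidates(inmates: list[Inmate], risk_threshold: float = 0.3) -> list[str]:
    return [i.inmate_id for i in inmates if eligible_for_release(i, risk_threshold)]


population = [
    Inmate("A1", risk_score=0.1),
    Inmate("B2", violent_record=True, risk_score=0.05),  # low score, still excluded
    Inmate("C3", risk_score=0.6),                        # scored too risky
    Inmate("D4", gang_associated=True, risk_score=0.2),  # excluded
]
print(release_candidates(population))  # ['A1']
```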
AT&T – $60m In Lost Calls
In mid-January 1990, 60,000 Americans tried to make long-distance calls using
AT&T, only to find it wasn’t something the provider could do. At the heart of
the problem were 114 long-distance telephone switches, all of which refused to
route calls.
The switches were designed to detect a fault in themselves, put up a ‘do not
disturb’ sign, hand their calls to the next switch and reset. A code change
intended to speed up this recovery process altered the exact order of events: on
coming back up, a switch could send two messages in quick succession rather than
one, and the second arrived while the neighbouring switch was still handling the
first. That made the neighbour assume it had a fault of its own, so it reset
too, starting the cycle again. As a result, all 114 switches fell into a cascade
failure, as each resetting switch knocked over the next.
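The cascade can be reproduced with a toy event-driven simulation. In this Python sketch the timings (`RESET_TICKS`, `PROCESS_TICKS`) and the ring topology are assumptions chosen for illustration, not AT&T’s real network or switch software: a recovering switch sends two messages to each neighbour, and if the gap between them is shorter than the time a neighbour needs to process the first, the neighbour resets as well.

```python
import heapq

RESET_TICKS = 6    # hypothetical: time a switch spends resetting
PROCESS_TICKS = 2  # hypothetical: time a switch needs to process one message


def simulate(neighbours, first_fault, message_gap, horizon):
    """Return how many times each switch reset within `horizon` ticks."""
    resets = {s: 0 for s in neighbours}
    down_until = {s: -1 for s in neighbours}
    events = [(0, first_fault)]  # (time a switch detects a fault, switch)
    while events:
        t, s = heapq.heappop(events)
        if t > horizon:
            break
        if t < down_until[s]:
            continue  # already resetting; an extra fault changes nothing
        resets[s] += 1
        back_up = t + RESET_TICKS
        down_until[s] = back_up
        # On recovery, s sends two messages to each neighbour, message_gap apart.
        # If the second lands before the first has been processed, the
        # neighbour mistakes it for a fault of its own and resets.
        if message_gap < PROCESS_TICKS:
            for n in neighbours[s]:
                heapq.heappush(events, (back_up + message_gap, n))
    return resets


ring = {i: [(i - 1) % 114, (i + 1) % 114] for i in range(114)}
buggy = simulate(ring, first_fault=0, message_gap=1, horizon=500)
fixed = simulate(ring, first_fault=0, message_gap=3, horizon=500)
print(sum(c > 0 for c in buggy.values()))  # 114: every switch is hit
print(sum(c > 0 for c in fixed.values()))  # 1: only the original switch
```

With a gap shorter than the processing time, the reset wave eventually reaches every switch in the ring and keeps circulating; with a longer gap, only the switch that originally failed ever resets.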
The outage cost AT&T $60m in lost revenue on the day, along with the goodwill
of its customers, and the company took a further hit by offering 33% off
long-distance calls on Valentine’s Day that year in an attempt to make amends.
Final Thoughts
Software bugs are all around us, messing up
cash dispensers, bricking phones and generally making life more complicated
than it already is. But as long as humans are involved in coding systems,
those systems will be flawed, just like the people who write them.
Thankfully, most of the programming mistakes we make are an inconvenience
rather than commercially damaging or life-threatening. When they are more
serious, the bigger mistake is often failing to identify the greater danger of a
single point of failure, rather than the error that’s ultimately exposed as the
culprit.
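One mechanical way to hunt for single points of failure in a network is to look for articulation points: nodes whose removal splits the graph in two. This Python sketch uses the standard depth-first search low-link method; the example topology and node names are entirely hypothetical. It relies on recursion, so it suits small graphs like this one rather than very large ones.

```python
def articulation_points(graph):
    """Return the nodes whose removal disconnects an undirected graph.

    graph: dict mapping each node to a list of its neighbours.
    """
    disc, low, result = {}, {}, set()
    timer = [0]

    def dfs(node, parent):
        disc[node] = low[node] = timer[0]
        timer[0] += 1
        children = 0
        for nxt in graph[node]:
            if nxt == parent:
                continue
            if nxt in disc:
                # Back edge: node can reach an earlier point without its parent
                low[node] = min(low[node], disc[nxt])
            else:
                children += 1
                dfs(nxt, node)
                low[node] = min(low[node], low[nxt])
                # A non-root node is a cut vertex if some child cannot reach above it
                if parent is not None and low[nxt] >= disc[node]:
                    result.add(node)
        # The DFS root is a cut vertex only if it has two or more DFS children
        if parent is None and children > 1:
            result.add(node)

    for node in graph:
        if node not in disc:
            dfs(node, None)
    return result


network = {
    "core-switch": ["dc-a", "dc-b", "relay-eu", "relay-us"],
    "dc-a": ["core-switch"],
    "dc-b": ["core-switch"],
    "relay-eu": ["core-switch", "relay-us"],
    "relay-us": ["core-switch", "relay-eu"],
}
print(articulation_points(network))  # {'core-switch'}
```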