“Rocket science is tough, and rockets have a way of failing.”
~Sally Ride, NASA astronaut and first American woman in space
Someone once asked people on Twitter to reveal the first big news event that they remember hearing about as a child. It was a way for people to give a clue about their age.
The event would qualify if you remembered hearing a lot about it, and if you understood a little about what was happening even though you were just a kid.
Thousands participated in the fun exercise, and some patterns emerged.
People born in the late 1950s or early 1960s answered the 1969 human Moon landing. Those born in the late 1970s or early 1980s answered the 1989 fall of the Berlin Wall. The Kennedy assassination, Nixon's resignation following Watergate, and 9/11 were also frequently mentioned by people of different generations.
I had no doubt about my own answer: the 1986 Challenger explosion. It was the top answer for people born in the mid-1970s (the 1986 Chernobyl disaster was also mentioned a lot). The tragic TV images of the shuttle exploding in mid-air became etched in my memory.
So naturally I was very interested in the Netflix series Challenger: The Final Flight. I highly recommend it.
After watching the series, you really get a sense that this was an avoidable disaster. It didn't have to happen. In short, after repeated delays, the launch went ahead following a night that was very cold by Florida standards. There was a serious risk that the O-ring seals of the solid rocket boosters could fail in extremely cold temperatures. It was a known risk, and one of the O-rings did fail, allowing pressurized burning gas to escape from a rocket booster and ignite the external fuel tank.
The Netflix series on the Challenger disaster includes fascinating interviews with key individuals. There are five important risk management lessons that we can learn from the incident and from these interviews. They’re summarized below (without spoiling the series in case you want to watch it).
1) Risks Must Be Taken to Succeed
NASA suffered some tragic incidents throughout its existence, including the 1967 Apollo 1 fire, the 1986 Challenger explosion, the 2003 Columbia disaster, and a few others. And yet, despite these tragedies, the American space program remains the most successful in history. That’s not a coincidence.
In the Netflix series, we learn that NASA was perfectly aware of the risks associated with a reusable Space Shuttle. It sought to minimize those risks, but it knew that it could not eliminate them entirely. NASA deliberately decided that, in order to succeed, it needed to take risks.
Officials interviewed in the Netflix series were not trivializing the tragedies and fatalities, but rather highlighting a lesson that applies to everyone: you can't live or move forward without risk. Being 100% risk-averse leads to stagnation and failure. The key is to calculate risks and manage the trade-offs (i.e., how much risk is an organization willing to accept for a given amount of success?). NASA knew that there were risks with the Space Shuttle, but it accepted them as the price of success. People can agree or disagree about whether that price was too high.
2) Aggressive Schedules and Goals Increase Risks
The Netflix series does not hold back. Mistakes are clearly exposed and explained, without any excuses. As you watch the series, it can be tempting to pick a scapegoat for the disaster. But one of the people interviewed said something like this about a NASA official: "He's not a bad guy, he was just put in a bad situation."
NASA had very aggressive targets for Space Shuttle flights in 1986 and beyond, and the launch of Challenger had already been delayed significantly. It was originally scheduled to launch on January 22, was pushed back to January 25, and was then rescheduled for January 28 because of bad weather. As a result, NASA officials were under immense pressure to launch Challenger. When organizations set aggressive, or even unrealistic, schedules and goals, they put employees in a difficult situation where mistakes become more likely.
Direct factors can increase the risk of incidents: physical hazards, employee behavior, equipment malfunction, biological or chemical agents, inefficient processes, etc. But indirect factors should also be examined, such as an organizational environment set by management that increases pressure and stress on employees. This can lead to a situation where even a well-intentioned and competent professional makes a mistake.
3) Sometimes Middle Management Knows Better
The Solid Rocket Boosters (SRBs) of the Space Shuttle were manufactured by Thiokol, a NASA contractor whose executives were also under heavy pressure.
Thiokol engineers and managers knew that low temperatures could affect the O-rings of the SRBs, creating a risk of explosion. On January 27, the day before the launch, they discussed the weather conditions with NASA during an initial conference call. Many Thiokol engineers expressed concern about the effect of low temperatures on the O-rings, and they recommended that the launch be delayed again.
Thiokol executives initially supported the recommendation of their own engineers and managers, but NASA officials severely criticized them. A second conference call then took place, this time without the Thiokol engineers: only Thiokol executives and NASA officials were present. Thiokol executives overruled their own engineers and recommended that the launch proceed, much to NASA's satisfaction. Later that night, a Thiokol engineer told his wife that Challenger would explode.
The lesson here is that middle managers and engineers are closer to operations, and therefore have better knowledge of potential risks. Upper management should defer to their judgment on critical safety issues instead of overruling them. It's normal for executives to take risks to meet aggressive schedules or business objectives, but they must also know when to follow the recommendations of middle management, especially when lives are at stake.
4) Trust a Contractor’s Recommendation
This section is related to the previous one. As seen above, Thiokol’s initial recommendation was to delay the launch because of concerns around the impact of low temperatures on the SRBs’ O-rings. That was not what NASA wanted to hear. NASA officials then put heavy pressure on Thiokol executives to change their recommendation. The rest is history.
Ironically, the situation is usually the reverse in many industries (e.g., oil and gas, construction, manufacturing): the contracting organization pressures a contractor to improve its safety practices and raise its safety standards to the contracting organization's level. In the case of the Challenger disaster, however, NASA de facto pressured Thiokol to lower its safety standards, which the company did when it reversed its initial recommendation to delay the launch and recommended going ahead.
If a contractor is more conservative, prudent, and risk-averse than the contracting organization, it's best to follow the contractor's recommendation, because it knows the product or service it delivers much better than the contracting organization does.
5) Zero-Risk Is a Journey, Not a Destination
It's impossible to be perfect, but it is possible to aim for perfection. The same goes for the risk of incidents: organizations can't completely eliminate it, but they can aim to move continuously toward zero risk. If you think there is no big difference between 98.5% perfect and 99% perfect, talk to the family of an employee who died in an accident that made up that 0.5% difference.
NASA conducted a thorough investigation after the Challenger disaster. The SRBs were redesigned to eliminate the problem with the O-rings, and many lessons were learned, both technical and organizational (regarding communications, decision-making, management judgment, etc.).
NASA's Space Shuttles were grounded for more than two years, until September 29, 1988, when Discovery was launched. Unfortunately, disaster struck again when the Space Shuttle Columbia disintegrated while reentering the atmosphere on February 1, 2003. Another investigation took place, and many organizational and technical changes were made.
The takeaway is that, even after comprehensive investigations in which root causes are identified and corrective and preventive actions are taken, the risk of incidents can never be completely eliminated. It's impossible to be perfect and bring risk down to zero. For example, even if a technology is flawless, there will always be a risk of human error. The proper mindset, then, is to treat each incident as an opportunity to get better and further reduce risks.
That's what happened at NASA. Changes made after each incident did not completely eliminate the risk of future incidents, but the lessons learned after each event helped to reduce risks further. Space Shuttle flights continued after the Columbia disaster until the final flight, performed by Atlantis, which launched on July 8, 2011. A total of 135 flights took place between 1981 and 2011, and 133 of them (about 98.5%) returned safely to Earth.
Incidents are never pleasant. But at least they provide opportunities to learn and improve. And it’s important to capitalize on these opportunities for the sake of all employees, including those who lost their lives.