Accident investigations often return a result of ‘human error.’ However, this finding is really just the tip of the iceberg in understanding why people operating in complex systems such as nuclear power plants did what they did. Real risk reduction requires getting to the bottom of human performance issues. By Ken Ellis
The Costa Concordia was an ultra-modern cruise liner with satellite and radar navigation systems. In 2012, it also had a bridge full of qualified people as it sailed a predetermined course that had been signed off before the ship left port. It carried two sets of navigational charts: 1:100,000 charts, and 1:20,000 charts specifically for shallow waters. And there were set procedures: at certain shallow depths, the ship’s speed was limited. With all of these layers of defence-in-depth, it was unthinkable that such a ship could run aground. But she did, on 13 January 2012.
The point of impact was well away from the liner’s prearranged course, and the ship was travelling at three times the maximum speed allowed for that water depth.
The problem wasn’t the ship. Unlike the elements of a complicated system, people don’t follow the laws of physics, or thermodynamics; they can be unpredictable. It turns out that the dangerous practice of buzzing close by the island in shallow waters was common; it had a name, the ‘Giglio salute’. The passengers loved it, the islanders loved it. Months before, the mayor of Giglio had thanked another captain of Costa for the ‘unparalleled spectacle that has become an indispensable tradition’.
Understanding why people do what they do — particularly when they don’t follow the rules — is of vital importance in high-reliability industries, such as civil aviation, NASA, pharmaceuticals, and nuclear power.
Traditionally, we have viewed defence-in-depth as a series of barriers, safety systems and processes that are layered upon each other in a tightly controlled, well-engineered fabric.
In the nuclear industry, the first layer is a series of physical barriers to prevent the accidental release of radiation. This means enclosing uranium pellets inside fuel rods, which are themselves enclosed inside a reactor vessel, which is then enclosed within a containment building.
On top of this, we have redundant and diverse safety systems to ensure the nuclear fuel remains sufficiently controlled and cooled. These systems are designed and built to the highest standards, routinely tested, maintained and reviewed by external experts such as WANO peer review teams.
Layered over this, we have robust emergency response plans which are also routinely rehearsed and enhanced.
Added together, we have the traditional view of defence-in-depth, which is really the marriage of equipment, processes and people.
As engineers and nuclear professionals, we’ve always drawn comfort from the first two elements of that equation. We understand the robust nature of the equipment we design or operate. We know the properties of its materials and the craftsmanship required to produce the components housed within our reactors.
Similarly, we see the detail and reassuring logic of the processes we follow to bring those machines to life. We see the safeguards built into the very procedures that are meant to guide us, step-by-step, through their safe use.
These are predictable and repeatable systems: feed them a defined set of inputs and they give you a defined and limited set of outputs. So-called ‘complicated systems’ are often engineering marvels; aircraft, for example. An aeroplane is a complicated collection of interrelated, but individual, components. The system as a whole will only function if its components work together properly. You can disassemble it, put it back together, and the machine will still work.
In this type of system, risk can be quantified from the reliability of the individual components, using metrics such as mean time between failures (MTBF), failure mode and effects analysis (FMEA) and single-point vulnerability (SPV). Having quantified the risk, one can then decide whether that level of risk is acceptable.
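As a rough, hypothetical sketch of that kind of quantification, suppose independent components arranged in series, each with exponentially distributed failures (a simplifying textbook assumption, not one stated here). Under that model the component failure rates add, so the system MTBF is the reciprocal of the summed rates, and survival probability over a mission time follows directly:

```python
import math

def system_mtbf(component_mtbfs):
    """MTBF (hours) of a series system: failure rates (1/MTBF) add."""
    return 1.0 / sum(1.0 / m for m in component_mtbfs)

def reliability(mtbf, hours):
    """Probability of surviving `hours` without failure (exponential model)."""
    return math.exp(-hours / mtbf)

# Hypothetical component MTBFs in hours, chosen purely for illustration.
mtbfs = [50_000, 80_000, 120_000]
combined = system_mtbf(mtbfs)

print(round(combined))                         # → 24490
print(round(reliability(combined, 8760), 3))   # → 0.699 (one year of operation)
```

The sketch shows the appeal of complicated-system risk analysis: the arithmetic is simple, repeatable and auditable, which is precisely the property that breaks down once people enter the system.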
When an event occurs in a complicated system, accident investigators will use the Newtonian analysis of cause-and-effect to find out what went wrong. They believe that they can reconstruct an entire event starting at the outcome and tracing their way back along the causal chain in time, because there are no effects without an initiating cause, or input, to the system.
In the wake of an event or accident, our natural reaction is to build more physical defence-in-depth barriers based on this linear chain of events. The goal of our root cause analysis is to distil an event down to one root cause, be it an equipment failure or a human act, the latter normally labelled ‘human error.’
Don’t get me wrong. This approach has served our industry well in the past, does so today, and will continue to be a powerful tool well into the future. But it does not tell the whole story. It often does not adequately consider the human factor. In my experience, people are the most important part of the mix. They can also be the most challenging.
Being human, we can circumvent both equipment and process, either unwittingly or wittingly. Equipment behaves according to the laws of physics, laws of thermodynamics, and the properties of materials, but people exist in a complex and ever-changing workplace that is open to personal interpretation. We react based upon our personal experiences, the stresses we feel, and the situations in which we find ourselves.
People are a big part of another type of system, which some call a complex system. In it, people are subject to a host of influences, human interactions and relationships that go far beyond engineering specifications and reliability predictions.
Consider my aircraft example: the plane itself is a complicated system of components, but getting it to actually fly is a complex system that involves pilots and crew. These people are highly skilled and experienced. They are also influenced by everything from fatigue, schedule pressures, career pressures, the quality of their training and crew dynamics to the weather.
Given the human element, complex systems behave almost organically. Unlike machines, where outputs are dependent on, and proportional to, inputs, people can accept, ignore, challenge or misunderstand the inputs that are fed into the system; inputs and outputs are interdependent and their exact relationship is unclear.
Being human, they can also have very different perceptions and perspectives on what happened when they look back on an event, what is happening when they are in the midst of an event, and what will happen in the future if faced with an unfolding accident. Actually determining which of these is closest to reality is often very difficult.
In a complex system, components respond locally to the kind of information they receive and the degree of freedom they have to act, both in means and time. With people in the mix — not solely hardwired machines — inputs and outputs cannot always be reconstructed. That also means that an infinitesimal change in starting conditions can lead to significant differences later on, effectively removing any proportional relationship between them.
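A toy, non-nuclear illustration of that sensitivity is the logistic map, a textbook chaotic system (chosen here purely for demonstration, not drawn from accident analysis). Two starting points differing by one part in a million soon diverge until the gap bears no proportional relationship to the initial difference:

```python
def logistic_step(x, r=4.0):
    """One iteration of the logistic map, a textbook chaotic system."""
    return r * x * (1.0 - x)

def max_divergence(x0, eps=1e-6, steps=40):
    """Largest gap seen between two trajectories whose starts differ by eps."""
    a, b = x0, x0 + eps
    worst = 0.0
    for _ in range(steps):
        a, b = logistic_step(a), logistic_step(b)
        worst = max(worst, abs(a - b))
    return worst

# Identical starting conditions stay identical forever...
print(max_divergence(0.4, eps=0.0))   # → 0.0
# ...but a one-in-a-million difference grows by orders of magnitude,
# destroying any proportionality between input change and output change.
print(max_divergence(0.4, eps=1e-6))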
As celebrated British chemist Trevor Kletz once observed, "To say accidents are due to human failing is like saying falls are due to gravity. It is true, but it does not help us prevent them."
Investigators, and readers of accident reports, need to acknowledge that an absolute reconstruction of an event, complete with the non-linear reactions of the people involved, may be elusive. The next evolution is to represent complex system behaviour by focusing not so much on human error and violations as on the mechanisms that generate that behaviour in the actual, dynamic work context.