Scott [email protected] – (949) 854-0519
Risk Aversion and Other Obstacles to Mission Success
A presentation to SPIN, March 2, 2007
Systems Architecture and Engineering Program: A Node of the Resilience Engineering Network
Acknowledgments
This presentation is based in part on a paper titled “The Science of Organizational Psychology Applied to Mission Assurance,” presented to the Conference on Systems Engineering Research, Los Angeles, April 7-8, 2006. Co-authors were Katherine Erlick, PhD, and Joann Gutierrez, both of whom have degrees in organizational psychology.
System Resilience, Culture and Paradigms
• System Resilience – the ability of organizational, hardware and software systems to mitigate the severity and likelihood of failures or losses, to adapt to changing conditions, and to respond appropriately after the fact.
• Culture is a key element in System Resilience
• Thesis: The traditional methods of executive mandate and extensive training are not sufficient to achieve a System Resilience culture. The science of organizational psychology promises to show us a better way.
• Common paradigms, especially with respect to risk, can be obstacles to a System Resilience culture
Figure 1: The Architecture of System Resilience
[Diagram: System Resilience can be divided into Capabilities, Infrastructure, and Culture. Capabilities divide into Technical and Managerial Capabilities; Infrastructure divides into Operational and Organizational Infrastructure; the organization divides into Internal and External Organization. Culture is enabled by Cultural Initiatives and can enhance or obstruct resilience; Metrics measure, and determine improvements in, the other elements; Culture and Metrics apply to all branches.]
The System Resilience Architecture
• Tools and processes (e.g., risk) live in the Infrastructure branch
• Taking risk seriously lives in the Culture branch
Common Root Causes Suggest Key Capabilities
Root Causes:
• Lack of Rigorous System Safety
• Lack of Information Management
• Culture
• Lack of Risk Management
• Regulatory Faults
• Lack of Review and Oversight
• Incomplete Verification
• Conflicting Priorities
• Poor Schedule Management
• Lack of Expertise
• Organizational Barriers
• Maintenance
• Cost Management
• Incomplete Requirements
• Faulty Decision Making
• Emergence
Capabilities:
• Cultural Initiatives
• System Resilience Oversight
• System Resilience Infrastructure
• Risk Management
• Schedule Management
• Cost Management
• Requirements Management
• Technology Management
• Verification
• System Safety
• Configuration Management
• Expertise
• Software
• Manufacturing
• Operations
• Work Environment
• Information Management
• Regulatory Environment
• Maintenance
• Reliability
• Supplier Management
• Adaptability
Capabilities require more than system safety and reliability.
Case Studies Covered Many Domains
American Airlines Flight 191 – Reason
Apollo 13 – Leveson, Reason, Chiles
Bhopal – Leveson, Reason, Chiles
Challenger – Vaughan, Leveson, Reason, Chiles, Hollnagel
Chernobyl – Leveson, Reason, Chiles
Clapham Junction – Reason
Columbia – Columbia Accident Investigation Board, Chiles, Hollnagel
The Fishing Industry - Gaël
Flixborough – Leveson, Reason
Hospital Emergency Wards – Woods and Mears
Japan Airlines 123 – Reason
Katrina - Westrum
King’s Cross Underground – Leveson, Reason
Mars Lander - Leveson
Nagoya Airbus 300 – Dijkstra
New York Electric Power Recovery on 9/11 – Mendonça
Philips 66 Company – Reason, Chiles
Piper Alpha – Reason, Chiles, Paté-Cornell, Hollnagel
Seveso – Leveson, Reason
Texas City – Hughes, Chiles
Three Mile Island – Leveson, Reason, Chiles
TWA 800 – National Transportation Safety Board (NTSB)
Windscale – Leveson
Risk Emphasis
Some Quotes
“A safety culture is a learning culture.”
James Reason, Managing the Risks of Organizational Accidents
“The severity with which a system fails is directly proportional to the intensity of the designer's belief that it cannot.” (The Titanic Effect)
Nancy Leveson, Safeware: System Safety and Computers
“Focus on problems.”
Weick and Sutcliffe, Managing the Unexpected
“One of our largest problems was success.”
Cor Herkströter, Royal Dutch/Shell
The Feynman Observation
“[Feynman’s] failure estimate for the shuttle system was 1 in 25…”
“[NASA’s] estimates [of failure] ranged from 1 in 100 [by working engineers] to 1 in 100,000 [by management]”
Diane Vaughan,
The Challenger Launch Decision
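The gap between these estimates compounds over a program's lifetime. The short sketch below is illustrative, not from the presentation: it takes the three per-flight estimates quoted above and computes the chance of at least one loss over 135 flights (the Shuttle program's actual flight count, used here only for scale).

```python
# Illustrative sketch of the Feynman observation: how far the disputed
# per-flight failure estimates diverge over a program's lifetime.
# The 1/25, 1/100, and 1/100,000 figures are from the slide; the
# 135-flight horizon is an assumption chosen for scale.

def prob_at_least_one_failure(p_per_flight: float, n_flights: int) -> float:
    """P(>= 1 failure in n independent flights) = 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p_per_flight) ** n_flights

estimates = {
    "Feynman (1/25)":            1 / 25,
    "Working engineers (1/100)": 1 / 100,
    "Management (1/100,000)":    1 / 100_000,
}

for label, p in estimates.items():
    print(f"{label:28s} -> {prob_at_least_one_failure(p, 135):.3f}")
```

Under Feynman's estimate a loss over the program is near certain (about 0.996); under management's, it is about one chance in a thousand. The factual question of which number was right is exactly what the culture suppressed.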
Priorities: A personal view
• Priority Number 1 – Do you take risk seriously?
• Priority Number 2 – Do you have a good risk process?
• Priority Number 3 – Do you have a good risk tool?
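To see why the tool is only Priority 3: a workable "risk tool" can be as small as a scored risk register. Everything below, including the example risks and the 1-5 scales, is a hypothetical sketch rather than anything from the presentation.

```python
# A minimal sketch of a risk register "tool" (Priority 3): score each
# risk by likelihood and consequence on 1-5 scales and rank by exposure.
# The example risks and the scales are hypothetical illustrations.

from dataclasses import dataclass

@dataclass
class Risk:
    title: str
    likelihood: int   # 1 (rare) .. 5 (near certain)
    consequence: int  # 1 (minor) .. 5 (catastrophic)

    @property
    def exposure(self) -> int:
        # Simple exposure score; real matrices often use nonlinear scales.
        return self.likelihood * self.consequence

register = [
    Risk("Supplier slips delivery of flight computer", 4, 3),
    Risk("Seal fails outside qualified temperature range", 2, 5),
    Risk("Test procedure omits off-nominal case", 3, 4),
]

for r in sorted(register, key=lambda r: r.exposure, reverse=True):
    print(f"{r.exposure:2d}  {r.title}")
```

The tool is the easy part; a process that keeps the register honest is Priority 2, and a culture that acts on the scores is Priority 1.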
Two important risk paradigms
- Paradigm No. 1 – The belief that even having risks is a sign of bad management
- Paradigm No. 2 – Risk as a “normative” condition
Diane Vaughan:
“NASA’s ‘can do’ attitude created a … risk taking culture that forced them to push ahead no matter what.”
“…flying with acceptable risks was normative in NASA culture.”
Some Paradigms
Definition: mind-set, perception, way of thinking, cultural belief
• Don’t bother me with small problems.
• Our system (airplane, etc.) is safe. It has never had a major accident.
• We can’t afford to verify everything.
• My job is to assure a safe design.
• If I am ethical, I have nothing else to worry about.
More Paradigms (p. 2)
• If I get too close to safety issues, then I may be liable.
• Safety is the responsibility of individuals, not organizations
• Our customer pays us to design systems, not organizations
• Human error has already been taken into account in safety analyses
• Accidents are inevitable; there is nothing you can do to prevent them
• Organizational issues are the purview of program management
Still More Paradigms (p. 3)
• I am hampered by scope, schedule and cost constraints
• Our contracts (with the customer and suppliers) do not allow us to consider aspects outside of design
• Human errors are random and uncontrollable
• You can’t predict serious accidents
• To change paradigms all we need is a good executive and lots of training
Some Thoughts on Risk
(from the Second Symposium on Resilience Engineering, Juan-les-Pins, France, November 2006)
Wreathall says we must consider “meta-risks,” that is, risks that we all know are there and do not consider
Epstein says that the important risks are in the lower right-hand corner of the risk matrix:
• Low probability
• High consequence
• Lots of them
• Examine by simulation
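Epstein's "examine by simulation" can be sketched with a toy Monte Carlo model: even when each lower-right-corner risk is individually negligible, a portfolio of them is not. The portfolio size, per-risk probability, and trial count below are illustrative assumptions, not figures from the symposium.

```python
# Toy Monte Carlo sketch of Epstein's point: many individually unlikely,
# high-consequence risks add up, and simulation makes that visible.
# All numbers here are illustrative assumptions.

import random

random.seed(0)
N_RISKS = 50      # many low-probability, high-consequence risks
P_EACH = 0.01     # each is individually "negligible"
TRIALS = 10_000

hits = sum(
    any(random.random() < P_EACH for _ in range(N_RISKS))
    for _ in range(TRIALS)
)
# Analytic value: 1 - 0.99**50, roughly 0.395
print(f"P(at least one occurs) ~ {hits / TRIALS:.2f}")
```

A roughly 40% chance that some catastrophe fires is a very different picture from fifty risks each sitting quietly at 1%.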
The Genesis of Paradigms
[Diagram: Cultural Beliefs and Pressures (cost, schedule, etc.) combine to form Our Paradigms]
The Old Model
[Diagram: Start Here with Executive Management, who set The Vision; Employees then Attend Training]
The New Model (Simplified)
[Diagram: Start Here with Employees, who Learn New Paradigms; Executive Management Endorse Self-Discovery and Establish Communities of Practice]
Some Approaches
• Training
• The Hero-Executive
• Socratic teaching
• Coaching
• Self-discovery through communities of practice
• Independent reviews
• Cost and schedule margins
• Standard processes
• Teams
• Rewards and incentives
• Management selection
Self-Discovery Through Communities of Practice
[Diagram: a Core Group inside a wider Community of Practice]
• Bottom-up
• Informal
• Core Group
• Dialogue
• Respect
• Inclusive
Conclusions
• Progress is both top-down and bottom-up
• Organizational psychology is a necessary discipline for mission assurance and, hence, also for systems engineering
• Training and top-down mandates have limited effectiveness
• Self-discovery is the preferred path. No one can teach you the right paradigm; you have to learn it yourself.
Some Recommended Reading
Vaughan, Diane, The Challenger Launch Decision: Risky Technology, Culture, and Deviance at NASA, University of Chicago Press, 1996
Reason, James, Managing the Risks of Organizational Accidents, Ashgate, 1997
Leveson, Nancy, Safeware: System Safety and Computers, Addison Wesley, 1995
Weick, Karl E. and Sutcliffe, Kathleen M., Managing the Unexpected, Jossey-Bass, 2001
Senge, Peter, et al., The Dance of Change, Doubleday, 1999
Wenger, Etienne, Communities of Practice: Learning, Meaning, and Identity, Cambridge University Press, 1998