9-110-031
REV: MAY 27, 2010

ROBERT S. KAPLAN
ANETTE MIKES

Jet Propulsion Laboratory

Professors Robert S. Kaplan and Anette Mikes prepared this case with the assistance of Gentry Lee and Chris Lewicki. The mission described in this case is fictional and based on the general experiences of Lee and Lewicki. HBS cases are developed solely as the basis for class discussion. Cases are not intended to serve as endorsements, sources of primary data, or illustrations of effective or ineffective management.

Copyright © 2010 President and Fellows of Harvard College. To order copies or request permission to reproduce materials, call 1-800-545-7685, write Harvard Business School Publishing, Boston, MA 02163, or go to www.hbsp.harvard.edu/educators. This publication may not be digitized, photocopied, or otherwise reproduced, posted, or transmitted, without the permission of Harvard Business School.

The only thing worse than a delay is a mission that fails.
- Gentry Lee

Gentry Lee, senior systems engineer at the Jet Propulsion Laboratory, contemplated the difficult decision faced by the risk review board seven weeks before the scheduled launch of the Mars Biological Explorer mission. After a development period of more than four years and the expenditure of $600 million dollars, however, significant mission-threatening risks still remained. If the board decided the remaining risks were too high, the next launch opportunity would be 26 months later, when the planets again re-aligned. Lee pondered whether to recommend launch or delay.

History and Mission

The Jet Propulsion Laboratory (JPL) was a research and development center, managed and operated by California Institute of Technology (CalTech) under a contract from NASA (National Aeronautics and Space Administration), the U.S. space agency. JPL employed approximately 5,000 full-time CalTech employees and managed several thousand contractors.

A group of CalTech graduate students and their adviser, Professor Theodore von Karman, founded JPL during World War II to develop and test rockets and guided missiles under the sponsorship of the U.S. Army. During the 1950s, JPL carried out three successful sub-orbital flights and on February 1, 1958, it helped launch Explorer 1, America’s first satellite, which sent back data that led to the discovery of the Van Allen radiation belts high above the Earth’s surface. In October 1958, the U.S. consolidated its various space programs into a new agency, NASA. JPL, shortly thereafter, became NASA’s primary planetary spacecraft center. JPL engineers designed and operated the Ranger and Surveyor robotic spacecraft missions to the Moon that prepared the way for the manned Apollo landings. JPL launched successful interplanetary exploration missions, including the Mariner spacecraft to Venus, Mars, and Mercury, the Galileo mission to Jupiter and its moons, and the Voyager missions to Jupiter, Saturn, Uranus, and Neptune. JPL also developed the camera for NASA’s Hubble Space Telescope and operated the Deep Space Network for communication with all of its inter-planetary robotic missions.

NASA, however, had also experienced several tragic failures. In January 1967, months before the scheduled launch of Apollo 1, three astronauts died when a fire erupted in a ground-test capsule. In January 1986, the seven crew members, including school teacher Christa McAuliffe, died when the Space Shuttle Challenger broke apart 73 seconds after launch. JPL was not involved in either of those missions but had its most visible failure when the Mars Observer, launched in 1992, lost contact with ground controllers in 1993. Some described this $1 billion project as “a huge amount of taxpayers’ money spent for nothing.” In the early 1990s, the political and public mood demanded reforms to the space program, which led to the appointment (in 1992) of Daniel Goldin as the new NASA administrator. Goldin, formerly an executive at aerospace contractor TRW, believed that new management techniques and technologies, along with accepting more risk, would dramatically reduce the cost of NASA’s missions. In a 1992 speech, he challenged JPL to adopt “faster, better, cheaper” techniques so that it could do more without spending more money. He asserted:

Be bold—take risks. [A] project that’s 20 for 20 isn’t successful. It’s proof that we’re playing it too safe. If the gain is great, risk is warranted. Failure is OK, as long as it’s on a project that’s pushing the frontiers of technology.1

But the new strategy did not reverse the incidence of major failures. The Mars Climate Orbiter disappeared during orbit insertion on Sept. 23, 1999, due to a navigation error; analyses had been performed and communicated using English units (feet and pounds) rather than NASA-mandated metric units (meters and kilograms). The Mars Polar Lander disappeared as it neared the surface of Mars in December 1999. To save money, the Lander did not have telemetry during its descent to Mars, and subsequent analysis suggested that the failure was probably due to a software fault that shut off the descent rocket too early, causing the spacecraft to fall the last 40 meters onto the surface. These two failures ended the “faster, better, cheaper” management philosophy for Mars Landers.

NASA’s manned space program experienced another tragic failure with the loss, on February 1, 2003, of Space Shuttle Columbia and its seven crew members 16 minutes before scheduled touchdown. The Columbia Accident Investigation Board concluded that the accident was not an anomalous random event, but rather

. . . rooted in NASA’s Space Shuttle history and culture, including the original compromises that were required to gain approval for the Shuttle, subsequent years of resource constraints, fluctuating priorities, schedule pressures . . . and lack of an agreed national vision for human space flight. Cultural traits and organizational practices detrimental to safety [included] reliance on past success as a substitute for sound engineering practices; organizational barriers that prevented effective communication of critical safety information and stifled professional differences of opinion; lack of integrated management across program elements; and the evolution of an informal chain of command and decision-making processes that operated outside the organization’s rules.2

Implementing a New Risk Management Culture at JPL

In 2000, NASA’s new Mars Program Director, Scott Hubbard, asked Gentry Lee, a former JPL employee, to return and help develop an architecture for a new Mars Mission Program. Hubbard wanted the architecture to include a risk management program that would significantly increase JPL’s mission success rate. Lee, a graduate of the University of Texas and MIT, had worked with JPL from 1969 to 1976 as part of the Viking project team that engineered the first successful landing of a spacecraft on Mars. Lee subsequently became chief engineer of the Galileo project, which, over its 10-year mission, explored Jupiter with both an atmospheric probe and an orbiter that mapped the planet’s major satellites. Galileo was the last of the grand-scale missions before NASA’s “faster, better, cheaper” era. Lee left JPL during this era to pursue various other activities, including co-authoring four novels with science fiction grandmaster Arthur C. Clarke, collaborating with Carl Sagan on an award-winning science documentary series for television, designing computer games, writing columns, and lecturing on space exploration and extra-terrestrial life.

1 Daniel Goldin, transcript of remarks and discussion at the 108th Space Studies board meeting, Irvine, Calif., November 18, 1992; Daniel Goldin, “Toward the Next Millennium: A Vision for Spaceship Earth,” speech delivered at the World Space Congress, September 2, 1992.

2 Executive Summary, Columbia Accident Investigation Board Report Volume 1: 9 (August 2003).

Lee accepted Hubbard’s offer and in 2002 became JPL’s chief systems engineer, with responsibility for the engineering integrity of all JPL planetary missions. Lee defined his role as “minister without portfolio, the person who made sure everything worked the way it was supposed to on a global scale.” He described how he thought about mission risks:

At the start of a project, try to write down everything you can that is risky. Then put together a plan for each of those risks, and watch how the plan evolves. Some risks are “business as usual risks.” We are familiar with these risks and know how to quantify and mitigate them. Others are “development risks,” in which the project’s engineering enters territory we have never experienced before. And, finally, we have risks imposed by the environment that we can’t control, which we call the “unknown unknowns.” We attempt to quantify all the risks of each type and aggregate them into an approximate likelihood of mission success. The final question we face is, “do we launch or not?” How large does failure have to loom before you decide to cancel or delay a project in which the project people have worked for years and taxpayers have invested hundreds of millions of dollars in the hope of producing important new scientific knowledge?

Lee believed that “risk mitigation was painful; not a natural event for humans to perform,” and that overcoming cultural resistance would be his largest challenge. He explained:

JPL engineers graduate from top schools at the top of their class. They are used to being right in their design and engineering decisions. I have to get them comfortable thinking about all the things that can go wrong. This requires accepting a culture of intellectual confrontation.

People’s ambitions and careers get wrapped up in being right all the time. They have to learn that it’s not important whether your initial idea is right. It’s important whether or not the idea we go forward with is right. And that’s what intellectual confrontation helps us achieve.

JPL already had a risk assurance process, but project engineers typically viewed it as peripheral to their work, something they had to do just before milestone reviews. Lee wanted risk management to become embedded within the engineering process so that it would be continually front-of-mind during a project’s life.

Referring to Janus, the two-faced Roman god of gates and doors who looked forward and backward, Lee remarked, “Innovation, looking forward, is absolutely essential, but innovation needs to be balanced with reflecting backwards, learning from experience about what can go wrong.”

Source: Wikipedia, http://en.wikipedia.org/wiki/File:Janus-Vatican.JPG, accessed February 2010. Reprinted with permission under the terms of the GNU Free Documentation License.

Over the next six years, Lee helped introduce a comprehensive system for managing the risks of planetary missions. While early elements of the system had been used in JPL-managed projects during this time, the Mars Biological Explorer (MBE) program was the first that used the system from initial project formulation all the way through launch.3 Recent missions to Mars had confirmed the presence of iced water in the North Polar Region and the widespread existence of salts and minerals elsewhere that could have been formed with water. NASA scientists now believed that the water could have supported life. The $745 million MBE mission would send a non-mobile platform to Mars to acquire and analyze subsurface samples for bio-markers that would indicate the presence of recent or active life-processes under the surface of Mars.

MBE, like any planetary landing program, consisted of four principal stages: launch, cruise, entry-descent-landing, and surface operations. Each stage had its own challenges and project team. An engineer on the cruise stage remarked that guiding a spacecraft from Earth to a specific location on Mars was comparable to shooting a baseball from the pitching mound of Dodger Stadium in Los Angeles to cross the outside corner of the plate in Wrigley Field in Chicago. But, she pointed out, the spacecraft cruise was harder because Earth and Mars were both moving relative to each other in their own orbits around the sun while simultaneously spinning on their axes. Lee felt, however, that the laws of planetary motion made the cruise-stage risks “known unknowns,” whereas landing the spacecraft safely on Mars during the entry-descent-landing (EDL) stage faced “unknown unknowns.” The spacecraft would arrive at the atmosphere of Mars traveling at 12,000 miles per hour, 20 times faster than a speeding bullet, and, within a few minutes, had to decelerate and land safely on a surface of unpredictable composition and slope (see Exhibit 1). The engineers in the Pasadena control room described the EDL stage, during which they could not communicate with the spacecraft, as their “six minutes of terror.”

Risk Review Board

The MBE project had a 12-person risk review board, chaired by Lee, consisting of experienced and respected technical experts from JPL, NASA management, and the project’s prime contractor. The members of the board were independent of the project and had been chosen based on their ability to bring knowledge and expertise to it. The experts served on the risk review board because they could make an important contribution to mission success, even without being directly involved in the project.

The review board created the culture of intellectual confrontation at three critical review meetings held over the course of the project. At each of the three-day meetings, the board played devil’s advocate, questioning and challenging project engineers about their assumptions for how the mission would work. Lee described the role of the risk review board:

Often project people have bet their careers on a mission, and have become comfortable making assumptions about the parameters they need to design for. The risk review board is an independent group who are empowered to ask about the bad things that can happen to good designs. What if the parameter changed from 12 to 16? We don’t want just technical expertise on a risk review board. We need a certain type of personality, people with a lot of self-confidence who are willing to speak out and challenge. We want them to be paranoid, constantly worrying about what can go wrong.

3 The Mars Biological Explorer is a fictional composite of several Mars landing missions. The issues described for the MBE project are based on situations that occurred during actual Mars landing projects.

Risk review meetings were highly interactive, challenging, and intense. Systems engineer Chris Lewicki, an MBE review board member, described the culture at the meetings:

We tear each other apart in a review, throwing stones and giving very critical commentary about everything that’s going on. The risk review process gives the project engineers an opportunity to see their work from another perspective. It lifts their noses away from the grindstone. For the past year, they have been focused on how some component worked and became personally invested in it. Now they find out it’s either much more important than they had perceived or, occasionally, insignificant in the context of everything else going on. It’s rarely exactly what they thought it was.

Lee concurred: “Engineers have a forest and a tree problem. They may spend 75% of their time worrying about things like polishing a cannonball, which have little impact on mission success. Only 25% of their time is spent on risks that could cause mission failure.”

Preliminary Mission and Systems Review

The MBE review board’s first meeting, the Preliminary Mission and Systems Review (PMSR), occurred 51 months before the targeted launch date. The project team that attended the PMSR included the Project Manager, Mission Assurance Manager, Project Scientist, Mission Manager, and Project Systems Engineer. First they described the mission to the risk review board, and then the science necessary to accomplish it, the instrumentation necessary to perform the science, and the operations plan to make sure it would all happen. They highlighted six critical risks during the EDL phase of the mission that had to be addressed. The risks had been categorized as either “implementation risks,” which posed a challenge to completing the project in time for the launch of the spacecraft, or mission risks, which could arise during the mission itself.

The MBE project team had identified the critical risks through a project risk assessment process that classified each possible mission or implementation risk along two dimensions: the consequences if the risk occurred, and the likelihood of a risk occurrence. The team used the scales shown below to classify the risks:

Mission Consequence of Occurrence

Score   Consequence   Definition
5       Very High     Mission failure
4       High          Significant (75%) degradation in mission benefits
3       Moderate      Moderate (50%) degradation in mission benefits
2       Low           Small (25%) degradation in mission benefits
1       Very Low      Minimal (or no) degradation in mission benefits

Implementation Consequence of Occurrence

Score   Consequence   Definition
5       Very High     Overran budget or contingency; unable to launch with current resources
4       High          Consumed all budget, schedule or margin
3       Moderate      Significant reduction in margin or launch date slack
2       Low           Small reduction in margin or launch date slack
1       Very Low      Minimal reduction in margin or launch date slack

The team assessed the risk’s likelihood of occurrence based on experience, an estimate inferred from a statistical sample, or (lacking either of these) an educated guess.

Likelihood of Occurrence

Score   Likelihood    Definition
5       Very High     Almost certain (>70%)
4       High          More likely than not (>50%)
3       Moderate      Significant (>30%)
2       Low           Unlikely (>5%)
1       Very Low      Very unlikely (<5%)

The team displayed the criticality of each risk on a two-dimensional “heat map” (see Exhibit 2). Each cell represented the product of the risk’s likelihood and its consequence. A risk with a very high likelihood (5) but a very low consequence (1) had a low total score (5×1 = 5), in the GREEN zone. A risk with a moderate consequence (3) and a moderate likelihood (3) had a total score of 9, in the YELLOW zone.

At the PMSR, the project team presented all identified risks to the risk review board but spent most of the meeting discussing the six critical risks that had scored in the YELLOW category (overall score 6–12). No risk was yet in the RED zone (15 and higher).
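The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch, not JPL's actual tooling: the function names are hypothetical, and treating every score below 6 as GREEN is an assumption, since the text only states the YELLOW band (6–12), the RED band (15 and higher), and one GREEN example (a score of 5).

```python
def risk_score(likelihood: int, consequence: int) -> int:
    """Return the heat-map score: likelihood rank times consequence rank."""
    for label, rank in (("likelihood", likelihood), ("consequence", consequence)):
        if rank not in range(1, 6):
            raise ValueError(f"{label} rank must be between 1 and 5, got {rank}")
    return likelihood * consequence


def risk_zone(score: int) -> str:
    """Map a score to a zone using the bands quoted in the text."""
    if score >= 15:
        return "RED"
    if score >= 6:
        return "YELLOW"
    return "GREEN"  # assumption: everything below the YELLOW band is GREEN


# Print the 5x5 grid, highest likelihood on the top row,
# consequence increasing from left to right.
for likelihood in range(5, 0, -1):
    zones = [risk_zone(risk_score(likelihood, c)) for c in range(1, 6)]
    print(likelihood, " ".join(f"{zone:<6}" for zone in zones))
```

Because the product of two ranks from 1 to 5 can never equal 13 or 14, the three bands cover every cell of the grid without gaps.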

1. The Heat Shield performed the first task during EDL. As the spacecraft entered the Martian atmosphere, the heat shield had to protect the spacecraft from burning up while removing 99% of the kinetic energy from its high-velocity interplanetary trajectory. Since the MBE mission was using a larger spacecraft than previous Martian missions and would approach the Martian atmosphere at a higher entry velocity, its heat shield had to absorb and dissipate a much higher heat load than in previous missions. The project was exploring increasing the thickness of the heat shield as the primary option, which would significantly increase its mass and thus stress other areas of the spacecraft design. An alternative was to modify the trajectory to provide a different heating profile, but this came at the expense of the performance margin in other areas of EDL, while also increasing the uncertainty in the size of the ellipse in which the MBE was predicted to land.

Refining the heat shield design did not entail technical or cost risks, but any higher mass had to be offset by other design decisions. The likelihood that the environment would exceed the heat shield’s capability was low (10%–30%, or rank 2), but the mission consequence of failure would be catastrophic (rank 5), leading to a total score of 10.

2. The Parachute would deploy once the spacecraft had slowed to 1,000 mph and was the next most important device in slowing the MBE’s approach to landing. The parachutes from past missions were not large enough to slow the MBE to a safe terminal descent velocity. These parachutes, however, already completely filled the canister, so a larger parachute would mean either a larger canister or a lighter material. A change in the trajectory could provide the parachute with more time to perform its task, but doing so would increase the heat load on the already-stressed heat shield. The project team thought the problem could be solved by some alternative design choices without increasing size or weight.

The mission consequence of an underperforming parachute was severe (rank 5), but again, the likelihood was perceived to be low (rank 2), yielding a combined score of 10.

3. The onboard Radar had to sense when the craft, dangling from its parachute, was within 15 seconds of hitting the surface, and then fire the retro-rockets to slow and guide the spacecraft from 150 mph to a gentle landing on a level surface. The radar, which operated for only two minutes, had to start up and operate after nine months of hibernation on its voyage through deep space. The current radar was a modified version of a military fighter-jet radar, and engineers were unsure that it could perform its task perfectly in the space environment.

The mission consequences of failure were catastrophic, hence the rank of 5. The engineers had yet to reach a consensus on the likelihood of the problem manifesting itself. The radar was being used on other missions, but these were still en route to their distant destinations, and the few available data points did not build sufficient confidence. Some experts, including the radar vendor, argued that it should be classified as a very low risk (1), but the JPL experts believed it should be classified as a YELLOW risk for increased visibility until the problem was better understood. The project team and board agreed on a placeholder likelihood of 2.

4. The MBE cruise was powered by Solar Arrays that would be jettisoned before the EDL phase. After the spacecraft landed on Mars, it had to autonomously deploy new solar arrays and reconfigure the power system for continued survival on the surface of Mars. A new power system and improved solar array technology had been selected for this mission, but only limited development and testing had been performed to date. The array’s mechanism was new territory for the development team, and while nothing specific had arisen as a source of problems, the engineers expected surprises along the way. There would be two devices on board, so the failure of one would not end the mission but would substantially degrade its capability. They rated the development risk as YELLOW, with a consequence rank of 4 and a likelihood of 3. The team expected the consequence ranking to remain a 4 throughout the development cycle, but hoped to drive the likelihood to a 1.

5. The Bio-Marker Science Analyzer Instrument (BMSA) was the centerpiece of the MBE mission. The BMSA’s sensors and processes performed the primary activity of the science mission. The technology for the bio-marker sensors had never been flown in space before, and its packaging into the craft required a level of miniaturization more advanced than in any previous space science instrument.

If the BMSA instrument failed to perform, the science benefits from the project would be substantially diminished. The team assigned a consequence rank of 4. A lively discussion ensued about the progress and work required to get the instrument ready for launch. One review board member argued that the instrument could not be finished in time for launch and wanted it declared a RED risk, but the majority of the board argued for a high YELLOW risk (likelihood of 3, for a combined score of 12).

6. The BMSA instrument needed a sub-surface sample acquisition system (SSSAS) to collect soil, process it to the required consistency, and deliver a minimum volume through one of the BMSA inlet ports. It was impossible to predict the conditions of the soil before the mission. If this acquisition system did not perform its required task perfectly, the complex BMSA instrument could not do any analysis.

The development risks of this device were typical, but the mission risk they posed was identical to non-performance of the BMSA, a consequence of 4. The exact soil conditions at the landing site would be mission “unknown unknowns,” not knowable in advance. The team was developing some operational contingency procedures, combined with some alternative capabilities that would enable the device to collect and deliver a sample under the widest possible conditions. The team recommended a likelihood rating of 2, carrying the SSSAS risk as YELLOW (combined rank of 8) until development was further along.

Throughout the PMSR, the review board challenged and debated the project team’s assumptions, mitigation plans, and risk classifications. When the board was satisfied that it understood the relevant issues thoroughly, the discussion turned to establishing the cost and time reserves that would be available to solve the mission’s problems and risks.
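The six critical risks discussed at the PMSR can be collected into a small risk register and ranked by combined score. The sketch below is illustrative only: the `Risk` class and its field names are hypothetical, the radar entry uses the placeholder likelihood of 2 that the team and board agreed on, and the BMSA entry uses the board majority's likelihood of 3 rather than the dissenting RED rating.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    name: str
    consequence: int  # rank 1 (very low) to 5 (very high)
    likelihood: int   # rank 1 (very low) to 5 (very high)

    @property
    def score(self) -> int:
        """Combined score: consequence rank times likelihood rank."""
        return self.consequence * self.likelihood


# Ranks as stated in the PMSR discussion of the six critical risks.
PMSR_RISKS = [
    Risk("Heat shield", consequence=5, likelihood=2),
    Risk("Parachute", consequence=5, likelihood=2),
    Risk("Radar", consequence=5, likelihood=2),            # placeholder likelihood
    Risk("Solar arrays", consequence=4, likelihood=3),
    Risk("BMSA instrument", consequence=4, likelihood=3),  # board majority view
    Risk("SSSAS", consequence=4, likelihood=2),
]


def ranked(risks):
    """Return risks ordered from highest to lowest combined score."""
    return sorted(risks, key=lambda risk: risk.score, reverse=True)


for risk in ranked(PMSR_RISKS):
    print(f"{risk.name:<16} C={risk.consequence} L={risk.likelihood} score={risk.score}")
```

Every entry falls in the 6 to 12 band, which matches the text's statement that all six risks were carried as YELLOW and none had yet reached RED.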

Cost and Time Reserves

The review board established both cost and time reserves for all aspects of the mission based on the difficulty and predictability of each assigned task. The reserves provided a buffer for things that could go wrong with a component while still enabling it to deliver the desired performance within cost and schedule constraints. The cost reserve (also referred to as the “margin”) was a rainy day fund that the engineers could draw on to solve problems for each component in the spacecraft and the scientific instrument package. The time reserve allowed for the inevitable delays caused by project problems that had to be solved and gave engineers some flexibility when they needed it the most, at times of major technical setbacks. The project as a whole had to operate within a strict overall budget constraint and, of course, a time schedule that would conclude during the 21-day launch window when the Earth and Mars orbits aligned. Lee explained the critical role of cost and time reserves:

In 1970, we wrote a proposal to do Viking, the first landing mission on Mars. The spacecraft would put some complex science instruments on the Martian surface. These instruments had been developed in a laboratory, and (obviously) none had ever flown or landed on Mars. The original cost reserve estimate for the three instruments was about $6 million each. They ended up costing more than $50 million each. We can’t do that anymore; if you overran costs by a factor of 10, you would be fired.

The art and science of risk management is knowing right at the start which components are going to need more reserves. In the past, engineers received a blanket 30% reserve on everything and assumed that this would be sufficient to deal with risks as they occurred. But a 30% reserve is insufficient for something brand new that we have never built or flown before.

At the PMSR for a Venus landing project, we discussed a sample arm to be used at the planet’s surface, which has a temperature of 900°F and an extreme environment equivalent to the Judeo-Christian image of hell. I made the engineers put a 75% cost margin on the sample arm, which caused the entire project to exceed its budget. They had to go back to the drawing board or else ask for additional funding. They did not like that, and I became persona non grata. But it is absolutely essential that we understand and reserve for risks in advance.

The risk review board typically set margins at 100% for high-risk, never-done-before components (such as the Venusian sample arm); at 30% for moderate-likelihood development risks where the engineering was being applied in new territory, but where the engineers had relevant experience to draw on; and as low as 10% for low-risk, business-as-usual components that had already flown and functioned on previous missions.
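As a rough illustration of how these tiers translate into a component budget, the sketch below applies each margin to a base estimate. The tier names, the function name, and the $10 million base cost are hypothetical; only the three percentages come from the description of the board's typical practice.

```python
# Typical cost margins described for the risk review board, as fractions
# of the base estimate. The tier names are hypothetical labels.
MARGIN_BY_TIER = {
    "never_done_before": 1.00,   # high-risk, never-done-before components
    "new_territory": 0.30,       # known engineering applied in new territory
    "business_as_usual": 0.10,   # components already flown on earlier missions
}


def budget_with_reserve(base_cost: float, tier: str) -> float:
    """Return the base estimate plus the reserve implied by the tier's margin."""
    return base_cost * (1.0 + MARGIN_BY_TIER[tier])


# Hypothetical component with a $10 million base estimate.
base = 10.0
for tier in MARGIN_BY_TIER:
    print(f"{tier}: ${budget_with_reserve(base, tier):.1f} million")
```

The same base estimate therefore carries a total budget anywhere from $11 million to $20 million depending on how novel the board judges the component to be, which is why a single high-margin item can push a whole project over its cost cap.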


The MBE project team had assigned a 30% time margin to the heat shield to solve the problem of a potential mass increase, but the risk review board recommended raising the margin to 50%. The team had not assigned a special cost reserve since the design decision could be made prior to the fabrication of the flight heat shield, and any increase in mass would not be costly to build.

For the parachute issue, both the project team and the board believed that the problems faced had been addressed in a previous mission’s parachute design. But the project team still recommended a 30% cost margin instead of the normal 15% for components from previous missions. Solving the parachute design issue could require several additional build and test cycles, costing schedule delays and money.

The team and board felt that the radar concerns were business-as-usual risks, and assigned the typical 30% cost and time margins to this component. They increased the cost reserve on the BMSA to a substantial 60% so the project team could fast-track a solution to the instrument design troubles by building four different prototypes to test alternative design approaches. The higher reserve also enabled the project to add a payload system engineer to integrate BMSA design decisions with the design of the rest of the lander. The solar array component would carry a 50% cost margin to respond to any problems that arose in its development. As the meeting concluded, Chris Lewicki noted that some projects never made it past the PMSR because they did not have adequate cost or time reserves to address the risks that had been identified during the PMSR.

Following the PMSR, the project team went forward with the final design of the MBE mission. To maintain focus on risk management, the project team met monthly to review progress in mitigating the risks identified at the PMSR. The project manager and project risk manager met quarterly with the MBE risk review board chairman, updating him on risk mitigation progress. These quarterly meetings also gave the project manager the opportunity to describe any major new risks that had emerged since the previous quarterly discussion.

Critical Design Review

Twenty-two months before launch, the project team scheduled the Critical Design Review (CDR) with the risk review board. At this meeting, the board approved the final design and budget for the mission so that it could proceed with the manufacturing process of the craft and its instruments. In addition to describing the detailed design of the spacecraft elements, the meeting covered the detailed verification and validation that would be performed before launch. The project team, during verification and validation, had to demonstrate that each element could perform its required task, and that the elements together met the mission’s desired scientific objectives. By the time the CDR meeting occurred, the MBE project team had used up about half the project’s funding and about half the total cost reserves in mitigating the risks previously identified.

The project leader started the CDR meeting by reviewing the status of the six critical risks from the PMSR.

1 and 2. Heat Shield and Parachute: A combination of design approaches in the parachute and heat shield had consumed some of the mass margin, but detailed analyses and some preliminary testing had shown that the likelihood of failures had been reduced and the consequence of off-nominal performance could be absorbed by the margin in the total design of the system. The review board concurred that these risks had moved into the GREEN zone.

3. Radar: Further study had increased concerns about the likelihood of a problem. The project team had identified a solution to allow the vendor to replace some critical components with higher-quality parts, but the cost of this solution was significant, and the schedule was tight. The board increased the likelihood ranking from 2 to 3, producing a RED zone risk score of 15.

4. Solar Array Deployment and Power System Performance: The development of the power system and solar array deployment mechanism had proceeded well. Early prototypes subjected to realistic test environments worked very well, and detailed analyses of the power budgets revealed more power margin available than previously thought. This system now had ample margin to absorb future problems, so its risk likelihood was reduced from a 3 to a 1.

5. Bio-Marker Science Analyzer Instrument (BMSA): The BMSA instrument was performing poorly. Three of the four prototypes commissioned after PMSR barely functioned, and the fourth had not worked at all. The likelihood of a problem had skyrocketed to a 5.

The risk review board decided to form a tiger team to address the serious BMSA problems. A tiger team developed solutions to mission-critical problems that the project team could not fix. The tiger team consisted of the best technical experts from within the project, JPL, the risk review board, and, more generally, any source in the world. The tiger team initially took a deep dive to learn the exact nature of the problem. Then it developed a mitigation plan to solve it and get the instrument ready for launch. The deployment of tiger teams was expensive in both cost and time, but Lee explained that this was the role of the risk reserves:

When a risk review board concludes that a particular item needs a 100% cost and time reserve, it is predicting that the component will need three or four tiger teams along the way to come in and solve some serious problems.

6. Sub-Surface Sample Acquisition System (SSSAS): The project had made great strides in the SSSAS design, but new problems continued to emerge during analysis and testing. The team had so far been able to solve each problem as it arose, but the continued emergence of new problems led the board to keep this risk as YELLOW until the flow of new concerns slowed down.

The project team also told the risk review board about a new risk that first became known a few weeks before the CDR.

7. Landing Site Safety: An orbiter at Mars had just started to return data on the potential landing sites for the MBE lander, and the engineers were not happy with what they saw. In order to solve the heat shield and parachute problems, the project team had increased the uncertainty ellipse of the landing site area for MBE, and the team was unable to find a site within the expanded area that provided a reasonable probability of a safe landing. The team could present only preliminary findings to the board at the CDR and knew that more analyses had to be done. The consequence of the landing site conditions was not yet catastrophic, but the likelihood of a problem was no longer low. The team assigned a consequence rank of 4 and a likelihood of 2, generating a total risk score of 8 for the landing site issue.

At the conclusion of the CDR, the team summarized the MBE project risks on a revised heat map and updated status chart (see Exhibit 3). Since adequate cost and time reserves still remained to address the YELLOW and RED risks, the MBE project remained on course. The MBE project would only have been canceled or delayed at the CDR meeting if all of the cost reserves had been exhausted while significant unmitigated risks remained.
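The CDR status can be summarized as a small risk register ranked by combined score. The sketch below is illustrative only: the class and field names are hypothetical, the radar's consequence of 5 is inferred from its stated score of 15 at likelihood 3, and the BMSA and SSSAS consequence ranks are assumed to be unchanged from the PMSR. The heat shield, parachute, and power system are left out because the text does not give both of their ranks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Risk:
    risk_id: int
    title: str
    likelihood: int   # rank 1-5
    consequence: int  # rank 1-5

    @property
    def score(self) -> int:
        """Combined rank: likelihood times consequence."""
        return self.likelihood * self.consequence


# Ranks at the CDR. Consequence values marked ASSUMED are not restated
# in the CDR discussion and are carried over or inferred.
cdr_risks = [
    Risk(3, "Radar", 3, 5),                # consequence inferred from score 15
    Risk(5, "BMSA", 5, 4),                 # ASSUMED consequence 4, as at PMSR
    Risk(6, "SSSAS", 2, 4),                # ASSUMED unchanged since PMSR
    Risk(7, "Landing site safety", 2, 4),  # stated at the CDR
]

# Highest combined score first; ties stay in input order.
ranked = sorted(cdr_risks, key=lambda r: r.score, reverse=True)
for r in ranked:
    print(f"#{r.risk_id} {r.title}: {r.likelihood} x {r.consequence} = {r.score}")
```

Under these assumptions the BMSA and the radar head the list, which appears consistent with the ordering shown in Exhibit 3.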


Critical Events Readiness Review (CERR)

After passing the CDR, the MBE project went forward with manufacturing and testing the spacecraft and instrument package, and continued with monthly and quarterly risk reviews. Seven weeks before scheduled launch, the project engineers appeared at the Critical Events Readiness Review (CERR), the third milestone meeting. This would be the MBE team’s final chance to describe how they had mitigated the critical risks discussed at the CDR, discuss any remaining mission risks, and estimate the overall likelihood of achieving mission success. At the meeting, the review board debated and challenged the project team’s assessment of the residual risk. Certain unresolved risks could be mitigated after launch while the spacecraft was en route to Mars, and others could be mitigated through last-minute changes in software and operational processes. But by the time of the CERR, it was too late to remedy any serious hardware risks. If the remaining risks were deemed unacceptable, the only option was to delay the launch and use the 26 months until the orbits of Earth and Mars aligned again to remedy the problems.

With the full weight of these realities, the review board considered the status of the two critical remaining risks (see Exhibit 4). The tiger team and the project engineers had solved most of the BMSA instrument issues, but annoying new problems continued to crop up. The team had developed a variety of workarounds and alternative approaches that would allow a partial return of science in most cases of failure. The progress, however, was still not enough to retire the risk entirely. The current assessment was that the consequence of the risk had been reduced to a 3, but the likelihood of significant problems was still a 3. Overall, the BMSA instrument remained a YELLOW risk.

The project leader noted that some newer technologies had evolved since the project had started four years ago. If the launch were to be delayed, the team could develop, test, and, if successful, substitute new components in the BMSA instrument package. The new components would not have flown in space before so a risk of instrument failure would remain, but the likelihood would be greatly reduced.

The project leader then presented the analysis of landing site risks (#7). Further study of data from the Mars Orbiter had increased the concerns about the spacecraft’s ability to land safely. The current best estimate, based on a variety of modeling and simulation techniques, was a 20% likelihood that the lander would not land successfully due to the unpredictability of the landing site terrain. The project team had assigned a likelihood score of 3, and a consequence of 5, which produced a RED zone risk score of 15.

The project leader noted, however, that a new spacecraft had just arrived and successfully gone into orbit around Mars. Within several months, the new orbiter would begin to send pictures back that would have a resolution 3 to 4 times greater than the current images of potential MBE landing sites. The team expected that the new images would enable them to find a suitable landing site that would reduce the likelihood of landing-site failure below 5%, perhaps to as low as 1%–2%.

After the discussion of the two critical remaining risks, Lewicki commented on the trade-offs faced by the risk review board:

All the risk management we do does not eliminate risk, and often we trade one risk against another to get to an acceptable level. We certainly launch missions that can fail. We balance the hope that the mission just might work out versus the benefits and costs from delaying two or more years. Delay gives us time to mitigate the remaining risks but at much higher project cost.


The project team and risk review board then spent several hours assessing the mission consequences from the two critical risks, along with some remaining quirks of the radar and some residual questions about the heat shield and parachute performance. They concluded that the overall probability of success of the EDL stage was currently about 80%. Lee felt that the risk review board was generally comfortable if the first digit in the likelihood of success was a 9. An 8 generated considerable discussion and linked back to the review board’s risk appetite, which varied by type of mission. On flagship missions costing more than $2 billion, the board wanted about a 96% probability of success before it would recommend a launch. For a discovery mission, with a cost of about $0.5 billion, 90% could be acceptable. On a lower-cost mission or where the next launch opportunity would be four or more years out, the limit might drop to 70%. Lee described the consequences of delay:

Launch delay can add between 20%–40% to the cost of the entire project as JPL and its contractors maintain personnel to reduce mission risks, and perform the necessary re-analyses to align the mission design with the later launch opportunity. The extra money comes from other JPL missions in process, reducing the science potentially available from them.

Also, JPL and NASA have heavily promoted the exciting science from the MBE mission. Any delay would severely diminish their reputation as well as that of their contractors. The delay would significantly erode taxpayer and Congressional support for future projects if they concluded that the MBE money had not been well spent. Finally, many of our projects in the pipeline build on each other, so delay creates an adverse effect on the science basis for our downstream projects. But against all these costs from delaying the launch is the catastrophic fallout should the mission fail.
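The risk-appetite thresholds that Lee describes before this quote can be compared with the current 80% estimate in a short sketch. The dictionary keys and function name are hypothetical, the thresholds are the approximate figures quoted above rather than fixed rules, and this passage does not say which class of mission MBE falls into, so the example simply checks all three.

```python
# Approximate minimum probability of success the board looks for,
# by type of mission. Keys are hypothetical labels.
SUCCESS_THRESHOLD = {
    "flagship": 0.96,               # missions costing more than $2 billion
    "discovery": 0.90,              # missions costing about $0.5 billion
    "low_cost_or_long_wait": 0.70,  # lower cost, or next window 4+ years out
}


def meets_threshold(p_success: float, mission_type: str) -> bool:
    """True if the estimated success probability reaches the board's threshold."""
    return p_success >= SUCCESS_THRESHOLD[mission_type]


edl_success = 0.80  # current estimate for the EDL stage
for mission_type, threshold in SUCCESS_THRESHOLD.items():
    verdict = "meets" if meets_threshold(edl_success, mission_type) else "falls short of"
    print(f"{mission_type}: 80% {verdict} the {threshold:.0%} threshold")
```

An 80% estimate clears only the most permissive threshold, which is why a first digit of 8 generated so much discussion.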

As the risk review board neared the end of the three-day Critical Events Readiness Review meeting, Lee wondered where the board should come out. Should it recommend going forward with the launch? Or should it delay and give the project team two more years to resolve the BMSA instrument package and landing site issues?


Exhibit 1 Artist’s Rendition of the Entry-Descent-Landing (EDL) Stage

[Image not reproduced.]

Source: Courtesy of NASA/JPL.


Exhibit 2 Heat Map at PMSR Review Meeting

[Heat map not reproduced; the surviving axis label is Likelihood.]

Source: Casewriters, based on internal company documents.


Exhibit 3 Heat Map at CDR Meeting

[Heat map not reproduced; axes are Likelihood (1 to 5) and Consequence (1 to 5). The grid positions of risks 1 through 7 and the trend symbols are not recoverable from the extracted text.]

Legend
Criticality: High, Med, Low
Trend: Decreasing (Improving), Increasing (Worsening), Unchanged, New since last month
Approach: M - Mitigate, W - Watch, A - Accept, R - Research

Rank   Risk ID   Approach   Risk Title
1      5         M          Bio-Marker Detection Capability
2      3         M          RADAR Reliability
3      7         R          Landing site survivability
4      6         W          Sub-surface Sample Acquisition
5      4         W          Power system performance
6      1         A          Heat Shield performance
7      2         W          Parachute performance

Source: Casewriters, based on internal company documents.


Exhibit 4 Heat Map at the End of the CERR Meeting

[Heat map not reproduced; axes are Likelihood (1 to 5) and Consequence (1 to 5). The grid positions of risks 1 through 7 and the trend symbols are not recoverable from the extracted text.]

Legend
Criticality: High, Med, Low
Trend: Decreasing (Improving), Increasing (Worsening), Unchanged, New since last month
Approach: M - Mitigate, W - Watch, A - Accept, R - Research

Rank   Risk ID   Approach   Risk Title
1      7         R          Landing site survivability
2      5         W          Bio-Marker Detection Capability
3      6         W          Sub-surface Sample Acquisition
4      3         A          RADAR Reliability
5      4         A          Power system performance
6      1         A          Heat Shield performance
7      2         A          Parachute performance

Source: Casewriters, based on internal company documents.