9-110-031
REV: MAY 27, 2010
ROBERT S. KAPLAN
ANETTE MIKES
Jet Propulsion Laboratory
Professors Robert S. Kaplan and Anette Mikes prepared this case with the assistance of Gentry Lee and Chris Lewicki. The mission described in this case is fictional and based on the general experiences of Lee and Lewicki. HBS cases are developed solely as the basis for class discussion. Cases are not intended to serve as endorsements, sources of primary data, or illustrations of effective or ineffective management.
Copyright © 2010 President and Fellows of Harvard College. To order copies or request permission to reproduce materials, call 1-800-545-7685, write Harvard Business School Publishing, Boston, MA 02163, or go to www.hbsp.harvard.edu/educators. This publication may not be digitized, photocopied, or otherwise reproduced, posted, or transmitted, without the permission of Harvard Business School.
The only thing worse than a delay is a mission that fails. (Gentry Lee)
Gentry Lee, senior systems engineer at the Jet Propulsion Laboratory, contemplated the difficult decision faced by the risk review board seven weeks before the scheduled launch of the Mars Biological Explorer mission. After a development period of more than four years and the expenditure of $600 million dollars, however, significant mission-threatening risks still remained. If the board decided the remaining risks were too high, the next launch opportunity would be 26 months later, when the planets again re-aligned. Lee pondered whether to recommend launch or delay.
History and Mission
The Jet Propulsion Laboratory (JPL) was a research and development center, managed and operated by California Institute of Technology (CalTech) under a contract from NASA (National Aeronautics and Space Administration), the U.S. space agency. JPL employed approximately 5,000 full-time CalTech employees and managed several thousand contractors.
A group of CalTech graduate students and their adviser, Professor Theodore von Karman, founded JPL during World War II to develop and test rockets and guided missiles under the sponsorship of the U.S. Army. During the 1950s, JPL carried out three successful sub-orbital flights and on February 1, 1958, it helped launch Explorer 1, America’s first satellite, which sent back data that led to the discovery of the Van Allen radiation belts high above the Earth’s surface. In October 1958, the U.S. consolidated its various space programs into a new agency, NASA. JPL, shortly thereafter, became NASA’s primary planetary spacecraft center. JPL engineers designed and operated the Ranger and Surveyor robotic spacecraft missions to the Moon that prepared the way for the manned Apollo landings. JPL launched successful interplanetary exploration missions, including the Mariner spacecraft to Venus, Mars, and Mercury, the Galileo mission to Jupiter and its moons, and the Voyager missions to Jupiter, Saturn, Uranus, and Neptune. JPL also developed the camera for NASA’s Hubble Space Telescope and operated the Deep Space Network for communication with all of its inter-planetary robotic missions.
NASA, however, had also experienced several tragic failures. In January 1967, months before the scheduled launch of Apollo 1, three astronauts died when a fire erupted in a ground-test capsule. In January 1986, the seven crew members, including school teacher Christa McAuliffe, died when the Space Shuttle Challenger broke apart 73 seconds after launch. JPL was not involved in either of those missions but had its most visible failure when the Mars Observer, launched in 1992, lost contact with ground controllers in 1993. Some described this $1 billion project as “a huge amount of taxpayers’ money spent for nothing.”
In the early 1990s, the political and public mood demanded reforms to the space program, which led to the appointment (in 1992) of Daniel Goldin as the new NASA administrator. Goldin, formerly an executive at aerospace contractor TRW, believed that new management techniques and technologies, along with accepting more risk, would dramatically reduce the cost of NASA’s missions. In a 1992 speech, he challenged JPL to adopt “faster, better, cheaper” techniques so that it could do more without spending more money. He asserted:
Be bold—take risks. [A] project that’s 20 for 20 isn’t successful. It’s proof that we’re playing it too safe. If the gain is great, risk is warranted. Failure is OK, as long as it’s on a project that’s pushing the frontiers of technology.1
But the new strategy did not reverse the incidence of major failures. The Mars Climate Orbiter disappeared during orbit insertion on Sept. 23, 1999, due to a navigation error; analyses had been performed and communicated using English units (feet and pounds) rather than NASA-mandated metric units (meters and kilograms). The Mars Polar Lander disappeared as it neared the surface of Mars in December 1999. To save money, the Lander did not have telemetry during its descent to Mars, and subsequent analysis suggested that the failure was probably due to a software fault that shut off the descent rocket too early, causing the spacecraft to fall the last 40 meters onto the surface. These two failures ended the “faster, better, cheaper” management philosophy for Mars Landers.
NASA’s manned space program experienced another tragic failure with the loss, on February 1, 2003, of Space Shuttle Columbia and its seven crew members 16 minutes before scheduled touchdown. The Columbia Accident Investigation Board concluded that the accident was not an anomalous random event, but rather
. . . rooted in NASA’s Space Shuttle history and culture, including the original compromises that were required to gain approval for the Shuttle, subsequent years of resource constraints, fluctuating priorities, schedule pressures . . . and lack of an agreed national vision for human space flight. Cultural traits and organizational practices detrimental to safety [included] reliance on past success as a substitute for sound engineering practices; organizational barriers that prevented effective communication of critical safety information and stifled professional differences of opinion; lack of integrated management across program elements; and the evolution of an informal chain of command and decision-making processes that operated outside the organization’s rules.2
Implementing a New Risk Management Culture at JPL
In 2000, NASA’s new Mars Program Director, Scott Hubbard, asked Gentry Lee, a former JPL employee, to return and help develop an architecture for a new Mars Mission Program. Hubbard wanted the architecture to include a risk management program that would significantly increase JPL’s mission success rate. Lee, a graduate of the University of Texas and MIT, had worked with JPL from 1969 to 1976 as part of the Viking project team that engineered the first successful landing of a spacecraft on Mars. Lee subsequently became chief engineer of the Galileo project, which, over its 10-year mission, explored Jupiter with both an atmospheric probe and an orbiter that mapped the planet’s major satellites. Galileo was the last of the grand-scale missions before NASA’s “faster, better, cheaper” era. Lee left JPL during this era to pursue various other activities including co-authoring four novels with science fiction grandmaster Arthur C. Clarke, collaborating with Carl Sagan on an award-winning science documentary series for television, designing computer games, writing columns, and lecturing on space exploration and extra-terrestrial life.
1 Daniel Goldin, transcript of remarks and discussion at the 108th Space Studies Board meeting, Irvine, Calif., November 18, 1992; Daniel Goldin, “Toward the Next Millennium: A Vision for Spaceship Earth,” speech delivered at the World Space Congress, September 2, 1992.
2 Executive Summary, Columbia Accident Investigation Board Report Volume 1: 9 (August 2003).
Lee accepted Hubbard’s offer and in 2002 became JPL’s chief systems engineer, with responsibility for the engineering integrity of all JPL planetary missions. Lee defined his role as “minister without portfolio, the person who made sure everything worked the way it was supposed to on a global scale.” He described how he thought about mission risks:
At the start of a project, try to write down everything you can that is risky. Then put together a plan for each of those risks, and watch how the plan evolves. Some risks are “business as usual risks.” We are familiar with these risks and know how to quantify and mitigate them. Others are “development risks,” in which the project’s engineering enters territory we have never experienced before. And, finally, we have risks imposed by the environment that we can’t control, which we call the “unknown unknowns.” We attempt to quantify all the risks of each type and aggregate them into an approximate likelihood of mission success. The final question we face is, “do we launch or not?” How large does failure have to loom before you decide to cancel or delay a project in which the project people have worked for years and taxpayers have invested hundreds of millions of dollars in the hope of producing important new scientific knowledge?
Lee believed that “risk mitigation was painful; not a natural event for humans to perform,” and that overcoming cultural resistance would be his largest challenge. He explained:
JPL engineers graduate from top schools at the top of their class. They are used to being right in their design and engineering decisions. I have to get them comfortable thinking about all the things that can go wrong. This requires accepting a culture of intellectual confrontation.
People’s ambitions and careers get wrapped up in being right all the time. They have to learn that it’s not important whether your initial idea is right. It’s important whether or not the idea we go forward with is right. And that’s what intellectual confrontation helps us achieve.
JPL already had a risk assurance process, but project engineers typically viewed it as peripheral to their work, something they had to do just before milestone reviews. Lee wanted risk management to become embedded within the engineering process so that it would be continually front-of-mind during a project’s life.
Referring to Janus, the two-faced Roman god of gates and doors who looked forward and backward, Lee remarked, “Innovation, looking forward, is absolutely essential, but innovation needs to be balanced with reflecting backwards, learning from experience about what can go wrong.”
Source: Wikipedia, http://en.wikipedia.org/wiki/File:Janus-Vatican.JPG, accessed February 2010. Reprinted with permission under the terms of the GNU Free Documentation License.
Over the next six years, Lee helped introduce a comprehensive system for managing the risks of planetary missions. While early elements of the system had been used in JPL-managed projects during this time, the Mars Biological Explorer (MBE) program was the first that used the system from initial project formulation all the way through launch.3 Recent missions to Mars had confirmed the presence of iced water in the North Polar Region and the widespread existence of salts and minerals elsewhere that could have been formed with water. NASA scientists now believed that the water could have supported life. The $745 million MBE mission would send a non-mobile platform to Mars to acquire and analyze subsurface samples for bio-markers that would indicate the presence of recent or active life-processes under the surface of Mars.
MBE, like any planetary landing program, consisted of four principal stages: launch, cruise, entry-descent-landing, and surface operations. Each stage had its own challenges and project team. An engineer on the cruise stage remarked that guiding a spacecraft from Earth to a specific location on Mars was comparable to shooting a baseball from the pitching mound of Dodger Stadium in Los Angeles to cross the outside corner of the plate in Wrigley Field in Chicago. But, she pointed out, the spacecraft cruise was harder because Earth and Mars were both moving relative to each other in their own orbits around the sun while simultaneously spinning on their axes. Lee felt, however, that the laws of planetary motion made the cruise-stage risks “known unknowns,” whereas landing the spacecraft safely on Mars during the entry-descent-landing (EDL) stage faced “unknown unknowns.” The spacecraft would arrive at the atmosphere of Mars traveling at 12,000 miles per hour, 20 times faster than a speeding bullet, and, within a few minutes, had to decelerate and land safely on a surface of unpredictable composition and slope (see Exhibit 1). The engineers in the Pasadena control room described the EDL stage, during which they could not communicate with the spacecraft, as their “six minutes of terror.”
Risk Review Board
The MBE project had a 12-person risk review board, chaired by Lee, consisting of experienced and respected technical experts from JPL, NASA management, and the project’s prime contractor. The members of the board were independent of the project and had been chosen based on their ability to bring knowledge and expertise to it. The experts served on the risk review board because they could make an important contribution to mission success, even without being directly involved in the project.
The review board created the culture of intellectual confrontation in three critical review meetings held over the course of the project. At each of the three-day meetings, the board played devil’s advocate, questioning and challenging project engineers about their assumptions for how the mission would work. Lee described the role of the risk review board:
Often project people have bet their careers on a mission, and have become comfortable making assumptions about the parameters they need to design for. The risk review board is an independent group who are empowered to ask about the bad things that can happen to good designs. What if the parameter changed from 12 to 16? We don’t want just technical expertise on a risk review board. We need a certain type of personality, people with a lot of self-confidence who are willing to speak out and challenge. We want them to be paranoid, constantly worrying about what can go wrong.
3 The Mars Biological Explorer is a fictional composite of several Mars landing missions. The issues described for the MBE project are based on situations that occurred during actual Mars landing projects.
Risk review meetings were highly interactive, challenging, and intense. Systems engineer Chris Lewicki, an MBE review board member, described the culture at the meetings:
We tear each other apart in a review, throwing stones and giving very critical commentary about everything that’s going on. The risk review process gives the project engineers an opportunity to see their work from another perspective. It lifts their noses away from the grindstone. For the past year, they have been focused on how some component worked and became personally invested in it. Now they find out it’s either much more important than they had perceived or, occasionally, insignificant in the context of everything else going on. It’s rarely exactly what they thought it was.
Lee concurred: “Engineers have a forest and a tree problem. They may spend 75% of their time worrying about things like polishing a cannonball, which have little impact on mission success. Only 25% of their time is spent on risks that could cause mission failure.”
Preliminary Mission and Systems Review
The MBE review board’s first meeting, the Preliminary Mission and Systems Review (PMSR), occurred 51 months before the targeted launch date. The project team that attended the PMSR included the Project Manager, Mission Assurance Manager, Project Scientist, Mission Manager, and Project Systems Engineer. First they described the mission to the risk review board, and then the science necessary to accomplish it, the instrumentation necessary to perform the science, and the operations plan to make sure it would all happen. They highlighted six critical risks during the EDL phase of the mission that had to be addressed. The risks had been categorized as either “implementation risks,” which posed a challenge to completing the project in time for the launch of the spacecraft, or mission risks, which could arise during the mission itself.
The MBE project team had identified the critical risks through a project risk assessment process that classified each possible mission or implementation risk along two dimensions: the consequences if the risk occurred, and the likelihood of a risk occurrence. The team used the scales shown below to classify the risks:
Mission Consequence of Occurrence
Score Consequence Definition
5 Very High Mission failure
4 High Significant (75%) degradation in mission benefits
3 Moderate Moderate (50%) degradation in mission benefits
2 Low Small (25%) degradation in mission benefits
1 Very Low Minimal (or no) degradation in mission benefits
Implementation Consequence of Occurrence
Score Consequence Definition
5 Very High Overran budget or contingency; unable to launch with current resources
4 High Consumed all budget, schedule or margin
3 Moderate Significant reduction in margin or launch date slack
2 Low Small reduction in margin or launch date slack
1 Very Low Minimal reduction in margin or launch date slack
The team assessed the risk’s likelihood of occurrence based on experience, an estimate inferred from a statistical sample, or (lacking either of these) an educated guess.
Likelihood of Occurrence
Score Likelihood Definition
5 Very High Almost certain (>70%)
4 High More likely than not (>50%)
3 Moderate Significant (>30%)
2 Low Unlikely (>5%)
1 Very Low Very unlikely (<5%)
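The likelihood scale above can be read as a simple banding rule. The following is a minimal Python sketch of that reading, not the project team's actual tool: the function name `likelihood_score` is hypothetical, and the treatment of a probability of exactly 5% (scored 1 here) is an assumption, since the table leaves that boundary unspecified.

```python
def likelihood_score(probability: float) -> int:
    """Map an estimated probability of occurrence (0.0-1.0) to a 1-5 likelihood score.

    Bands follow the Likelihood of Occurrence table. A probability of exactly
    0.05 is scored 1 here; that boundary choice is an assumption.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # Each band is entered when the probability exceeds its lower bound.
    bands = ((0.70, 5), (0.50, 4), (0.30, 3), (0.05, 2))
    for lower_bound, score in bands:
        if probability > lower_bound:
            return score
    return 1


if __name__ == "__main__":
    for p in (0.80, 0.60, 0.40, 0.10, 0.01):
        print(f"{p:.0%} -> likelihood score {likelihood_score(p)}")
```

Whether the estimate comes from experience, a statistical sample, or an educated guess, the score depends only on which band the estimate falls into.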
The team displayed the criticality of each risk on a two-dimensional “heat map” (see Exhibit 2). Each cell represented the product of the risk’s likelihood and its consequence. A risk with a very high likelihood (5) but a very low consequence (1) had a low total score (5×1 = 5), in the GREEN zone. A risk with a moderate consequence (3) and a moderate likelihood (3) had a total score of 9 (3×3 = 9).
At the PMSR, the project team presented all identified risks to the risk review board but spent most of the meeting discussing the six critical risks that had scored in the YELLOW category (overall score 6–12). No risk was yet in the RED zone (score of 15 or higher).
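The heat-map arithmetic can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than JPL's method: the function names are hypothetical, and the GREEN cutoff (scores below 6) is inferred from the text, which places a score of 5 in GREEN and starts YELLOW at 6. Scores of 13 and 14 cannot arise as products of two 1-5 ranks, so the gap between the YELLOW and RED ranges never matters.

```python
def risk_score(consequence: int, likelihood: int) -> int:
    """Return the heat-map score: consequence rank times likelihood rank."""
    for name, value in (("consequence", consequence), ("likelihood", likelihood)):
        if value not in range(1, 6):
            raise ValueError(f"{name} must be an integer from 1 to 5")
    return consequence * likelihood


def risk_zone(score: int) -> str:
    """Classify a score: RED at 15 or higher, YELLOW for 6-12, GREEN below 6 (inferred)."""
    if score >= 15:
        return "RED"
    if score >= 6:
        return "YELLOW"
    return "GREEN"


if __name__ == "__main__":
    # The two worked examples from the text.
    print(risk_zone(risk_score(consequence=1, likelihood=5)))  # GREEN (score 5)
    print(risk_zone(risk_score(consequence=3, likelihood=3)))  # YELLOW (score 9)
```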
1. The Heat Shield performed the first task during EDL. As the spacecraft entered the Martian atmosphere, the heat shield had to protect the spacecraft from burning up while removing 99% of the kinetic energy from its high-velocity interplanetary trajectory. Since the MBE mission was using a larger spacecraft than previous Martian missions and would approach the Martian atmosphere at a higher entry velocity, its heat shield had to absorb and dissipate a much higher heat load than in previous missions. The project was exploring increasing the thickness of the heat shield as the primary option, which would significantly increase its mass and thus stress other areas of the spacecraft design. An alternative was to modify the trajectory to provide a different heating profile, but this came at the expense of the performance margin in other areas of EDL, while also increasing the uncertainty in the size of the ellipse in which the MBE was predicted to land.
Refining the heat shield design did not entail technical or cost risks, but any higher mass had to be offset by other design decisions. The likelihood that the environment would exceed the heat shield’s capability was low (10%–30%, or rank 2), but the mission consequence of failure would be catastrophic (rank 5), leading to a total score of 10.
2. The Parachute would deploy once the spacecraft had slowed to 1,000 mph and was the next most important device in slowing the MBE’s approach to landing. The parachutes from past missions were not large enough to slow the MBE to a safe terminal descent velocity. These parachutes, however, already completely filled the canister, so a larger parachute would mean either a larger canister or a lighter material. A change in the trajectory could provide the parachute with more time to perform its task, but doing so would increase the heat load on the already-stressed heat shield. The project team thought the problem could be solved by some alternative design choices without increasing size or weight.
The mission consequence of an underperforming parachute was severe (rank 5), but again, the likelihood was perceived to be low (rank 2), yielding a combined score of 10.
3. The onboard Radar had to sense when the craft, dangling from its parachute, was within 15 seconds of hitting the surface, and then fire the retro-rockets to slow and guide the spacecraft from 150 mph to a gentle landing on a level surface. The radar, which operated for only two minutes, must start up and operate after nine months of hibernation on its voyage through deep space. The current radar was a modified version of a military fighter-jet radar, and engineers were unsure that it could perform its task perfectly in the space environment.
The mission consequences of failure were catastrophic, hence the rank of 5. The engineers had yet to reach a consensus on the likelihood of the problem manifesting itself. The radar was being used on other missions, but these were still en route to their distant destinations, and the few available data points did not build sufficient confidence. Some experts, including the radar vendor, argued that it should be classified as a very low risk (1), but the JPL experts believed it should be classified as a YELLOW risk for increased visibility until the problem was better understood. The project team and board agreed on a placeholder likelihood of 2.
4. The MBE cruise was powered by Solar Arrays that would be jettisoned before the EDL phase. After the spacecraft landed on Mars, it must autonomously deploy new solar arrays and reconfigure the power system for continued survival on the surface of Mars. A new power system and improved solar array technology had been selected for this mission, but only limited development and testing had been performed to date. The array’s mechanism was new territory for the development team and while nothing specific had arisen as a source of problems, the engineers expected surprises along the way. There would be two devices on board so the failure of one would not end the mission, but would substantially degrade its capability. They rated the development risk as YELLOW, with a consequence rank of 4, and a likelihood of 3. The team expected the consequence ranking to remain a 4 throughout the development cycle, but hoped to drive the likelihood to a 1.
5. The Bio-Marker Science Analyzer Instrument (BMSA) was the centerpiece of the MBE mission. The BMSA’s sensors and processes performed the primary activity of the science mission. The technology for the bio-marker sensors had never been flown in space before, and its packaging into the craft required a level of miniaturization more advanced than in any previous space science instrument.
If the BMSA instrument failed to perform, the science benefits from the project would be substantially diminished. The team assigned a consequence rank of 4. A lively discussion ensued about the progress and work required to get the instrument ready for launch. One review board member argued that the instrument could not be finished in time for launch and wanted it declared a RED risk, but the majority of the board argued for a high YELLOW risk, with a likelihood of 3.
6. The BMSA instrument needed a sub-surface sample acquisition system (SSSAS) to collect soil, process it to the required consistency, and deliver a minimum volume through one of the BMSA inlet ports. It was impossible to predict the conditions of the soil before the mission. If this acquisition system did not perform its required task perfectly, the complex BMSA instrument could not do any analysis.
The development risks of this device were typical, but the mission risk it posed was identical to that of non-performance of the BMSA, a consequence of 4. The exact soil conditions at the landing site would be mission “unknown unknowns,” not knowable in advance. The team was developing some operational contingency procedures, combined with some alternative capabilities that would enable the device to collect and deliver a sample under the widest possible conditions. The team recommended a likelihood rating of 2, carrying the SSSAS risk as YELLOW (combined rank of 8) until development was further along.
Throughout the PMSR, the review board challenged and debated the project team’s assumptions, mitigation plans, and risk classifications. When the board was satisfied that it understood the relevant issues thoroughly, the discussion turned to establishing the cost and time reserves that would be available to solve the mission’s problems and risks.
Cost and Time Reserves
The review board established both cost and time reserves for all aspects of the mission based on the difficulty and predictability of each assigned task. The reserves provided a buffer for things that could go wrong with a component while still enabling it to deliver the desired performance within cost and schedule constraints. The cost reserve (also referred to as the “margin”) was a rainy day fund that the engineers could draw on to solve problems for each component in the spacecraft and the scientific instrument package. The time reserve allowed for the inevitable delays caused by project problems that had to be solved and gave engineers some flexibility when they needed it the most, at times of major technical setbacks. The project as a whole had to operate within a strict overall budget constraint and, of course, a time schedule that would conclude during the 21-day launch window when the Earth and Mars orbits aligned. Lee explained the critical role of cost and time reserves:
In 1970, we wrote a proposal to do Viking, the first landing mission on Mars. The spacecraft would put some complex science instruments on the Martian surface. These instruments had been developed in a laboratory, and (obviously) none had ever flown or landed on Mars. The original cost reserve estimate for the three instruments was about $6 million each. They ended up costing more than $50 million each. We can’t do that anymore; if you overran costs by a factor of 10, you would be fired.
The art and science of risk management is knowing right at the start which components are going to need more reserves. In the past, engineers received a blanket 30% reserve on everything and assumed that this would be sufficient to deal with risks as they occurred. But a 30% reserve is insufficient for something brand new that we have never built or flown before.
At the PMSR for a Venus landing project, we discussed a sample arm to be used at the planet’s surface, which has a temperature of 900°F and an extreme environment equivalent to the Judeo-Christian image of hell. I made the engineers put a 75% cost margin on the sample arm, which caused the entire project to exceed its budget. They had to go back to the drawing board or else ask for additional funding. They did not like that, and I became persona non grata. But it is absolutely essential that we understand and reserve for risks in advance.
The risk review board typically set margins at 100% for high risk, never-done-before components (such as the Venusian sample arm); at 30% for moderate-likelihood development risks where the engineering was being applied in a new territory, but where the engineers had relevant experience to draw on; and down as low as 10% for low risk, business-as-usual components, which had already flown and functioned on previous missions.
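The tiered margins described above amount to a lookup from risk class to reserve percentage. The Python sketch below illustrates the arithmetic only. The class labels, the function name, and the component base costs are hypothetical. The percentages are the typical values quoted in the preceding paragraph, with 10% used for business-as-usual components even though the text gives it as a lower bound.

```python
# Typical cost margins quoted in the text, as whole percentages.
# The class labels are hypothetical names chosen for this sketch.
MARGIN_PCT_BY_RISK_CLASS = {
    "never_done_before": 100,  # high-risk components with no flight heritage
    "new_territory": 30,       # development risk, but with relevant experience
    "business_as_usual": 10,   # already flown on previous missions (lower bound)
}


def budget_with_reserve(base_cost: float, risk_class: str) -> float:
    """Return the base cost plus the cost reserve implied by the risk class."""
    margin_pct = MARGIN_PCT_BY_RISK_CLASS[risk_class]
    reserve = base_cost * margin_pct / 100
    return base_cost + reserve


if __name__ == "__main__":
    # Hypothetical base costs in $ millions, for illustration only.
    components = [
        ("sample arm", 10.0, "never_done_before"),
        ("new instrument housing", 10.0, "new_territory"),
        ("heritage antenna", 10.0, "business_as_usual"),
    ]
    for name, base, risk_class in components:
        total = budget_with_reserve(base, risk_class)
        print(f"{name}: base {base:.1f}, with reserve {total:.1f}")
```

A blanket 30% would give all three hypothetical components the same budget. The tiered scheme instead doubles the budget of the never-done-before component, which is the point Lee makes about identifying up front which components need more reserves.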
The MBE project team had assigned a 30% time margin to the heat shield to solve the problem of a potential mass increase, but the risk review board recommended raising the margin to 50%. The team had not assigned a special cost reserve since the design decision could be made prior to the fabrication of the flight heat shield, and any increase in mass would not be costly to build.
For the parachute issue, both the project team and the board believed that the problems faced had been addressed in a previous mission’s parachute design. But the project team still recommended a 30% cost margin instead of the normal 15% for components from previous missions. Solving the parachute design issue could require several additional build and test cycles, costing schedule delays and money.
The team and board felt that the radar concerns were business-as-usual risks, and assigned the typical 30% cost and time margins to this component. They increased the cost reserve on the BMSA to a substantial 60% so the project team could fast-track a solution to the instrument design troubles by building four different prototypes to test alternative design approaches. The higher reserve also enabled the project to add a payload system engineer to integrate BMSA design decisions with the design of the rest of the lander. The solar array component would carry a 50% cost margin to respond to any problems that arose in its development. As the meeting concluded, Chris Lewicki noted that some projects never made it past the PMSR because they did not have adequate cost or time reserves to address the risks that had been identified during the PMSR.
Following the PMSR, the project team went forward with the final design of the MBE mission. To maintain focus on risk management, the project team met monthly to review progress in mitigating the risks identified at the PMSR. The project manager and project risk manager met quarterly with the MBE risk review board chairman, updating him on risk mitigation progress. These quarterly meetings also gave the project manager the opportunity to describe any major new risks that had emerged since the previous quarterly discussion.
Critical Design Review
Twenty-two months before launch, the project team scheduled the Critical Design Review (CDR) with the risk review board. At this meeting, the board approved the final design and budget for the mission so that it could proceed with the manufacturing process of the craft and its instruments. In addition to describing the detailed design of the spacecraft elements, the meeting covered the detailed verification and validation that would be performed before launch. The project team, during verification and validation, had to demonstrate that each element could perform its required task, and that the elements together met the mission’s desired scientific objectives. By the time the CDR meeting occurred, the MBE project team had used up about half the project’s funding and about half the total cost reserves in mitigating the risks previously identified.
The Project Leader started the CDR meeting by reviewing the status of the six critical risks from the PMSR.
1 and 2. Heat Shield and Parachute: A combination of design approaches in the parachute and heat shield had consumed some of the mass margin, but detailed analyses and some preliminary testing had shown that the likelihood of failures had been reduced and the consequence of off-nominal performance could be absorbed by the margin in the total design of the system. The review board concurred that these risks had moved into the GREEN zone.
3. Radar: Further study had increased concerns about the likelihood of a problem. The project team had identified a solution to allow the vendor to replace some critical components with higher-quality parts, but the cost of this solution was significant, and the schedule was tight. The board increased the likelihood ranking from 2 to 3, producing a RED zone risk score of 15.
4. Solar Array Deployment and Power System Performance: The development of the power system and solar array deployment mechanism had proceeded well. Early prototypes subjected to realistic test environments worked very well, and detailed analyses of the power budgets revealed more power margin available than previously thought. This system now had ample margin to absorb future problems, so its risk likelihood was reduced from a 3 to a 1.
5. Bio-Marker Science Analyzer Instrument (BMSA): The BMSA instrument was performing poorly. Three of the four prototypes commissioned after PMSR barely functioned, and the fourth had not worked at all. The likelihood of a problem had skyrocketed to a 5.
The risk review board decided to form a tiger team to address the serious BMSA problems. A tiger team developed solutions to mission-critical problems that the project team could not fix. The tiger team consisted of the best technical experts from within the project, JPL, the risk review board, and, more generally, from any source in the world. The tiger team initially took a deep dive to learn the exact nature of the problem. Then it developed a mitigation plan to solve it and get the instrument ready for launch. The deployment of tiger teams was expensive in both cost and time, but Lee explained this was the role of the risk reserves:
When a risk review board concludes that a particular item needs a 100% cost and time reserve, it is predicting that the component will need three or four tiger teams along the way to come in and solve some serious problems.
6. Sub-Surface Sample Acquisition System (SSSAS): The project had made great strides in the SSSAS design, but new problems continued to emerge during analysis and testing. The team had so far been able to solve each problem as it arose, but the continued emergence of new problems led the board to keep this risk as YELLOW until the flow of new concerns slowed down.
The project team also told the risk review board about a new risk that first became known a few weeks before the CDR.
7. Landing Site Safety: An orbiter at Mars had just started to return data on the potential landing sites for the MBE lander, and the engineers were not happy with what they saw. In order to solve the heat shield and parachute problems, the project team had increased the uncertainty ellipse of the landing site area for MBE, and the team was unable to find a site within the expanded area that provided a reasonable probability of a safe landing. The team could present only preliminary findings to the board at the CDR and knew that more analyses had to be done. The consequence of the landing site conditions was not yet catastrophic, but the likelihood of a problem was no longer low. The team assigned a consequence rank of 4 and a likelihood of 2, generating a total risk score of 8 for the landing site issue.
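The scoring arithmetic used in these reviews can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `risk_score` is hypothetical, the 1-to-5 bounds follow the rankings quoted in the case, and the radar consequence of 5 is inferred from its stated score of 15 at likelihood 3. The boundaries between the GREEN, YELLOW, and RED zones are not given here, so the sketch does not attempt to assign zones.

```python
def risk_score(likelihood: int, consequence: int) -> int:
    """Return likelihood rank times consequence rank (each ranked 1 to 5).

    Hypothetical helper for illustration; not JPL's actual tooling.
    """
    for name, rank in (("likelihood", likelihood), ("consequence", consequence)):
        if not 1 <= rank <= 5:
            raise ValueError(f"{name} must be between 1 and 5, got {rank}")
    return likelihood * consequence


# Landing site safety at the CDR: consequence 4, likelihood 2.
landing_site_score = risk_score(likelihood=2, consequence=4)

# Radar at the CDR: likelihood raised from 2 to 3.
# The consequence of 5 is an inference from the stated score of 15.
radar_score = risk_score(likelihood=3, consequence=5)

print(landing_site_score, radar_score)
```

The product rule means that raising a single rank by one step can move a risk a long way: the radar score went from 10 to 15 when its likelihood rose from 2 to 3.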
At the conclusion of the CDR, the team summarized the MBE project risks on a revised heat map and updated status chart (see Exhibit 3). Since adequate cost and time reserves still remained to address the YELLOW and RED risks, the MBE project remained on course. The MBE project would only have been canceled or delayed at the CDR meeting if all of the cost reserves had been exhausted while significant unmitigated risks remained.
Critical Events Readiness Review (CERR)
After passing the CDR, the MBE project went forward with manufacturing and testing the spacecraft and instrument package, and continued with monthly and quarterly risk reviews. Seven weeks before scheduled launch, the project engineers appeared at the Critical Events Readiness Review (CERR), the third milestone meeting. This would be the MBE team’s final chance to describe how they had mitigated the critical risks discussed at the CDR, discuss any remaining mission risks, and estimate the overall likelihood of achieving mission success. At the meeting, the review board debated and challenged the project team’s assessment of the residual risk. Certain unresolved risks could be mitigated after launch while the spacecraft was en route to Mars, and others could be mitigated through last-minute changes in software and operational processes. But by the time of the CERR, it was too late to remedy any serious hardware risks. If the remaining risks were deemed unacceptable, the only option was to delay the launch and use the 26 months until the orbits of Earth and Mars aligned again to remedy the problems.
With the full weight of these realities, the review board considered the status of the two critical remaining risks (see Exhibit 4). The tiger team and the project engineers had solved most of the BMSA instrument issues, but new annoying problems continued to crop up. The team had developed a variety of workarounds and alternative approaches that would allow a partial return of science in most cases of failure. The progress, however, was still not enough to retire the risk entirely. The current assessment was that the consequence of the risks had been reduced to a 3 but the likelihood of significant problems was still a 3. Overall the BMSA instruments remained a YELLOW risk.
The project leader noted that some newer technologies had evolved since the project had started four years ago. If the launch were to be delayed, the team could develop, test, and, if successful, substitute new components in the BMSA instrument package. The new components would not have flown in space before so a risk of instrument failure would remain, but the likelihood would be greatly reduced.
The project leader then presented the analysis of landing site risks (#7). Further study of data from the Mars Orbiter had increased the concerns about the spacecraft’s ability to land safely. The current best estimate, based on a variety of modeling and simulation techniques, was a 20% likelihood that the lander would not land successfully due to the unpredictability of the landing site terrain. The project team had assigned a likelihood score of 3, and a consequence of 5, which produced a RED zone risk score of 15.
The project leader noted, however, that a new spacecraft had just arrived and successfully gone into orbit around Mars. Within several months, the new orbiter would begin to send pictures back that would have a resolution 3 to 4 times greater than the current images of potential MBE landing sites. The team expected that the new images would enable them to find a suitable landing site that would reduce the likelihood of landing-site failure below 5%, perhaps to as low as 1%–2%.
After the discussion of the two critical remaining risks, Lewicki commented on the trade-offs faced by the risk review board:
All the risk management we do does not eliminate risk, and often we trade one risk against another to get to an acceptable level. We certainly launch missions that can fail. We balance the hope that the mission just might work out versus the benefits and costs from delaying two or more years. Delay gives us time to mitigate the remaining risks but at much higher project cost.
The project team and risk review board then spent several hours assessing the mission consequences from the two critical risks, along with some remaining quirks of the radar and some residual questions about the heat shield and parachute performance. They concluded that the overall probability of success of the EDL stage was currently about 80%. Lee felt that the risk review board was generally comfortable if the first digit in the likelihood of success was a 9. An 8 generated considerable discussion and linked back to the review board’s risk appetite, which varied by type of mission. On flagship missions costing more than $2 billion, the board wanted about a 96% probability of success before it would recommend a launch. For a discovery mission, with a cost of about $0.5 billion, 90% could be acceptable. On a lower-cost mission or where the next launch opportunity would be four or more years out, the limit might drop to 70%. Lee described the consequences of delay:
Launch delay can add between 20% and 40% to the cost of the entire project as JPL and its contractors maintain personnel to reduce mission risks, and perform the necessary re-analyses to align the mission design with the later launch opportunity. The extra money comes from other JPL missions in process, reducing the science potentially available from them.
Also, JPL and NASA have heavily promoted the exciting science from the MBE mission. Any delay would severely diminish their reputation as well as that of their contractors. The delay would significantly erode taxpayer and Congressional support for future projects if they concluded that the MBE money had not been well spent. Finally, many of our projects in the pipeline build on each other, so delay creates an adverse effect on the science basis for our downstream projects. But against all these costs from delaying the launch is the catastrophic fallout should the mission fail.
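The risk-appetite thresholds Lee described can be set against the 80% EDL estimate in a short sketch. The dictionary keys and the function name `meets_risk_appetite` are hypothetical labels chosen for illustration; the percentages are the approximate figures quoted in the case, and the sketch does not assume which class the MBE mission belongs to.

```python
# Approximate minimum probability of success the board wanted before
# recommending launch, by mission type (figures as quoted in the case).
RISK_APPETITE = {
    "flagship": 0.96,                 # missions costing more than $2 billion
    "discovery": 0.90,                # missions costing about $0.5 billion
    "lower_cost_or_long_wait": 0.70,  # cheaper, or next window 4+ years out
}


def meets_risk_appetite(p_success: float, mission_class: str) -> bool:
    """Return True if the estimated success probability reaches the threshold."""
    return p_success >= RISK_APPETITE[mission_class]


# Board's current estimate for the EDL stage.
EDL_SUCCESS = 0.80

verdicts = {cls: meets_risk_appetite(EDL_SUCCESS, cls) for cls in RISK_APPETITE}
for cls, ok in verdicts.items():
    print(f"{cls}: {'meets threshold' if ok else 'below threshold'}")
```

A comparison like this captures only the probability side of the decision. The 20% to 40% cost of delay and the reputational effects Lee described sit outside it and have to be weighed separately.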
As the risk review board neared the end of the three-day Critical Events Readiness Review meeting, Lee wondered where the board should come out. Should it recommend going forward with the launch? Or should it delay and give the project team two more years to resolve the BMSA instrument package and landing site issues?
Exhibit 1   Artist’s Rendition of the Entry-Descent-Landing (EDL) Stage

Source: Courtesy of NASA/JPL.
Exhibit 2   Heat Map at PMSR Review Meeting

Source: Casewriters, based on internal company documents.
Exhibit 3   Heat Map at CDR Meeting

[Heat map: risk IDs 1 to 7 plotted on Likelihood and Consequence axes, each ranked 1 to 5.]

Criticality: High, Med, Low
Trend: Decreasing (Improving), Increasing (Worsening), Unchanged, New since last month
Approach: M - Mitigate, W - Watch, A - Accept, R - Research

Rank   Risk ID   Approach   Risk Title
1      5         M          Bio-Marker Detection Capability
2      3         M          RADAR Reliability
3      7         R          Landing site survivability
4      6         W          Sub-surface Sample Acquisition
5      4         W          Power system performance
6      1         A          Heat Shield performance
7      2         W          Parachute performance

Source: Casewriters, based on internal company documents.
Exhibit 4   Heat Map at the End of the CERR Meeting

[Heat map: risk IDs 1 to 7 plotted on Likelihood and Consequence axes, each ranked 1 to 5.]

Criticality: High, Med, Low
Trend: Decreasing (Improving), Increasing (Worsening), Unchanged, New since last month
Approach: M - Mitigate, W - Watch, A - Accept, R - Research

Rank   Risk ID   Approach   Risk Title
1      7         R          Landing site survivability
2      5         W          Bio-Marker Detection Capability
3      6         W          Sub-surface Sample Acquisition
4      3         A          RADAR Reliability
5      4         A          Power system performance
6      1         A          Heat Shield performance
7      2         A          Parachute performance

Source: Casewriters, based on internal company documents.