CSCI 5230: Project Management

Software Reuse Disasters: Therac-25 and Ariane 5 Flight 501

David Sumpter

12/4/2001

Contents

Introduction
Therac-25 – Background
Therac-25 – Software Process
Therac-25 – Causes
Flight 501 – Sequence of Events
Ariane 5 Software Process
Flight 501 – Cause of Failure
Conclusion
References

Introduction

Two famous software engineering disasters

• Therac-25
  • Medical accelerator to treat tumors
  • 6 known accidents resulting in death or serious injury
    • June 1985 – January 1987
  • Software was adapted from earlier models
• Ariane 5 Flight 501
  • Maiden flight of Ariane 5 launch vehicle
    • Larger, more powerful successor to the Ariane 4
  • Exploded approximately 40 seconds after launch
    • June 1996
  • Loss traced to software carried over virtually unchanged from Ariane 4

Therac-25 – Background

• 25 MeV medical accelerator
  • Designed to destroy tumors
  • Dual mode: electron beam or X-rays
  • Successor to Therac-6, Therac-20
• Therac-6, Therac-20
  • Computer control added to earlier machines
  • Still capable of stand-alone (no computer) operation
  • All standard hardware safety mechanisms
• Therac-25 more dependent on software
  • Lacked many hardware safety mechanisms of earlier accelerators
• Therac-25 software “evolved from” Therac-6 code
  • PDP-11 assembly, no standard OS
  • Also contained Therac-20 code

Therac-25 – Software Process

• Little, if any, process
• Single programmer
• Minimal unit and software testing
  • Emphasis on integrated system testing
• 1983 safety analysis, in effect, assumed that software had no errors!
  • “Programming errors have been reduced by extensive testing ... Any residual software errors are not included in the analysis.”
  • “Computer execution errors are caused by faulty hardware components and by ‘soft’ (random) errors induced by alpha particles and electromagnetic noise.”

Therac-25 – Causes

• For three of the six known incidents, cause is unknown
  • “there is no way to determine what particular design errors were related to the... accidents. Given the unsafe programming practices in the code, it is possible that unknown race conditions or errors could have been responsible”
• For two fatal accidents in Tyler, Texas
  • Race condition led to inconsistent machine settings, leading to massive radiation overdoses
  • Same bug found in Therac-20
    • Hardware safeguards prevented it from causing injuries, and from even being discovered until after the Tyler accidents
• Fatal accident at Yakima, Washington
  • Overflow of 1-byte variable which led, under rare conditions, to improper machine settings, leading to massive radiation overdose
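The Yakima overflow can be illustrated with a minimal Python sketch. It assumes, following the account in Leveson and Turner, that a one-byte variable was incremented on every pass through a set-up routine and that a value of zero was read as "no check needed". The function and variable names are hypothetical, and the original code was PDP-11 assembly, so this is an illustration of the failure mode rather than a reconstruction.

```python
def skipped_checks(passes: int, increment: bool = True) -> list[int]:
    """Return the pass numbers on which the safety check would be skipped.

    `flag` stands in for a one-byte variable (hypothetical name) where any
    nonzero value means "check the machine settings".
    """
    flag = 0
    skipped = []
    for n in range(1, passes + 1):
        if increment:
            # Flawed pattern: incrementing a one-byte value wraps 255 -> 0.
            flag = (flag + 1) & 0xFF
        else:
            # Safer pattern: assign a fixed nonzero value instead.
            flag = 1
        if flag == 0:
            # Zero is read as "nothing to check", so the check is skipped.
            skipped.append(n)
    return skipped


print(skipped_checks(600))                   # [256, 512]
print(skipped_checks(600, increment=False))  # []
```

The check is silently skipped on every 256th pass, which is why the fault appeared only under rare conditions. Assigning a fixed nonzero value instead of incrementing removes the wrap-around.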

Flight 501 – Sequence of Events

• Near-simultaneous failure of the primary and back-up Inertial Reference Systems (SRIs)
  • 36 seconds after main engine ignition
• Nozzles of the two solid boosters and main engine swivel to extreme positions
  • Nozzles direct rocket thrust, steer launcher
  • Caused launcher to veer abruptly
• Links between the solid boosters and the core stage rupture
  • Triggered self-destruct

Ariane 5 Software Process

• Stringent processes in place, but…
• “the culture within the Ariane program…” only addressed “random hardware failures… which can quite rationally be handled by a backup system”
• “the view had been taken that software should be considered correct until it is shown to be at fault”! (emphasis added)
• SRIs were not included, but simulated by special software, in integrated tests
  • Technically difficult and expensive
  • SRIs considered “fully qualified at equipment level”
    • “The design of the Ariane 5 SRI is practically the same as that of an SRI which is presently used on Ariane 4, particularly as regards the software”
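Why a backup handles random hardware failures but not a shared software defect can be shown with a small Python sketch. Everything here is a hypothetical stand-in: `sri_software`, the limit of 32767, and the unit names are assumptions for illustration, not the actual SRI design.

```python
def sri_software(reading: float) -> int:
    """Hypothetical stand-in for code that both units run identically."""
    if abs(reading) > 32767:
        # Deterministic software fault: the same input always fails.
        raise OverflowError("value does not fit the assumed 16-bit range")
    return int(reading)


def run_units(reading: float, hardware_fault_in=frozenset()) -> dict:
    """Run the primary and backup units on the same reading."""
    results = {}
    for unit in ("primary", "backup"):
        if unit in hardware_fault_in:
            # A random hardware failure hits one unit independently.
            results[unit] = "failed"
            continue
        try:
            sri_software(reading)
            results[unit] = "ok"
        except OverflowError:
            results[unit] = "failed"
    return results


# Hardware fault in one unit: the backup takes over.
print(run_units(100.0, hardware_fault_in={"primary"}))
# Out-of-range input to identical software: both units fail together.
print(run_units(50000.0))
```

Redundancy helps only when failures are independent. Two units running the same code on the same input fail at the same moment, which matches the near-simultaneous loss of both SRIs.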

Flight 501 – Cause of Failure

• Software exception “during… data conversion from 64-bit floating point to 16-bit signed integer”
  • Occurred in SRI software
  • Overflow caused by unexpectedly high value for Horizontal Bias (BH) variable
    • BH related to horizontal velocity
    • Conversion left unprotected to save computer processing power
• Analysis had determined that overflow could not occur
  • Reasoning not documented in code
  • Ariane 5 has higher horizontal velocity, early in trajectory, than Ariane 4!
• That part of software where error occurred was not needed after launch
  • Requirement to continue operating after launch traces to earlier versions of Ariane
    • Enabled prompt re-start of count-down in event of a hold
    • Did not apply to Ariane 5, but maintained for commonality
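The conversion failure can be sketched in Python as follows. This is not the flight code: the function names and the sample values 20000.0 and 50000.0 are illustrative assumptions, and clamping is shown only as one possible form of protection, not as the fix that was adopted.

```python
import struct

INT16_MIN, INT16_MAX = -32768, 32767


def to_int16_unprotected(value: float) -> int:
    """Convert a 64-bit float to a 16-bit signed integer with no range guard.

    struct.pack raises struct.error when the value does not fit in 16 bits,
    which plays the role of the unhandled exception here.
    """
    packed = struct.pack("<h", int(value))
    return struct.unpack("<h", packed)[0]


def to_int16_saturating(value: float) -> int:
    """Protected variant: clamp the value into the 16-bit range first."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))


def convert_or_report(value: float):
    """Show what an unprotected conversion does with a given input."""
    try:
        return to_int16_unprotected(value)
    except struct.error:
        return "operand error"


print(convert_or_report(20000.0))    # 20000
print(convert_or_report(50000.0))    # operand error
print(to_int16_saturating(50000.0))  # 32767
```

A value that stays in range, as assumed for the earlier trajectory, converts cleanly. A larger value raises an exception unless the conversion is guarded, and the guard costs extra instructions on every call.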

Conclusion

• In both cases, software was carried over from earlier projects where it had seemingly worked well
• Therac-25
  • Software defects in earlier machines were hidden by hardware safeguards
  • No real software development process
  • Apparently no serious evaluation of risks involved in using software in lieu of hardware safeguards
• Ariane 5
  • Known “defect” was non-issue on Ariane 4
  • Established software development process in place
  • Issues were considered, but key factor was missed

Conclusion, cont.

• Misunderstanding of software?
  • Both were primarily hardware projects
    • Reuse of existing software in the development of new hardware
  • Not only underestimated complexity of software, but failed to recognize that it was even an issue
    • Both projects made the absolutely astounding assumption that the software didn’t have errors!
  • Assumed “black box” that could be swapped in and out of different applications
    • No evidence that reuse was considered in design of software

References

Inquiry Board. Ariane 5 Flight 501 Failure. Inquiry Board report (July 1996). Available online at http://www.mssl.ucl.ac.uk/www_plasma/missions/cluster/ariane5rep.html

Leveson, N., Turner, C.S. An Investigation of the Therac-25 Accidents. IEEE Computer, vol. 26, no. 7 (July 1993), 18-41. Available online at http://courses.cs.vt.edu/~cs3604/lib/Therac_25/Therac_1.html

Thank You