
How safe is safe enough? (and how do we demonstrate that?)

Dr David Pumfrey

High Integrity Systems Engineering Group

Department of Computer Science

University of York


2

Why System Safety?

Why do we strive to make systems safe?

Self interest
we wouldn’t want to be harmed by systems we develop and use

unsafe systems are bad business

We have to do so
required by law

required by standards

But what do the law and standards represent?
laws try to prevent what society finds morally unacceptable

ultimately assessed by the courts, as representatives of society

standards try to define what is acceptable practice
to discharge legal and moral responsibilities


3

Perception of Safety

Perception (and hence individual acceptance) of risk affected by many factors

(Apparent) degree of control

Number of deaths in one accident (aircraft versus cars)

Familiarity vs. novelty

“Dreadness” of risk (“falling out of the sky”, nuclear radiation)

Voluntary vs. involuntary risk (hang gliding vs nuclear accident)

Politics and journalism
frequency / profile of reporting of accidents / issues

Experience

Individual factors – age, sex, religion, culture

How do companies (engineers?) make decisions given diversity of views?


4

Getting it wrong – some examples

Ariane 5 (mis-use of legacy software)

A330 ADIRUs (software could not cope with hardware failure mode)

Boeing 777 (software error allowed switch back to ADIRU that had previously been detected as faulty)

Therac 25 (software errors contributed to radiotherapy overdose accidents)


5

Boeing 777
An incident of massive altitude fluctuations on a flight out of Perth

Problem caused by Air Data Inertial Reference Unit (ADIRU)

Software contained a latent fault which was revealed by a change

Problem was in fault management/dispatch logic

June 2001: accelerometer #5 fails with erroneous high output values; the ADIRU discards its output values

A power cycle of the ADIRU occurs each time the aircraft electrical system is restarted

Aug 2005: accelerometer #6 fails; a latent software error allows use of the previously failed accelerometer #5

http://www.atsb.gov.au/publications/investigation_reports/2005/AAIR/aair200503722.aspx
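A rough illustration of this class of latent fault (a Python sketch with hypothetical names and structure, not the actual ADIRU software): the exclusion of a failed sensor is held only in volatile state, so a power cycle plus a second failure re-admits a unit already known to be faulty.

```python
# Illustrative sketch only: hypothetical structure and names, not the actual
# ADIRU software. It shows the class of latent fault described above: a sensor
# exclusion held only in volatile memory, so a power cycle plus a second
# failure re-admits a unit that had already been detected as faulty.

class AccelerometerVoter:
    def __init__(self, num_units=6):
        self.num_units = num_units
        self.excluded = set()     # volatile only: the latent flaw

    def power_cycle(self):
        # BUG (illustrative): earlier failures recorded in maintenance memory
        # are not re-applied after a restart.
        self.excluded = set()

    def mark_failed(self, unit):
        self.excluded.add(unit)

    def usable_units(self):
        return [u for u in range(1, self.num_units + 1) if u not in self.excluded]

voter = AccelerometerVoter()
voter.mark_failed(5)      # 2001: unit 5 gives erroneous high outputs and is excluded
voter.power_cycle()       # aircraft electrical system restarted between flights
voter.mark_failed(6)      # later flight: unit 6 fails
print(5 in voter.usable_units())   # True: the previously failed unit 5 is back in use
```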


6

Therac 25

Therac 25 was a development of (safe, successful) earlier medical machines

Intended for operation on tumours
Uses a linear accelerator to produce an electron stream and to generate X-rays (both can be used in treatments)

X-ray therapy requires about 100 times more electron energy than electron therapy

this level of electron energy is hazardous if patient exposed directly

Selection of treatment type controlled by a turntable


7

Therac 25 Schematic

[Schematic (figure): turntable assembly showing the mirror, counterweight, electron-mode scan target, X-ray-mode target, position-sense microswitch assembly and locking plunger.]


8

Software in Therac-25

On older models, there were mechanical interlocks on turntable position and beam intensity

In Therac-25, mechanical interlocks were removed; turntable position and beam activation were both computer controlled

Older models required the operator to enter data twice (at the patient’s side, then again in the shielded area) and the two entries were cross-checked

In Therac-25, data only entered once (to speed up therapy sessions)

Very poor user interface
Display updated so slowly that experienced therapists could “type ahead”

Undocumented error codes which occurred so often the operators ignored them

Six over-dosage accidents (resulting in deaths)
May have been many cases where ineffective treatment was given
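A rough illustration of the class of hazard this creates (a greatly simplified Python sketch with hypothetical names, not the Therac-25 code): once the mechanical interlock is gone, beam energy and turntable position are just shared software state updated at different speeds, and a fast edit can leave them inconsistent.

```python
# Illustrative sketch only: hypothetical names, greatly simplified, not the
# Therac-25 code. It shows the kind of inconsistency that becomes a hazard once
# the mechanical interlock between beam energy and turntable position is gone.

X_RAY, ELECTRON = "X-RAY", "ELECTRON"
ENERGY = {X_RAY: 25.0, ELECTRON: 0.25}   # illustrative values only

class TreatmentState:
    def __init__(self):
        self.selected_mode = None    # latest operator entry
        self.beam_energy = None      # updated quickly on every entry/edit
        self.turntable_mode = None   # set by a slower positioning activity
        self.setup_done = False

state = TreatmentState()

def position_turntable():
    state.turntable_mode = state.selected_mode   # the slow step

def operator_enters(mode):
    state.selected_mode = mode
    state.beam_energy = ENERGY[mode]
    if not state.setup_done:
        position_turntable()
        state.setup_done = True
    # BUG (illustrative): a fast "type ahead" edit skips repositioning because
    # setup_done is never cleared, leaving energy and turntable inconsistent.

def fire_beam():
    # With the hardware interlock removed, nothing independent of this software
    # checks consistency before the beam is activated.
    if state.turntable_mode != state.selected_mode:
        print("UNSAFE: beam energy and turntable position are inconsistent")
    else:
        print("treatment delivered in", state.selected_mode, "mode")

operator_enters(X_RAY)      # initial entry
operator_enters(ELECTRON)   # quick edit, faster than the slow display/setup
fire_beam()                 # -> UNSAFE
```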

9

Westland Helicopters Merlin (EH101)

Helicopter Electric Actuation Technology (HEAT) project

To replace traditional flight controls...
rods and links

power assistance from high pressure hydraulics

...with electrical actuation
smaller, lighter

reduction in fire risk

BUT

totally fly-by-wire – no mechanical reversion

flight control electronics become extremely critical


10

Eurofighter Typhoon: Display Processor Hardware

[Block diagram (figure): dual-processor board. Each processor has a second MMU, private RAM, private ROM and timers on its own private bus; the two processors share shared RAM and shared ROM over a local bus via arbitration units; further arbitration connects I/O and specialist hardware to the system bus.]


11

Timing diagram

[Timing diagram (figure): one major frame of the display processor schedule. The CPE, IC and PE1–PE6 processing elements each alternate between SUPERVISOR and USER activity; recurring slots include context switches, the system health monitor, processor CBIT, SG BLT, localise bus data, global bus input data, MIM input/output, radar transfer / radar interface manager, service discrete I/F, HUD monitor, CPE requests and the user-level CBIT / checksum activities. The frame is driven by a level-6 timer interrupt and level-5 broadcast interrupts, with synchronisation points Sync 1–7, a VME bus block transfer period, non-synchronised supervisor operations (MC timer update, context switch), radar and IFF EOF interrupts (CPU level 5, no latency), sync with data word (CPU level 2), acyclic interrupts, a warning interrupt and multi-mission data / PDS load.]
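As an aside on what such a schedule looks like in outline, a toy cyclic-executive skeleton (Python; the slot names and timings are hypothetical, not the actual Typhoon display processor schedule): each frame runs a fixed sequence of supervisor and user slots, interleaving application work with continuous built-in test (CBIT), paced by a frame interrupt.

```python
# Toy cyclic-executive skeleton: hypothetical slots and timings, not the actual
# Typhoon display processor schedule. Each frame runs a fixed sequence of
# supervisor and user slots, interleaving application work with continuous
# built-in test (CBIT), paced by a frame (timer) interrupt.

import time

FRAME_PERIOD_S = 0.02   # illustrative 50 Hz frame

def context_switch():         print("  context switch")
def system_health_monitor():  print("  system health monitor")
def localise_bus_data():      print("  localise bus data")
def processor_cbit():         print("  processor CBIT slice")
def application_work():       print("  application work (user mode)")

FRAME_SLOTS = [context_switch, system_health_monitor, localise_bus_data,
               processor_cbit, application_work, processor_cbit]

def run_frames(n):
    for frame in range(n):
        start = time.monotonic()
        print("frame", frame)
        for slot in FRAME_SLOTS:    # fixed order, every frame
            slot()
        # idle until the next frame interrupt would fire
        remaining = FRAME_PERIOD_S - (time.monotonic() - start)
        if remaining > 0:
            time.sleep(remaining)

run_frames(2)
```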


12

Recursive Resource Dependency

[Diagram (figure): all resources (RAM, ROM, CPU registers, I/O registers, events, memory) contain a set of intrinsically critical resources (program ROM, stack RAM, critical variables, interrupts, output events, master cycle clock), which in turn contain the primary control resources (MMU registers, bus arbitration control registers, timer registers, interrupt configuration registers) and their initialisation routines. The initialisation routines for the primary control resources use system resources, and dependencies become cyclic.]
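The cyclic dependency can be made concrete by recording, for each resource, the resources its initialisation routine uses, and looking for a cycle in that graph. A minimal sketch (Python; the resource names follow the diagram, but the specific dependency edges are illustrative assumptions):

```python
# Minimal sketch: resource names follow the diagram, the dependency edges are
# illustrative assumptions. Each entry lists the resources that a resource's
# initialisation routine itself uses; a cycle means no initialisation order can
# avoid touching a resource before it has been brought up and checked.

INIT_USES = {
    "mmu_registers":         ["program_rom", "stack_ram"],
    "bus_arbitration_regs":  ["program_rom", "stack_ram"],
    "timer_registers":       ["program_rom", "stack_ram"],
    "interrupt_config_regs": ["program_rom", "stack_ram"],
    "program_rom":           [],
    "stack_ram":             ["mmu_registers"],   # assumed: RAM usable only once the MMU maps it
}

def find_cycle(graph):
    WHITE, GREY, BLACK = 0, 1, 2
    colour = {node: WHITE for node in graph}

    def visit(node, path):
        colour[node] = GREY
        for dep in graph.get(node, []):
            if colour[dep] == GREY:                 # back edge: cycle found
                return path[path.index(dep):] + [dep]
            if colour[dep] == WHITE:
                cycle = visit(dep, path + [dep])
                if cycle:
                    return cycle
        colour[node] = BLACK
        return None

    for node in graph:
        if colour[node] == WHITE:
            cycle = visit(node, [node])
            if cycle:
                return cycle
    return None

print(find_cycle(INIT_USES))   # ['mmu_registers', 'stack_ram', 'mmu_registers']
```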


13

Safety Cases: Who are they for?

Many people and organisations will have an interest in a safety case

supplier / manufacturer
operator
regulatory authorities
bodies that conduct acceptance trials
people who will work with the system

and their representatives (unions)

“neighbours” (e.g. the general public who live around an air base)
emergency services

May need more than one “presentation” of the safety case to suit different audiences
Who has the greatest interest?


14

Goal Structuring Notation

Purpose of a Goal Structure

To show how goals are broken down into sub-goals, and eventually supported by evidence (solutions), whilst making clear the strategies adopted, the rationale for the approach (assumptions, justifications) and the context in which goals are stated



15

A Simple Goal Structure

[Figure: top goal “Control System is Safe”, with context “I.L. Process Guidelines defined by Ref X”, “Hazards identified from FHA (Ref Y)” and “Tolerability targets (Ref Z)”. It is supported via the goals “All identified hazards eliminated / sufficiently mitigated” and “Software developed to I.L. appropriate to hazards involved” (justification: 1 × 10⁻⁶ p.a. limit for Catastrophic Hazards). Lower-level goals: “H1 has been eliminated”, “Probability of H2 occurring < 1 × 10⁻⁶ per annum”, “Probability of H3 occurring < 1 × 10⁻³ per annum”, “Primary Protection System developed to I.L. 4” and “Secondary Protection System developed to I.L. 2”. Solutions: Formal Verification, Fault Tree Analysis, Process Evidence of I.L. 4 and Process Evidence of I.L. 2.]
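One way to see how such a structure hangs together is to write it down as data. The sketch below (Python; the field names are illustrative and the mapping of solutions to goals follows one reading of the figure, GSN itself being a graphical notation) records the goals, strategies, context and solutions, and flags any goal left without supporting evidence:

```python
# Rough encoding of the structure above as data. Field names are illustrative
# and the mapping of solutions to goals follows one reading of the figure; GSN
# itself is a graphical notation. The walk at the end flags unsupported goals.

goal_structure = {
    "goal": "Control System is Safe",
    "context": ["I.L. Process Guidelines defined by Ref X",
                "Hazards identified from FHA (Ref Y)",
                "Tolerability targets (Ref Z)"],
    "strategies": [
        {"strategy": "All identified hazards eliminated / sufficiently mitigated",
         "goals": [
             {"goal": "H1 has been eliminated",
              "solutions": ["Formal Verification"]},
             {"goal": "Probability of H2 occurring < 1e-6 per annum",
              "solutions": ["Fault Tree Analysis"]},
             {"goal": "Probability of H3 occurring < 1e-3 per annum",
              "solutions": ["Fault Tree Analysis"]},
         ]},
        {"strategy": "Software developed to I.L. appropriate to hazards involved",
         "justification": "1e-6 p.a. limit for Catastrophic Hazards",
         "goals": [
             {"goal": "Primary Protection System developed to I.L. 4",
              "solutions": ["Process Evidence of I.L. 4"]},
             {"goal": "Secondary Protection System developed to I.L. 2",
              "solutions": ["Process Evidence of I.L. 2"]},
         ]},
    ],
}

for strategy in goal_structure["strategies"]:
    for g in strategy["goals"]:
        status = "supported" if g["solutions"] else "UNDEVELOPED"
        print(g["goal"], "->", status)
```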


16

HEAT: Developing the Argument

Top goal: Trials aircraft is acceptably safe to fly with HEAT/ACT fitted
System: HEAT/ACT system is acceptably safe
Clearance: Procedures for flight clearance and certification followed
Integration: Trials a/c remains acceptably safe with HEAT fitted
SMS: SMS implemented to DS 00-56
Product: All identified hazards have been suitably addressed
Process: All relevant requirements and standards have been complied with


17

Progressive Development

[Figure: the Hazard Log branch of the goal structure, shown at three stages of elaboration. Goal G1.1.4.7 “Hazard Log requirement satisfied” is progressively decomposed into: G1.1.4.7.1 “Hazard Log initiated” (evidence: Hazard Log application, Hazard Log Guidance Notes document); G1.1.4.7.2 “Hazard Log correctly maintained” (evidence: Safety Review minutes, ISAT Hazard Log audit report), itself broken down into G1.1.4.7.2.1 “Access rights to Hazard Log correctly controlled”, G1.1.4.7.2.2 “Sign-off procedure and rights to Hazard Log correctly controlled”, G1.1.4.7.2.3 “Hazard Log used consistently” and G1.1.4.7.2.4 “Hazard Log update procedure understood and correctly followed”; and G1.1.4.7.3 “Hazard Log used to assess levels of risk throughout project”.]
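As an illustration of the kinds of checks behind these sub-goals, a minimal, hypothetical hazard log record and audit (Python; the field names, entries and findings are invented for illustration, not a prescribed format):

```python
# Minimal, hypothetical hazard log record: field names, entries and findings are
# invented for illustration, not a prescribed format. It mirrors the sub-goals
# above: entries are signed off, kept up to date, and carry the risk assessment
# used throughout the project.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class HazardLogEntry:
    hazard_id: str
    description: str
    severity: str                  # e.g. "Catastrophic", "Critical", "Marginal"
    probability: str               # e.g. "Remote", "Occasional"
    mitigations: List[str] = field(default_factory=list)
    status: str = "Open"           # "Open", "Mitigated", "Closed"
    signed_off_by: Optional[str] = None

def audit(log: List[HazardLogEntry]) -> List[str]:
    """Return findings of the kind an ISAT-style audit might raise."""
    findings = []
    for e in log:
        if e.status == "Closed" and not e.signed_off_by:
            findings.append(f"{e.hazard_id}: closed without sign-off")
        if e.status != "Closed" and not e.mitigations:
            findings.append(f"{e.hazard_id}: open hazard with no recorded mitigation")
    return findings

log = [
    HazardLogEntry("H1", "Uncommanded actuator movement", "Catastrophic", "Remote",
                   mitigations=["Independent monitor lane"], status="Mitigated"),
    HazardLogEntry("H2", "Loss of pilot display", "Critical", "Occasional",
                   status="Closed"),   # closed but never signed off
]
print(audit(log))   # -> ['H2: closed without sign-off']
```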


18

An analogy

A safety case is like a legal case presented in court
Like a legal case, a safety case must:

be clear
be credible
be compelling
make best use of available evidence

Like a legal case, a safety case will always be subjective
There is no such thing as absolute safety
Safety can never be proved
We are always making an argument of acceptability


19

What is a convincing argument? Example: The Completeness Problem

[GSN fragment (figure): goal G1.1.2.1.1 “All relevant airworthiness requirements have been identified completely and correctly” is addressed by two strategies: AwComplete, “argument by showing extreme improbability of overlooking relevant requirements”, and AwCorrect, “argument by showing assumptions used to derive requirements were correct”. Sub-goals: G1.1.2.1.1.1 “Airworthiness requirements specified”; G1.1.2.1.1.2 “Relevant airworthiness requirements satisfy mandated standards where applicable”; G1.1.2.1.1.3 “Relevance of airworthiness requirements to HEAT/ACT assessed by competent staff”; G1.1.2.1.1.4 “Assumptions are proven correct by flight test”. Supporting evidence and context: BoC, Basis of Certification document (#####); CompAwStaff, competencies of staff used to filter airworthiness requirements; AwSigs, competencies of specialists used to vet and approve requirements; FltTest, assumptions proven by flight test; DS970, Def Stan 00-970; JAR 29; GRS, EH101 General Requirement Specification.]


20

How is evidence used?

Evidence may “stack up” in different ways:
Strong, specific – individually compelling; taken together, show system properties
Weak, general – compelling in sum

Think about evidence used in a legal (court) case:
Direct – supports a conclusion with no “intermediate steps”
e.g. a witness testifies that he saw the suspect at point X at time Y
Circumstantial – requires an inference to be made to reach a conclusion
e.g. a ballistics test proves the suspect’s gun fired the fatal shot

Safety case evidence is similar:
e.g. testing is direct – shows how the system behaves in a specific instance
conformance to design rules is indirect – allows the inference that the system is fit for purpose (if the rules have been proven)
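A crude illustration of how this “stacking up” might be recorded (Python; the evidence items, classification and threshold are invented for illustration only):

```python
# Crude illustration only: the evidence items, classification and threshold are
# invented. It mimics the idea above that one strong, specific item can be
# individually compelling, while weak, general items are only compelling in
# sum, and that indirect evidence needs an extra inference step.

EVIDENCE = [
    {"name": "system test results",            "kind": "direct",   "strength": "strong"},
    {"name": "design-rule conformance",        "kind": "indirect", "strength": "weak"},
    {"name": "coding-standard compliance",     "kind": "indirect", "strength": "weak"},
    {"name": "service experience of similar system", "kind": "indirect", "strength": "weak"},
]

def compelling(evidence):
    # One strong item suffices; otherwise require several weaker items "in sum".
    strong = [e for e in evidence if e["strength"] == "strong"]
    return bool(strong) or len(evidence) >= 3

print(compelling(EVIDENCE))        # True: a strong, direct item is present
print(compelling(EVIDENCE[1:2]))   # False: a single weak, indirect item
```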


21

Conclusions

Demonstrating safety is a challenge

We are building ever more complex systems

Much of the “bespoke” complexity is in software

Essential that safety is a design driver...

... and also design for the ability to demonstrate safety