Safety Critical Systems T 79.5303 Design for safety hardware and software Ilkka Herttua


V - Lifecycle model

[Figure: V-lifecycle model. Stages: Requirements Analysis; Systems Analysis & Design; Software Design; Software Implementation & Unit Test; Module Integration & Test; System Integration & Test; System Acceptance. Artefacts: Requirements Document, Requirements Model, Specification Document, Functional / Architectural Model, Test Scenarios, Knowledge Base.]

** Configuration-controlled knowledge that increases in understanding until completion of the system:

• Requirements documentation
• Requirements traceability
• Model data/parameters
• Test definitions/vectors

Designing for Safety

• Fault groups
- requirement/specification errors
- random component failures
- systematic faults in design (software)
• Approaches to tackle problems
- right system architecture (fault-tolerant)
- reliability engineering (component, system)
- quality management (design and production processes)

Designing for Safety

• Hierarchical design
- simple modules, encapsulated functionality
- separated safety kernel for safety-critical functions
• Maintainability
- preventive versus corrective maintenance
- scheduled maintenance routines for the whole lifecycle
- faults easy to find and repair: short MTTR (mean time to repair)
• Reduce human error
- proper HMI (human-machine interface)

Hardware Faults

• Intermittent faults
- Fault occurs and recurs over time (loose connector)
• Transient faults
- Fault occurs and may not recur (lightning)
- Electromagnetic interference
• Permanent faults
- Fault persists / physical processor failure (design fault: overcurrent)
• Fault-tolerant hardware
- Achieved mainly by redundancy
- Adds cost, weight, power consumption, complexity
• Other means
- Improved maintenance, single system with better materials (higher mean time between failures, MTBF)

Fault Tolerance

Redundancy types

Active redundancy:
- Redundant units are always operating in parallel

Dynamic redundancy (standby):
- Failure has to be detected
- Changeover to another module

Hardware redundancy techniques

Active techniques:
- Parallel (k of N)
- Voting (majority/simple)

Standby techniques:
- Operating: hot standby
- Non-operating: cold standby
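The two active techniques listed above can be illustrated with a minimal Python sketch. It assumes that the N units are identical and fail independently, which real systems may not satisfy because of common-cause failures. The function names `k_of_n_reliability` and `majority_vote` are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from math import comb


def k_of_n_reliability(k, n, r):
    """Probability that at least k of n units work.

    Assumes identical units with reliability r that fail independently.
    """
    return sum(comb(n, i) * r**i * (1 - r) ** (n - i) for i in range(k, n + 1))


def majority_vote(outputs):
    """Return the value produced by a strict majority of channels.

    Returns None when no value has a strict majority, so the caller
    can treat the disagreement as a detected fault.
    """
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) / 2 else None
```

For example, `k_of_n_reliability(2, 3, 0.9)` models a 2-out-of-3 arrangement, and `majority_vote([1, 1, 2])` masks the single deviating channel.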

Hardware reliability prediction

• Electronic components
- Based on probability and statistics
- MIL-HDBK-217: experimental data on actual device behaviour
- Manufacturer information and allocated circuit types
- Bathtub curve: burn-in, useful life, wear-out
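As a rough illustration of the useful-life phase of the bathtub curve, the sketch below assumes a constant failure rate, so reliability decays exponentially with operating time. This constant-rate model is an assumption of the example and does not cover burn-in or wear-out. The function name `reliability` and the numbers in the usage note are hypothetical.

```python
import math


def reliability(t_hours, mtbf_hours):
    """Probability of surviving t_hours without failure.

    Assumes a constant failure rate of 1 / mtbf_hours, which is
    only reasonable during the useful-life phase.
    """
    return math.exp(-t_hours / mtbf_hours)
```

With an assumed MTBF of 10 000 hours, `reliability(1000, 10000)` gives the probability that a component runs for 1000 hours without failing.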

Safety Critical Hardware

Fault detection:
- Routines to check that the hardware works
- Signal comparisons
- Information redundancy: parity check etc.
- Watchdog timers
- Bus monitoring: check that the processor is alive
- Power monitoring
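The parity check mentioned under information redundancy can be sketched as follows. This is a minimal example assuming non-negative integer data words and even parity, with the parity bit appended as the least significant bit. The function names are hypothetical.

```python
def even_parity_bit(data):
    """Return the bit that makes the total number of 1 bits even."""
    return bin(data).count("1") % 2


def encode(data):
    """Append the even-parity bit as the least significant bit."""
    return (data << 1) | even_parity_bit(data)


def check(word):
    """True if the received word still has an even number of 1 bits."""
    return bin(word).count("1") % 2 == 0
```

A single parity bit detects any odd number of flipped bits but misses an even number, which is why stronger codes are used where multiple-bit errors are likely.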

Safety Critical Hardware

1. Commercial Microprocessors
- No safety firmware, least assurance
- Redundancy improves matters, but common failures are possible
- Fabrication failures, microcode and documentation errors
- Use components that have a history and statistics

Safety Critical Hardware

2. Special reliable Microprocessors
- Collins Avionics/Rockwell AAMP2
- Used in Boeing 747-400 (30+ units)
- High cost: bench testing, documentation, formal verification
- Other models: SparcV7, TSC695E, ERC32 (ESA radiation-tolerant), 68HC908GP32 (airbag)

Safety Critical Hardware

3. Programmable Logic Controllers (PLC)
• Contain a power supply, interfaces and one or more processors
• Designed for high mean time between failures (MTBF)
• Solid firmware
• Program stored in EEPROMs
• Programmed with ladder or function block diagrams

Safety Critical Software

Software development:
- Normally iteration is needed to develop a working solution (writing code, testing and modification).
- In a non-critical environment, code is accepted when the tests are passed.
- Testing is not enough for a safety-critical application. The software needs an assessment process: dynamic/static testing, simulation, code analysis and formal verification.

Safety Critical Software

Dependable software:
- Process for development
- Work discipline
- Well documented
- Quality management
- Validated/verified

Safety-Critical Software

Software faults:
- Requirements defects: failure of the software requirements to specify the environment in which the software will be used, or ambiguous requirements
- Design defects: design not satisfying the requirements, or documentation defects
- Code defects: failure of the code to conform to the software design

Safety-Critical Software

Software faults:
- Subprogram effects: the value of a variable passed to a called subprogram may be changed
- Definition aliasing: several names refer to the same storage location
- Initialisation failures: variables are used before they are assigned values
- Memory management: buffer, stack and memory overflows
- Expression evaluation errors: divide-by-zero/arithmetic overflow
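Three of the fault classes above (subprogram effects, aliasing and expression evaluation errors) can be shown in a few lines. This is only an illustrative sketch in Python: the function names are hypothetical, and a safety-critical project would normally rely on language rules and static analysis rather than on convention.

```python
def scale_in_place(values, factor):
    """Hypothetical subprogram with a side effect: it modifies the caller's list."""
    for i in range(len(values)):
        values[i] *= factor


def scaled_copy(values, factor):
    """Side-effect-free alternative: the caller's data is left untouched."""
    return [v * factor for v in values]


def safe_divide(numerator, denominator, fallback):
    """Guard an expression evaluation against divide-by-zero."""
    return numerator / denominator if denominator != 0 else fallback


limits = [1.0, 2.0]
alias = limits             # both names refer to the same storage
scale_in_place(alias, 10)  # limits is changed as well, via the alias
```

After the last line, `limits` holds the scaled values even though the name `limits` never appeared in the call. This is the aliasing hazard the slide describes.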

Safety Critical Software

Safety-critical programming language:
- Logical soundness: unambiguous definition of the language, no dialects (unlike C++)
- Simple definitions: complexity can lead to errors in compilers or other support tools
- Expressive power: the language shall allow domain features to be expressed efficiently and easily
- Security of definitions: violations of the language definition shall be detected
- Verification: the language supports verification, i.e. proving that the produced code is consistent with the specification
- Memory/time constraints: stack, register and memory usage are controlled

Safety Critical Software

Language comparison:
- Structured assembler (wild jumps, exhaustion of memory, well understood)
- Ada (wild jumps, data typing, exception handling, separate compilation)
- Subset languages: CORAL, SPADE and Ada (Alsys CSMART Ada kernel)
- Validated compilers for Pascal and Ada
- Available expertise: common languages give higher productivity and fewer mistakes, but C is still not appropriate

Safety Critical Software

Languages used:
- Boeing uses mostly Ada, but about 75 languages were still used for the 747-400
- ESA mandated Ada for mission-critical systems
- NASA Space Station in Ada, some systems in C and assembler
- Car ABS systems in assembler
- Train control systems in Ada
- Medical systems in Ada and assembler
- Nuclear reactor core and shutdown systems in assembler, migrating to Ada

Safety Critical Software

Tools
- Highly reliable and validated tools are required: faults in a tool can result in faults in the safety-critical software
- Widespread tools are better tested
- Use a confirmed process for using the tool
- Analyse the output of the tool: static analysis of the object code
- Use alternative products and compare results
- Use different tools (diversity) to reduce the likelihood of wrong test results

Safety Critical Software

Designing Principles 1
- New software features add complexity; try to keep the software simple
- Plan for avoiding human error: unambiguous human-computer interface
- Remove hazardous modules (Ariane 5 unused code)

Safety Critical Software

Designing Principles 2
- Add barriers: hardware/software locks for critical parts
- Minimise single points of failure: increase safety margins, exploit redundancy and allow recovery
- Isolate failures: don't let things get worse
- Fail-safe: panic shutdowns, watchdog code
- Avoid common-mode failures: use diversity (different programmers, n-version programming)
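The fail-safe principle with watchdog code can be sketched as below. This is a software-only illustration: a real watchdog is usually an independent hardware timer. The class `Watchdog`, the function `supervise` and the `enter_safe_state` callback are hypothetical names. The clock is injectable so the behaviour can be exercised without waiting.

```python
import time


class Watchdog:
    """Software watchdog: must be kicked within timeout_s, otherwise it reports expiry."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_kick = clock()

    def kick(self):
        """Called by the monitored task to show that it is still alive."""
        self.last_kick = self.clock()

    def expired(self):
        """True if the task has not kicked the watchdog within the timeout."""
        return self.clock() - self.last_kick > self.timeout_s


def supervise(watchdog, enter_safe_state):
    """Hypothetical supervisor step: force the fail-safe state on expiry."""
    if watchdog.expired():
        enter_safe_state()
        return True
    return False
```

The monitored task calls `kick()` once per cycle, and a separate supervisor calls `supervise()` periodically with a callback that performs the shutdown.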

Safety Critical Software

Designing Principles 3
- Fault tolerance: recovery blocks. If one module fails, execute an alternative module.
- Don't rely on run-time systems
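The recovery block idea can be sketched as follows. The sketch assumes that the modules have no side effects. A full recovery block would also restore a saved system state before trying the next alternative. The names `recovery_block`, `primary_sqrt`, `alternate_sqrt` and `acceptable` are hypothetical, and the primary module is deliberately faulty for illustration.

```python
import math


def recovery_block(modules, acceptance_test, *args):
    """Try each module in order; return the first result that passes the acceptance test."""
    for module in modules:
        try:
            result = module(*args)
        except Exception:
            continue  # a crashing module counts as a failed alternative
        if acceptance_test(result, *args):
            return result
    raise RuntimeError("all alternatives failed the acceptance test")


def primary_sqrt(x):
    """Deliberately faulty primary module (hypothetical)."""
    return x / 2


def alternate_sqrt(x):
    """Alternative module used when the primary result is rejected."""
    return math.sqrt(x)


def acceptable(result, x):
    """Acceptance test: the square of the result must reproduce the input."""
    return abs(result * result - x) < 1e-6
```

Calling `recovery_block([primary_sqrt, alternate_sqrt], acceptable, 9.0)` rejects the primary result and falls back to the alternative. The quality of the acceptance test limits what the scheme can detect.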

Safety-Critical Software

Techniques/Tools:
- Fault prevention: preventing the introduction or occurrence of faults by using design support tools (UML with a CASE tool)
- Fault removal: testing, debugging and code modification

Safety Critical Software

Software tool faults:
- Faults in software tools (development/modelling) can result in system faults.
- Techniques for software development (language/design notation) can have a great impact on the performance of the people involved and also determine the likelihood of faults.
- The characteristics of the programming systems and their run-time systems determine how great the impact of possible faults on the overall software subsystem can be.

Practical Design Process (by I-Logix, manufacturer of the Statemate tool)

Improved Development Process

Integrated Development Process

Verified software process

Safety Critical Software

Reduction of Hazardous Conditions
- Simplify: code contains only the minimum features, with no unnecessary or undocumented features and no unused executable code
- Diversity: data and control redundancy
- Multi-version programming: a shared specification can lead to common-mode failures, and synchronisation code increases complexity

Home assignments 2 a

• Neil Storey's book: Safety Critical Computer Systems
- 5.10 Describe a common cause of incompleteness within specifications. How can this situation cause problems?
- 9.17 Describe the advantages and disadvantages of the reuse of software within safety critical projects.

Cont.

Home assignments 2 b

- 7.15 A system may be described by the following reliability model, where the numbers within the boxes represent the module reliability. Calculate the system reliability.

[Figure: reliability block diagram with module reliabilities 0.7, 0.7, 0.7, 0.9, 0.98, 0.97 and 0.99]

Email by 1 March to herttua@eurolock.org
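Reliability block diagrams of this kind are normally reduced with two rules, sketched below under the assumption that modules fail independently. The helper names `series` and `parallel` are hypothetical. The sketch shows the rules only and does not encode the diagram of exercise 7.15.

```python
from math import prod


def series(*reliabilities):
    """All modules must work: multiply the reliabilities."""
    return prod(reliabilities)


def parallel(*reliabilities):
    """At least one module must work: one minus the product of the unreliabilities."""
    return 1 - prod(1 - r for r in reliabilities)
```

Nested calls such as `series(parallel(0.9, 0.8), 0.95)` mirror how a diagram is collapsed block by block, here with arbitrary illustrative values.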
