Fault Tolerance: Basic Mechanisms mMIC-SFT September 2003 Anders P. Ravn Aalborg University

Page 1: Fault Tolerance:  Basic Mechanisms

Fault Tolerance: Basic Mechanisms

mMIC-SFT September 2003

Anders P. Ravn

Aalborg University

Page 2: Fault Tolerance:  Basic Mechanisms

Fault Tolerance

Means to isolate component faults

... and mask them

Prevents system failures

May increase system dependability

Page 3: Fault Tolerance:  Basic Mechanisms

Dependability - means

• Fault prevention

• Fault tolerance

• Error removal

• Failure forecasting

BW p. 106, ...

Page 4: Fault Tolerance:  Basic Mechanisms

Fault Tolerance

Page 5: Fault Tolerance:  Basic Mechanisms

FT - levels

• Full tolerance

• Graceful Degradation

• Fail safe

BW p. 107

Page 6: Fault Tolerance:  Basic Mechanisms

FT basis: Redundancy

• Time: Try, Retry, Retry, ...

• Space: Try, Try, Try, ...

BW p. 109
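The two forms of redundancy above can be contrasted in a minimal sketch. The function names `retry_in_time` and `try_in_space` are hypothetical, and treating any raised exception as a failed try is an assumption made for illustration, not something the slides specify.

```python
from concurrent.futures import ThreadPoolExecutor


def retry_in_time(operation, attempts=3):
    """Time redundancy: repeat the same operation after a failed try."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except Exception as error:  # assumption: any exception is a failed try
            last_error = error
    raise last_error


def try_in_space(replicas):
    """Space redundancy: run several replicas side by side, use one that succeeds."""
    errors = []
    with ThreadPoolExecutor(max_workers=len(replicas)) as pool:
        futures = [pool.submit(replica) for replica in replicas]
        for future in futures:  # checked in submission order
            try:
                return future.result()
            except Exception as error:
                errors.append(error)
    raise RuntimeError(f"all {len(errors)} replicas failed")
```

Time redundancy helps mainly against transient faults, since a permanent fault fails every retry. Space redundancy costs extra resources but can also cover a replica that fails permanently.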

Page 7: Fault Tolerance:  Basic Mechanisms

N-version programming

V1 V2 V3

Driver (comparator)

Comparison vectors (votes)

Comparison status indicators

Comparison points

BW p. 109
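As a rough illustration of the driver's role, the sketch below calls each version, collects the votes into a comparison vector, and returns the majority value with a status indicator per version. The name `n_version_driver` and the exact-equality majority vote are assumptions; real drivers often need inexact voting, for example for floating-point results.

```python
from collections import Counter


def n_version_driver(versions, inputs):
    """Invoke every version, collect the comparison vector, and vote."""
    votes = [version(inputs) for version in versions]  # comparison vector
    value, count = Counter(votes).most_common(1)[0]
    if count * 2 <= len(votes):  # no strict majority
        raise RuntimeError("no majority among versions")
    # comparison status indicators, one per version
    status = ["agree" if vote == value else "disagree" for vote in votes]
    return value, status
```

With three versions, one faulty version is outvoted by the other two. The scheme only works if the versions fail independently, which is why they are developed separately.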

Page 8: Fault Tolerance:  Basic Mechanisms

Fault classification (scope of N-VP)

• Origin
  • physical (internal/external)
  • logical (design/interaction)

• Kind
  • omission
  • value
  • timing
  • byzantine

• Property
  • duration (permanent, transient)
  • consistency (determinate, nondeterminate)
  • autonomy (spontaneous, event-dependent)

N-VP scope marks (row assignment not recoverable): ++ | (+)++(+) | + / (+) | + / ++ / +

Page 9: Fault Tolerance:  Basic Mechanisms

Dynamic Redundancy

1. Error detection

2. Damage confinement and assessment

3. Error recovery

4. Fault treatment and continued service

BW p. 114

Page 10: Fault Tolerance:  Basic Mechanisms

Error Detection

f: State × Input → State × Output

• Environment (exception)

• Application

Assertion:
• precondition (input)
• postcondition (input, output)
• invariant (state, state’)

Timing:
• WCET(f, input)
• Deadline(f, input)

BW p. 115
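The application-level checks above might be wired around a step function f as in the following sketch. The names `checked_step` and `DetectedError` are hypothetical. The deadline test here only notices an overrun after the call returns; bounding WCET in advance is a separate, static analysis that this sketch does not attempt.

```python
import time


class DetectedError(Exception):
    """Raised when an application-level check detects an error."""


def checked_step(f, state, value, pre, post, invariant, deadline_s):
    """Run f(state, value) -> (new_state, output) under assertion and timing checks."""
    if not pre(value):
        raise DetectedError("precondition violated")
    start = time.monotonic()
    new_state, output = f(state, value)
    elapsed = time.monotonic() - start
    if elapsed > deadline_s:
        raise DetectedError("deadline missed")
    if not post(value, output):
        raise DetectedError("postcondition violated")
    if not invariant(state, new_state):
        raise DetectedError("invariant violated")
    return new_state, output
```

Errors raised by the environment (hardware traps, runtime exceptions) would propagate out of `f` unchanged; only the application checks are turned into `DetectedError` here.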

Page 11: Fault Tolerance:  Basic Mechanisms

Damage Confinement

• Static structure

• Dynamic structure

BW p. 117

(Figure: two boxes labelled "object")

Page 12: Fault Tolerance:  Basic Mechanisms

Error Recovery

• Forward: repair the state, if you can!

• Backward:
  • define recovery points
  • checkpoint state at each recovery point
  • roll back
  • retry

Domino effect

BW p. 118
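Backward recovery for a single process can be sketched as follows. The name `run_with_rollback`, the dictionary-shaped state, and the `is_valid` check are assumptions for illustration. The sketch does not address the domino effect, which arises when several communicating processes have to roll back in step.

```python
import copy


def run_with_rollback(state, operation, is_valid, max_retries=2):
    """Checkpoint the state, run the operation, roll back and retry on error."""
    checkpoint = copy.deepcopy(state)  # recovery point
    for _ in range(max_retries + 1):
        try:
            operation(state)  # may mutate state in place
            if is_valid(state):
                return state
        except Exception:
            pass  # assumption: an exception counts as a detected error
        state.clear()
        state.update(copy.deepcopy(checkpoint))  # roll back to the recovery point
    raise RuntimeError("recovery failed after retries")
```

Because the rollback restores the whole state, the retry starts from exactly the conditions of the first attempt, without any partial updates left behind.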

Page 13: Fault Tolerance:  Basic Mechanisms

Recovery blocks

ENSURE acceptance_test
BY { module_1 }
ELSE BY { module_2 }
...
ELSE BY { module_m }
ELSE ERROR

BW p. 120
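The ENSURE ... BY ... ELSE BY construct is not available in mainstream languages, but its control flow can be approximated. In this sketch, `recovery_block` is a hypothetical helper: each alternative module receives a fresh copy of the state (standing in for restoring the recovery point), and a raised exception is treated like a failed acceptance test.

```python
import copy


def recovery_block(state, acceptance_test, modules):
    """ENSURE acceptance_test BY modules[0] ELSE BY modules[1] ... ELSE ERROR."""
    for module in modules:
        candidate = copy.deepcopy(state)  # every alternative starts at the recovery point
        try:
            result = module(candidate)
        except Exception:
            continue  # assumption: a crash counts as a failed alternative
        if acceptance_test(result):
            return result
    raise RuntimeError("all alternatives failed the acceptance test")
```

For example, a fast but unverified sort could be the primary module and a simple, trusted sort the alternative, with "the output is in ascending order" as the acceptance test.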

Page 14: Fault Tolerance:  Basic Mechanisms

The ideal FT-component

(Figure labels)

• Normal mode

• Exception handler

• Request/response

• Interface exception

• Failure exception

BW p. 126
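One way to read the figure is sketched below: a component rejects invalid requests with an interface exception, serves valid ones in normal mode, and switches to its exception handler when a lower-level component signals failure. The class names and the optional `fallback` are assumptions made for this sketch.

```python
class InterfaceException(Exception):
    """The request was invalid; the caller must correct it."""


class FailureException(Exception):
    """The component could not deliver its service despite its own handling."""


class Component:
    def __init__(self, primary, fallback=None):
        self.primary = primary    # lower-level service; may raise FailureException
        self.fallback = fallback  # hypothetical alternative used by the handler

    def request(self, x):
        if x < 0:
            raise InterfaceException("x must be non-negative")
        try:
            return self.primary(x)         # normal mode
        except FailureException as error:
            return self._handle(x, error)  # exception handler

    def _handle(self, x, error):
        if self.fallback is None:
            # cannot mask the fault: signal failure to the caller
            raise FailureException("service unavailable") from error
        return self.fallback(x)
```

The caller sees only a response, an interface exception, or a failure exception, so components built this way can be stacked with the same contract at every level.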