Analysis and design of Fault Tolerant Real-time systems by Roozbeh Izadi-Zamanabadi Department of Control Engineering Aalborg University


Analysis and design of Fault Tolerant Real-time systems

by

Roozbeh Izadi-Zamanabadi

Department of Control Engineering

Aalborg University


Overview

Introduction


Definitions - I

Dependability: The trustworthiness of a computer system such that reliance can justifiably be placed on the service it delivers. The service delivered by a system is its behaviour as it is perceived by its users (humans or physical systems that interact with the computer system).

Dependability is a general concept and different attributes are related to it. The most significant attributes are:

Reliability, availability, safety, and security.

Reliability deals with continuity of service.

Availability deals with readiness for usage.

Safety deals with avoidance of catastrophic consequences on the environment.


Definitions - II

Security deals with prevention of unauthorized access to and/or handling of information.

Fault prevention: the goal is to prevent faults from occurring or being introduced into the system.

Fault tolerance: the goal is to provide service despite the presence of faults in the system.

Fault tolerance uses protective redundancy to mask failures, i.e. the system contains components that would not be needed if fault tolerance were not supported.

Fault prevention methods focus on methodologies for design, testing, and validation.

Fault tolerance methods focus on how to use components in such a manner that failures can be masked.
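
As a minimal sketch of masking by protective redundancy, the Python fragment below runs three replicas of a computation and takes a majority vote over their outputs (triple modular redundancy). The names `majority_vote`, `run_redundant`, `healthy_replica` and `faulty_replica` are hypothetical and chosen only for illustration; the slides do not prescribe a particular voting scheme.

```python
from collections import Counter
from typing import Callable, Hashable, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Hashable:
    """Return the value produced by a strict majority of the replicas."""
    value, count = Counter(outputs).most_common(1)[0]
    if count * 2 <= len(outputs):
        # No strict majority: the fault cannot be masked by voting.
        raise RuntimeError("no majority among replica outputs")
    return value


def run_redundant(replicas: Sequence[Callable[[int], int]], x: int) -> int:
    """Run every replica on the same input and vote on the results."""
    return majority_vote([replica(x) for replica in replicas])


def healthy_replica(x: int) -> int:
    return 2 * x


def faulty_replica(x: int) -> int:
    # Hypothetical faulty component that returns a wrong result.
    return 2 * x + 1


# One faulty replica out of three is outvoted by the two healthy ones.
masked = run_redundant([healthy_replica, healthy_replica, faulty_replica], 21)
print(masked)
```

With one faulty replica out of three the failure is masked. If no strict majority exists, the voter raises an error instead of guessing.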


Software Architecture for Real-time Systems

Architecture Description Languages (ADLs) are used to:
• Communicate the design solutions among software engineers.
• Support analysis of the architecture (verify that the quality requirements are met).
• Make maintenance easier.


ADL – Desired properties

An ADL should provide six classes of properties:

Composition: the architecture is described by components and connections.

Abstraction: used to describe the role of elements clearly and exactly.

Reusability: it should be possible to reuse components, connectors and architectural patterns.

Configuration: the architectural structure among components should be separated from the structure within the components.

Heterogeneity: it should be possible to combine different heterogeneous descriptions.

Analysis: the ADL should support different kinds of analysis.


Architectural views

• Structural view
• Module view
• Logical view
• Hardware view
• Temporal view
• Communication view
• Synchronization view


Structural view

Describes the overall architectural design and style. It consists of software modules and their interconnections.

[Figure: Modules A, B, C and D and their interconnections]

MASCOT design methodology: decomposed component level view

HRT-HOOD (OO methodology): parent objects


Module View

It exposes all the functions, methods or submodules in all the components modelled in the structural view.

[Figure: Modules A, B, C and D]

It is desirable to keep the interaction between functions in different components to a minimum.


Logical view

Functions from the module view are described in more logical detail.

[Figure: state machines with transitions labelled a! and a?]

Different types of state machines and process algebras can be used.

Timed automata can be used for real-time systems (they represent time as well as concurrency).


Hardware view

[Figure: Modules A, B, C and D distributed over Processor 1 and Processor 2]

Relevant for distributed systems with separate CPUs, or when requirements pre-allocate functionality among different nodes in the system.


Temporal view

Correctness of the real-time system depends on correct functions as well as correct timing (i.e. not too early and not too late).

The temporal view contains data such as:
• release time (the earliest start time of the task)
• deadline (the latest completion time of the task)
• period (the interval between activations, i.e. the inverse of the task frequency)
• ...

HRT-HOOD has a temporal view that is divided into two parts:
1. Describes the execution strategies (either cyclic or sporadic).
2. Provides temporal attributes, e.g. period times, deadlines, ...
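
The temporal attributes listed above can be captured in a small record. The following Python sketch is illustrative only: the class `TaskTiming`, its field names and the helper `is_temporally_correct` are assumptions made for this example and are not part of HRT-HOOD or any other notation named in the slides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TaskTiming:
    """Temporal attributes of one task (hypothetical record layout)."""
    name: str
    release_time: float             # earliest allowed start time
    deadline: float                 # latest allowed completion time
    period: Optional[float] = None  # None marks a sporadic task

    @property
    def is_cyclic(self) -> bool:
        return self.period is not None


def is_temporally_correct(task: TaskTiming, start: float, finish: float) -> bool:
    """True if the task ran neither too early nor too late."""
    return task.release_time <= start and finish <= task.deadline


# Hypothetical cyclic task: released at t=0, due by t=10, repeating every 20.
sensor = TaskTiming("sensor", release_time=0.0, deadline=10.0, period=20.0)
print(is_temporally_correct(sensor, start=1.0, finish=9.0))
```

A run that starts before the release time or finishes after the deadline is rejected, which mirrors the "not too early and not too late" requirement.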


Communication view

- Model of communication among tasks and processes.
- Is performed using messages and signals.

[Figure: Message Sequence Chart (MSC) with processes P 1, P 2 and P 3 exchanging messages msg 1 to msg 4]

An MSC can be translated into an ordinary finite state automaton and is hence easy to verify formally, for instance using temporal logic.
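
To give a rough idea of the translation, the Python sketch below turns a chart into a linear finite state automaton and checks whether an observed message trace is accepted. It is a deliberate simplification: it assumes the chart imposes a total order on its messages, whereas a real MSC generally defines only a partial order. The function names `msc_to_automaton` and `accepts` are hypothetical.

```python
from typing import Dict, List, Tuple

Automaton = Tuple[Dict[Tuple[int, str], int], int]  # (transitions, final state)


def msc_to_automaton(messages: List[str]) -> Automaton:
    """Build a chain: state i moves to state i + 1 on the i-th message."""
    transitions = {(i, msg): i + 1 for i, msg in enumerate(messages)}
    return transitions, len(messages)


def accepts(automaton: Automaton, trace: List[str]) -> bool:
    """True if the trace follows the chart exactly, from start to end."""
    transitions, final_state = automaton
    state = 0
    for msg in trace:
        if (state, msg) not in transitions:
            return False  # message not allowed in this state
        state = transitions[(state, msg)]
    return state == final_state


chart = ["msg 1", "msg 2", "msg 3", "msg 4"]
automaton = msc_to_automaton(chart)
print(accepts(automaton, ["msg 1", "msg 2", "msg 3", "msg 4"]))
print(accepts(automaton, ["msg 2", "msg 1", "msg 3", "msg 4"]))
```

The first trace follows the chart and is accepted; the second swaps two messages and is rejected.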


Synchronization view

In a multi-tasking system with several tasks running concurrently, it is necessary to synchronize access to shared resources in order to avoid inconsistency.

Different synchronization techniques are available depending on the real-time operating system, for example:
• Pre-run-time scheduling (a table generated before run-time is used)
• Event triggered (semaphores are used)

[Figure: Synchronization in relation to the temporal view (separation in time) and the communication view (signals used)]
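
The event-triggered case can be sketched with a binary semaphore guarding a shared resource. The example below uses Python's standard `threading` module rather than a real-time operating system, so it only illustrates the principle; `SharedCounter` and `run_tasks` are hypothetical names.

```python
import threading


class SharedCounter:
    """A shared resource whose updates are guarded by a binary semaphore."""

    def __init__(self) -> None:
        self.value = 0
        self._guard = threading.Semaphore(1)

    def increment(self) -> None:
        with self._guard:     # acquire on entry, release on exit
            self.value += 1   # critical section


def run_tasks(num_tasks: int, increments: int) -> int:
    """Start several concurrent tasks that all update the same counter."""
    counter = SharedCounter()

    def task() -> None:
        for _ in range(increments):
            counter.increment()

    threads = [threading.Thread(target=task) for _ in range(num_tasks)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value


print(run_tasks(num_tasks=4, increments=1000))
```

Because every update happens inside the guarded section, no increments are lost and the final value equals the number of tasks times the number of increments.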


Architecture analysis

The main goal of using a software architecture notation when designing is the ability to analyse and verify the design at an early stage of the development process.

The software system quality properties are generally divided into two different classes:
• Functional: those concerned with the runtime behaviour of the software, e.g. performance or reliability.
• Nonfunctional: those concerned with the quality of the software itself, e.g. maintainability and reusability.


Architecture analysis methods

[Figure: classification of architecture analysis methods by property class (nonfunctional properties, functional properties), based on the system requirements and the system domain]

Questioning: scenario based, checklist based

Measuring: scenario execution, simulation/prototyping, mathematical methods


Architecture analysis methods - 1

• A scenario is always system specific, i.e. tailor-made for a particular application in a domain.
• Checklists contain questions that are valid for all architectures in a particular domain.

Example for safety-critical real-time systems (checklist):
1. Is the system schedulable?
2. Is there error recovery code in the system to clean up after error detection?

Example for scenario:
1. What happens when division by zero occurs in the control task?
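
One possible answer to the division-by-zero scenario is that the control task detects the error and falls back to the last output known to be safe. The Python sketch below illustrates that idea under stated assumptions: the control law, the function `control_step` and its parameters are invented for this example and do not come from the slides.

```python
from typing import Tuple


def control_step(setpoint: float, measurement: float, divisor: float,
                 last_safe_output: float) -> Tuple[float, bool]:
    """Compute one control output; recover if a division by zero occurs.

    Returns (output, ok). When ok is False the previous safe output is
    reused, so the actuator never receives an undefined value.
    """
    try:
        output = (setpoint - measurement) / divisor
    except ZeroDivisionError:
        # Error detected: clean up by falling back to the last safe value.
        return last_safe_output, False
    return output, True


print(control_step(10.0, 4.0, 2.0, last_safe_output=0.0))
print(control_step(10.0, 4.0, 0.0, last_safe_output=1.5))
```

The first call completes normally; the second hits the scenario and returns the fallback value together with a flag that lets the caller log or escalate the error.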


Architecture analysis methods - 2

Measuring techniques:

• Scenario execution: "execute" the questions stated by a scenario on the architecture and investigate its effects. (Suited for analysis of non-functional quality properties.)

• Simulation/prototyping: the prototype used should be as small as possible. (Used to analyse the functional quality properties.)

• Mathematical methods: used when mathematical models exist (such as timed automata). Examples are schedulability tests for real-time systems and statistical reliability modelling.


Functional analysis

Functional quality properties:

Performance: The ability to produce results.

Reliability: Deals with continuity of service.

Safety: Deals with avoidance of catastrophic consequences on the environment. The property of a system to avoid endangering human life or the environment.

Security: The ability of a software system to resist malicious intended actions.

Availability: The probability of a system functioning correctly at any given time.

Temporal constraints: Real-time attributes such as deadlines, jitter, response time, worst case execution time, etc.


Functional quality properties

Performance:
• Must have algorithmic solutions as inputs.
• Prototyping/simulation techniques are used.
• Ex.: event throughput or queuing length for events in a system.
• The performance measure is not absolute (it is used to compare different architectures).

Reliability:
• Attempts have been made to borrow theories used for hardware systems and adapt them to software.
• Note: software can never be worn out.
• An alternative method is to measure the testability.

Testability is a function of the effort required in order to assure the required level of reliability or availability.


Reliability

Is achieved by using the following approaches to handle faults:

Fault avoidance:
• Is about designing error-free systems.
• Formal or semi-formal methods are used.
• Semi-formal methods offer a structured way of reasoning (both at design and analysis level).
• They are based on some "formal" notations representing the system model, e.g. Unified Modelling Language (UML), ADLs, etc.
• Example of such methods: Object-Oriented Analysis and Design (OOA/OOD).


Reliability - 1

Fault removal: is basically the task of finding errors by testing and removing them by error correction.

Fault tolerance: Two approaches are used:
1. Tolerate faults from the environment, e.g. operator, hardware errors, etc.
2. Tolerate design faults within the software itself.

Ad. 1. Redundant hardware units (each with their own software blocks) are used.

Ad. 2. Solution approaches include:
• Recovery blocks
• N-version programming
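
A recovery block tries a primary routine, checks its result with an acceptance test, and falls back to an alternate routine if the test fails. The Python sketch below shows the general shape only; `recovery_block`, the deliberately buggy `primary_sqrt`, `backup_sqrt` and the acceptance test `sqrt_ok` are hypothetical names invented for illustration.

```python
import math
from typing import Callable, Sequence


def recovery_block(alternates: Sequence[Callable[[float], float]],
                   acceptance_test: Callable[[float, float], bool],
                   x: float) -> float:
    """Try each alternate in order until one passes the acceptance test."""
    for alternate in alternates:
        try:
            # Every attempt restarts from the same input (the recovery point).
            result = alternate(x)
        except Exception:
            continue  # a crash counts as a failed alternate
        if acceptance_test(x, result):
            return result
    raise RuntimeError("all alternates failed the acceptance test")


def primary_sqrt(x: float) -> float:
    return x / 2.0  # hypothetical design fault: not a square root


def backup_sqrt(x: float) -> float:
    return math.sqrt(x)


def sqrt_ok(x: float, r: float) -> bool:
    return r >= 0 and abs(r * r - x) < 1e-9


print(recovery_block([primary_sqrt, backup_sqrt], sqrt_ok, 9.0))
```

The quality of the acceptance test decides how many design faults are caught: a weak test lets wrong results through, while a missing alternate leaves nothing to fall back on.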


Safety

Concerns failures that endanger human life and the environment, i.e. hazards.

Hazard analysis is performed in order to identify hazards.

Techniques for assessing safety properties are mostly scenario based and work either forward or backward.

Backward methods: the analysis starts with the hazard as a scenario and tries to trace it down to the responsible component. Ex.: FTA (Fault Tree Analysis).

Forward methods: the effects of an error in a component are investigated. Ex.: FMEA (Failure Mode and Effect Analysis), HAZOP (Hazard and Operability Studies).
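
For the backward direction, a fault tree can be sketched as nested AND/OR gates over basic component failures. The Python example below evaluates whether a top-level hazard occurs for a given set of failed components. The tree itself, the event names and the helpers `AND`, `OR` and `occurs` are hypothetical and serve only to illustrate the idea behind FTA.

```python
from typing import Set


def AND(*children: object) -> tuple:
    return ("AND", children)


def OR(*children: object) -> tuple:
    return ("OR", children)


def occurs(node: object, failed: Set[str]) -> bool:
    """Evaluate a fault tree node given the set of failed basic events."""
    if isinstance(node, str):
        return node in failed  # leaf: a basic component failure
    gate, children = node
    results = [occurs(child, failed) for child in children]
    return all(results) if gate == "AND" else any(results)


# Hypothetical tree: the hazard occurs if both redundant sensors fail,
# or if the controller crashes.
hazard = OR(AND("sensor A fails", "sensor B fails"), "controller crashes")

print(occurs(hazard, {"sensor A fails"}))
print(occurs(hazard, {"sensor A fails", "sensor B fails"}))
```

A single sensor failure does not trigger the hazard in this tree, while losing both sensors does, which points to the controller as a single point of failure.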


Safety - 1

Depending on the result of safety analysis, changes in the design may have to be performed.

Different design approaches to avoid catastrophic failures can be applied, based on the severity of an accident caused by the hazard:

Hazard elimination: achieved by
• Substitution (replacing a dangerous design possibility by a functionally equivalent, but not dangerous solution).
• Decoupling (separating safety-critical parts from non-critical software, e.g. safety kernels, firewalls, ...).
• Simplification (the KISS rule should be kept in mind).

Hazard reduction: reduces but does not eliminate the hazards. Ex.: erect a fence around an industrial robot.


Safety - 2

Hazard control: Use fail-safe design, i.e. the system is designed to detect the hazard and then transfer itself into a safe state, if such a state exists. If no safe state exists (such as in airplanes), use fault-tolerance methods, such as redundancy, to keep the primary functions alive.

Damage minimization: If accidents occur, the consequences and losses must be reduced (minimize the exposure of the environment or human beings to the accident).


Availability and security

Availability = 1 – (MTTR/MTBF)
MTTR = Mean time to repair
MTBF = Mean time between failures
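
As a small worked example of the formula above, the Python function below computes the availability from MTTR and MTBF. The function name `availability` and the numbers (2 hours to repair, 1000 hours between failures) are hypothetical values chosen for illustration.

```python
def availability(mttr_hours: float, mtbf_hours: float) -> float:
    """Availability = 1 - (MTTR / MTBF), with both times in the same unit."""
    if mtbf_hours <= 0:
        raise ValueError("MTBF must be positive")
    return 1.0 - mttr_hours / mtbf_hours


# Hypothetical system: 2 hours to repair, 1000 hours between failures.
# The result is approximately 0.998, i.e. about 99.8 % availability.
print(availability(2.0, 1000.0))
```

Halving the repair time or doubling the time between failures has the same effect on the result, since only the ratio of the two enters the formula.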

Security: the goal is to protect the software against malicious intended actions.
Achieved through: safety/security kernels, firewalls, etc.
Scenario based methods can be used to assess this property.


Real-time requirements

Temporal correctness of tasks is important.
Analysis: schedulability test, i.e. whether the task set is schedulable given the resources and temporal constraints.
Resources: CPUs, communication buses, actuators, etc.
Temporal constraints include release times, deadlines, worst case execution time, jitter, etc.
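
One well-known schedulability test is the Liu and Layland utilization bound for rate monotonic scheduling. The Python sketch below applies it under the usual assumptions, none of which are stated in the slides: independent periodic tasks on one CPU, preemptive scheduling, and deadlines equal to periods. The function name and the task sets are hypothetical.

```python
from typing import List, Tuple

Task = Tuple[float, float]  # (worst case execution time, period)


def passes_rm_utilization_bound(tasks: List[Task]) -> bool:
    """Sufficient (not necessary) test: U <= n * (2**(1/n) - 1)."""
    if not tasks:
        raise ValueError("task set must not be empty")
    n = len(tasks)
    utilization = sum(wcet / period for wcet, period in tasks)
    bound = n * (2 ** (1 / n) - 1)
    return utilization <= bound


# Utilization 0.65 is below the three-task bound of roughly 0.78.
print(passes_rm_utilization_bound([(1, 4), (1, 5), (2, 10)]))
# Utilization of about 0.83 exceeds the bound, so the test is inconclusive.
print(passes_rm_utilization_bound([(1, 4), (2, 6), (3, 12)]))
```

The test is only sufficient: a task set that fails it may still be schedulable, and an exact method such as response time analysis is then needed to decide.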


Scheduling strategies

[Figure: tree of scheduling strategies. Scheduling is preemptive or non-preemptive and is performed either at run-time or pre-run-time; priority based scheduling uses static or dynamic priorities. Leaf nodes: FPS+RM, user defined, PCP, ED.]

Abbreviations:
RM: Rate monotonic
FPS: Fixed priority scheduling
ED: Earliest deadline
PCP: Priority ceiling protocol
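
For fixed priority scheduling with rate monotonic priorities, the worst case response time of each task can be computed by the standard response time iteration. The Python sketch below is a minimal version that assumes independent periodic tasks, preemption, no blocking, integer time units and deadlines equal to periods; these assumptions and all names in the code are choices made for this example.

```python
from typing import List, Optional, Tuple

Task = Tuple[int, int]  # (worst case execution time, period)


def rate_monotonic_order(tasks: List[Task]) -> List[Task]:
    """Shorter period means higher priority."""
    return sorted(tasks, key=lambda task: task[1])


def response_time(index: int, tasks: List[Task]) -> Optional[int]:
    """Worst case response time of tasks[index], or None if it misses its period.

    tasks must be ordered from highest to lowest priority.
    """
    wcet, period = tasks[index]
    higher_priority = tasks[:index]
    r = wcet
    while True:
        # Interference: each higher priority task preempts ceil(r / T_j) times.
        interference = sum(-(-r // t_j) * c_j for c_j, t_j in higher_priority)
        r_next = wcet + interference
        if r_next > period:
            return None   # deadline (equal to the period) would be missed
        if r_next == r:
            return r      # fixed point reached
        r = r_next


tasks = rate_monotonic_order([(3, 12), (1, 4), (2, 6)])
print([response_time(i, tasks) for i in range(len(tasks))])
```

Every task in this hypothetical set finishes within its period, so the set is schedulable under fixed priorities even though its utilization is above the rate monotonic utilization bound.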


Non-functional quality properties

Cost: The cost of performing any action, such as development, evolution and verification.

Testability: How easy it is to prove correctness of the system by testing.

Reusability: The extent to which the architecture can be reused.

Portability: How easy it is to move the software system to a different hardware and/or software platform.

Maintainability: The aptitude of a system to undergo repair and evolution.

Modifiability: How sensitive the architecture is to changes in one or several components.


Non-functional quality properties - 1

Cost:
• Depends on other properties such as maintainability, testability and reusability.
• Cost estimation is based upon historical experience with similar systems.

Testability:
• Proves functional correctness of the software, hence is essential.
• Depends on three individual properties:


Non-functional quality properties - 2

Observability:
• The result must be observable.
• In the structural view, components are black boxes; only the interfaces are observable.
• The bigger the interface, the more visibility and hence the higher the testability.

Controllability:
• Given an input (to the task or a sub-system), one may control the path taken in the program.
• If the path depends only on the input itself, maximum controllability is achieved.
• If there are data dependencies between different modules, the controllability is decreased, which lowers testability.

Reproducibility:
• To get high testability, the order in which processes execute must be controllable or deterministic, i.e. high reproducibility.


Non-functional quality properties - 3

Reusability:
• Example: the Standard Template Library (STL) for the object-oriented language C++.

Portability:
• Dependencies between the software components in the system and the platform are in focus.
• Platform: hardware, e.g. processors, A/D converters, and the operating system.
• The less direct dependency between the component and the platform, the higher the degree of portability.
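
One common way to reduce direct platform dependencies is to let a component talk to an abstract interface instead of the hardware itself. The Python sketch below illustrates this with an A/D converter; the interface `AnalogInput`, the class `SimulatedAdc`, the function `temperature_celsius` and the scaling of 100 degrees per volt are all hypothetical.

```python
from typing import Protocol


class AnalogInput(Protocol):
    """Platform-independent view of an A/D converter channel."""

    def read_volts(self) -> float:
        ...


class SimulatedAdc:
    """Stand-in for a real converter, e.g. when testing on a desktop host."""

    def __init__(self, volts: float) -> None:
        self._volts = volts

    def read_volts(self) -> float:
        return self._volts


def temperature_celsius(adc: AnalogInput) -> float:
    """Component logic that depends only on the interface, not the platform."""
    return adc.read_volts() * 100.0  # hypothetical sensor scaling


print(temperature_celsius(SimulatedAdc(0.25)))
```

Porting the component to another platform then means writing one new class that satisfies `AnalogInput`, while `temperature_celsius` stays unchanged.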


Non-functional quality properties - 4

Maintainability:
• Def.: The amount of change in the software architecture enforced by adding new functionality or error corrections.
• Scenarios derived from the requirements of the new function are used to analyse the existing architecture.

A reference list will be provided on the net.