Software Fault Tolerance (SWFT)
How to Design, Develop and Evaluate Robust SW and OS’s

Dependable Embedded Systems & SW Group
www.deeds.informatik.tu-darmstadt.de

Prof. Neeraj Suri
Abdelmajid Khelil (Majid)
Constantin Sârbu (Dinu)
Brahim Ayari

Dept. of Computer Science
TU Darmstadt, Germany

2

Outline of today’s lecture

Course info
Course goals
Research related to course

© DEEDS Group, SWFT WS ‘07

3

Related Courses

Lectures
  SW/OS Fault-Tolerance
  Kanonik: Introduction to Trusted Systems
Seminars
  Embedded Mobile Computing
  Secure and Reliable OS
Labs
  Selected Topics in Dependable SW & Mobile Computing

4

Course Info

Lecture (in English)
  Wed. (11:40am - 1:20pm), C120
Exercises (in E & G):
  Thu., 3. DS (10:45-13:20), C110
  Starts 25th 2007
Course webpage:
  http://www.deeds.informatik.tu-darmstadt.de

5

Grading Related

Credit points: 7.5 - SWS: 5 (2+3)
Exam
  Mid-term exam: 25% (E or G), December 17th, 2007
  Presentations: 24% (E or G)
  Final exam: 51% (E or G)
Exercises: (E or G)
  Practice + presentations
Lab stuff: Optional
  Do some live programming
  Gain some practical experience
  Please take this opportunity!
    • May improve your grade (bonus points)
    • If you have a suggestion for a lab, discuss it with us!

6

Learn more...

We have a selection of sub-projects related to this lecture
  They will be targeted to the interests of students
  See example
  See slides Research@DEEDS
We offer
  Bachelor/Master's theses
  HiWi (student assistant) positions
  Fun

7

Course Goals

Learn software fault tolerance concepts
Learn how to develop robust programs
  how to deal with software bugs
  software fault tolerance: continuation of service in the face of failures
Learn concepts and mechanisms to build software fault tolerance tools
Learn how to evaluate and test robust SW/OS
Learn some SW issues related to (a) mobile SW and (b) security
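To make "continuation of service in the face of failures" concrete, here is a minimal Python sketch that combines a bounded retry with a degraded fallback. It is an illustration only, not course material: `call_with_retries`, `make_flaky_sensor`, `last_known_value` and all numeric values are hypothetical names and assumptions chosen for this example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(primary: Callable[[], T],
                      fallback: Callable[[], T],
                      attempts: int = 3,
                      delay_s: float = 0.0) -> T:
    """Try `primary` up to `attempts` times, then degrade to `fallback`."""
    for _ in range(attempts):
        try:
            return primary()
        except Exception:
            # Treat the failure as possibly transient and try again.
            time.sleep(delay_s)
    # Retries exhausted: keep serving, but with reduced quality.
    return fallback()


def make_flaky_sensor(failures_before_success: int) -> Callable[[], float]:
    """Hypothetical sensor read that fails a fixed number of times first."""
    state = {"remaining": failures_before_success}

    def read() -> float:
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise IOError("transient read error")
        return 21.5

    return read


def last_known_value() -> float:
    """Degraded service: return a stale but plausible reading."""
    return 20.0
```

With the default of three attempts, a sensor that fails twice still delivers its fresh reading, while one that fails five times in a row is served from the fallback. The caller gets an answer in both cases, which is the point of the definition above.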

8

Course Outline

1. Introduction/Concepts of SWFT
2. SW-FT Mechanisms: Design Aspects
  Process pairs, selective retries, graceful degradation, …
  Checkpointing, N-copy programming (NCP), N-version programming (NVP), micro-reboots, …
  Robust programming, …
3. Evaluation of fault-tolerant SW & OS’s
  SW reliability
  SW/OS stress testing
  Hardening of OS’s, Patching
  OS Driver profiling and testing
4. Transactional/Mobile SW
  Mobile transactions (FT, recovery, …)
  Wireless sensor networks (energy-efficient FT, spatial/temporal redundancy, …)
5. SW and Security: Buffer overflows etc.
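As a preview of one mechanism from item 2, the following sketch shows the voting step of N-version programming (NVP): several independently written versions compute the same function and a voter accepts only a strict-majority result. The function names and the three toy versions are hypothetical, and real NVP systems need considerably more care (result comparison, timing, version diversity).

```python
from collections import Counter
from typing import Callable, Optional, Sequence


def nvp_vote(versions: Sequence[Callable[[int], int]], x: int) -> Optional[int]:
    """Run every version on x; return the strict-majority result or None."""
    results = []
    for version in versions:
        try:
            results.append(version(x))
        except Exception:
            # A crashing version simply casts no vote.
            pass
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    # Majority is taken over all versions, including those that crashed.
    return value if count > len(versions) // 2 else None


# Three hypothetical "independently developed" versions of squaring.
def square_a(x: int) -> int:
    return x * x


def square_b(x: int) -> int:
    return x ** 2


def square_buggy(x: int) -> int:
    return x + x  # design fault: wrong for most inputs
```

Calling `nvp_vote([square_a, square_b, square_buggy], 3)` masks the faulty version because two of three versions agree. If no strict majority exists, the voter returns `None` so that the caller can fall back to another recovery action.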

9

Literature

Most lectures will be based on research papers:
  URLs of papers available via class page
  References on slides (available on web)
Coverage for exams is primarily (a) the lecture content and (b) issues covered in the Exercises ... so attending is important

10

Research@DEEDS Related to Course

DEEDS: Dependable Embedded Systems & SW Group www.deeds.informatik.tu-darmstadt.de

Dependable Embedded Systems & Software (DEEDS)

11

What isn’t an Embedded System?

[Figure: examples of embedded systems. Recoverable labels: response, motes, sensors, UAV’s, XBW, FBW]

12

The Spread: Dependability, Safety, Security

X-by-Wire: Safety-Service Critical Systems (Aerospace/Automotive)
  System Architecture Design
  Protocols (Synchronous, Membership, Diagnosis, Recovery, Scheduling)
  Dependability Evaluation
  Verification & Validation (V&V)
    • Experimental fault injection (PROPANE)
    • Formal Methods
Distributed Systems: Byzantine Consensus, Failure Detectors, Verification
Mobile/WSN Networks: Fault-tolerant protocols, routing, reliability analysis
CPU Architectures: Energy-efficient FT, Transient Resilience, …
Operating Systems: OS Robustness Evaluation, Driver Testing & Evaluation, Vulnerability Profiling, Embedded/Desktop OS, …

13

Complexity (Devices, Systems) Within a Domain

[Figures: in-vehicle networks; aircraft flight control]

14

Automotive/Aerospace (Federated Systems)

[Figure: federated architecture. Layers: Applications, Middleware, Resources (Dist. Resources: Nodes + Comm.). Application blocks: Comm., Diagnostics, Steering, Env. Ctrl., Braking, Engine/Flight Control, Navigation, User I/O, Multimedia, Body Elec.]

multiple nodes, varied criticality
buses, clusters, bridges (HW, SW), …

15

Managing Complexity: Federated to Integrated

[Figure: integrated architecture. High-level Services (app1, app2, …, appn) on top of a Re-usable Core (technology + domain invariant: Core Services, MW + Arch), on top of Platforms (Shared, Distributed/Networked)]

Applications will change
  Multi-Domain Solutions: Automotive, Aerospace, Control
  Compositional Framework (+Tools)
  Integrates diverse criticality apps
  Delineation over integration for functionality and safety
  Flexible building blocks & interfaces
Technologies will change
Benefits
  Design flexibility, short time-to-market
  Reduced number of nodes
  Reduced complexity and cost

16

P1: X-by-Wire Protocols

On-Line Diagnosis
  Enhance sustained autonomic system operations
  Self-healing
    • On-line recovery (transient faults)
  Self-diagnosing
    • Maintenance actions (permanent faults)
Challenges
  Avoid overreaction to transient faults
    • The cure can be worse than the disease!
  Support mixed-criticality applications
    • From X-by-Wire to Comfort applications
  Portability for time-triggered (TT) platforms
    • Add-on, middleware approach
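One generic way to avoid overreacting to transient faults is a count-and-threshold filter: each erroneous diagnosis round raises a per-node score, each clean round lets it decay, and only a score above a threshold triggers maintenance. The sketch below illustrates that idea under stated assumptions; it is not the group's protocol, and the class name, `decay` and `threshold` values are hypothetical.

```python
class FaultDiscriminator:
    """Count-and-threshold filter for one monitored node (illustrative)."""

    def __init__(self, decay: float = 0.5, threshold: float = 2.5) -> None:
        self.decay = decay          # assumed forgetting factor, 0 < decay < 1
        self.threshold = threshold  # assumed score that triggers maintenance
        self.score = 0.0

    def observe(self, error_seen: bool) -> str:
        """Update the score for one diagnosis round and classify the node."""
        if error_seen:
            self.score += 1.0         # penalize an erroneous round
        else:
            self.score *= self.decay  # forgive gradually after a clean round
        if self.score >= self.threshold:
            return "permanent"        # schedule maintenance / isolate
        if error_seen:
            return "transient"        # recover on-line, keep the node
        return "healthy"
```

Isolated errors separated by clean rounds never reach the threshold, so the node stays in service. Three consecutive erroneous rounds do reach it with these example parameters, and the node is then handed to maintenance.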

Contact: Marco Serafini ([email protected])

17

P2: Mobile Database Systems

Mobile transactions
  Commit protocols
Challenges: Frequent perturbations
  Heterogeneity
    • Wireless links (WLAN, UMTS, …)
    • Mobile nodes (laptops, PDAs, …)
  Failures
    • Unpredictable disconnections
    • Node/Communication failures
Infrastructure-based vs. ad-hoc
  Mobile Ad-hoc NETworks (MANETs)
  Wireless Sensor Networks (WSNs)

[Figure labels: Wired Network, WLAN, UMTS, GPRS, WAVE]

Contacts: Brahim Ayari ([email protected]), Abdelmajid Khelil ([email protected])

18

P3: Dependable Ad-hoc Sensor Networks

Applications
  Car2Car communication
    • Cooperative driving
    • Announcements
  Tracking & monitoring
  Measurement
  Disaster rescue
Research challenges
  Energy (efficiency, maintenance, ...)
  Frequent failures (detection, diagnosis, ...)
  Safety-critical applications
  Reliable communication

[Figure labels: WAVE, WLAN, ZigBee]

Contacts: Faisal Karim ([email protected]), Abdelmajid Khelil ([email protected])

19

P4: Energy Efficient Dependable Systems

Trends
  Heterogeneous systems
  Increased dependence upon technology
  Mobility → low voltage → smaller noise margins → more transient errors
  Increased complexity
  Integration/communication between systems
Energy efficient fault tolerance
  Evaluate
  Characterize
  Optimize/trade-off
Dimensions
  Design-time vs. run-time
  Time vs. space
  System level vs. components level
  Service degradations and reconfiguration

Contact: Neeraj Suri ([email protected])

20

P5: Robustness Evaluation of Embedded OS/SW

Problem:
  SW systems are vulnerable to errors in Commercial-Off-The-Shelf (COTS) components.
  Characterizing the impact of 3rd party SW is hard.
Approach:
  Focus on device drivers
  Error propagation analysis using fault injection
  Robustness enhancing wrappers
Applications:
  Verification
  COTS integration (acceptance)
  Robustness enhancement
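The two ideas in the approach, fault injection and robustness wrappers, can be sketched together in a few lines of Python. This is a toy model under stated assumptions, not the group's tooling: `driver_read` stands in for a driver entry point, a single bit flip in its argument is the assumed error model, and the outcome categories are invented for this example.

```python
from typing import Callable, Dict


def flip_bit(value: int, bit: int) -> int:
    """Bit-flip error model: invert one bit of an integer argument."""
    return value ^ (1 << bit)


def driver_read(offset: int) -> int:
    """Hypothetical driver entry point; fails on out-of-range offsets."""
    buffer = [10, 20, 30, 40]
    return buffer[offset]  # raises IndexError if offset is out of range


def wrapped_read(offset: int) -> int:
    """Robustness wrapper: reject arguments outside the valid range."""
    if not 0 <= offset < 4:
        raise ValueError(f"rejected offset {offset}")
    return driver_read(offset)


def campaign(target: Callable[[int], int], bits: int = 8) -> Dict[str, int]:
    """Inject one bit flip per run into offset 1 and classify outcomes."""
    outcomes = {"no_failure": 0, "wrong_result": 0, "rejected": 0, "crash": 0}
    golden = driver_read(1)  # fault-free reference result
    for bit in range(bits):
        try:
            result = target(flip_bit(1, bit))
        except ValueError:
            outcomes["rejected"] += 1
        except IndexError:
            outcomes["crash"] += 1
        else:
            key = "no_failure" if result == golden else "wrong_result"
            outcomes[key] += 1
    return outcomes
```

Running `campaign(driver_read)` and `campaign(wrapped_read)` shows the trade-off: the wrapper turns every out-of-range crash into a controlled rejection, but flips that stay inside the valid range still propagate as wrong results. Such residual propagation is what an error propagation analysis would aim to expose.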

[Figure: stacked bar chart, 0% to 100%, showing the share of outcomes (No failure, Class 1, Class 2, Class 3) for the drivers atadisk, 91C111 and cerfio_serial, each with bars labeled BF, DT, FZ, CM]

Contact: Andréas Johansson ([email protected])

21

P6: Improved Testing of Device Drivers

Problem
  Faulty COTS drivers used in modern OSs have a significant impact on system reliability.
  They are hard to test because they execute in kernel space and are delivered without source code.
Applications
  Black-box testing for COTS drivers
  Profiling, debugging
  System activity monitoring
Research aims
  Profile driver behavior at runtime
  Expedite driver testing by focusing on runtime activity
  Test methods tuned to OS/driver operational profiles
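A rough user-space analogy of runtime profiling: a monitor sits on the driver interface, counts how often each entry point is invoked under a workload, and the resulting profile is used to order test effort. Real driver monitoring happens in kernel space and without source code, so the sketch below is only an illustration, and `DriverMonitor`, the `drv_*` functions and `run_workload` are hypothetical.

```python
from collections import Counter
from functools import wraps
from typing import Callable, List


class DriverMonitor:
    """Counts calls crossing a (hypothetical) driver interface."""

    def __init__(self) -> None:
        self.calls: Counter = Counter()

    def watch(self, func: Callable) -> Callable:
        """Wrap an entry point so that every invocation is recorded."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            self.calls[func.__name__] += 1
            return func(*args, **kwargs)
        return wrapper

    def test_order(self) -> List[str]:
        """Entry points sorted by observed frequency, most used first."""
        return [name for name, _ in self.calls.most_common()]


monitor = DriverMonitor()


@monitor.watch
def drv_open() -> None:
    """Stand-in for a driver 'open' entry point."""


@monitor.watch
def drv_read() -> None:
    """Stand-in for a driver 'read' entry point."""


@monitor.watch
def drv_ioctl() -> None:
    """Stand-in for a driver 'ioctl' entry point."""


def run_workload() -> None:
    """Hypothetical workload standing in for normal system activity."""
    drv_open()
    for _ in range(5):
        drv_read()
    drv_ioctl()
    drv_ioctl()
```

After `run_workload()`, `monitor.test_order()` lists the entry points from most to least exercised, which is one simple way to focus testing on the activity a driver actually sees in operation.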

[Figure: layered system. Application 1 … Application p, System Services, OS kernel with Driver and Monitor, Hardware Layer]

Contact: Constantin Sarbu ([email protected])

22

Questions?
