Fault Tolerant


Fault Tolerant System Design

Lecturer
  • Elena Dubrova
  • Electronic Systems (ES), ICT/KTH
  • dubrova@kth.se
  • http://www.ict.kth.se/~dubrova

Design of Fault Tolerant Systems, Elena Dubrova, ESDlab

Office hours
  • No fixed time
  • Send me an email with your questions or ask for a meeting


Teaching Assistant
  • Shohreh Sarif Mansouri
  • PhD Student, Electronic Systems, ICT/KTH
  • shsm@kth.se


Textbook
  • E. Dubrova, Fault Tolerant Design: An Introduction, draft
  • Available from my homepage


Course evaluation
  • Midterm exam (20%)
  • Final exam (60%)
  • 5 assignments (20%)


Assignments
  • 5 assignments, worth 20% of the final grade
  • each consists of 5-6 tasks, worth 1-3 points
  • should be handed in to me on the due date
  • late assignments will get reduced points (25% per day)


Examinations
  • Midterm exam, 45 min, worth 20% of the final grade
      • will be held during 45 min of a lecture in the middle of the course, 4-5 tasks
      • cannot be retaken
  • Final exam, 4 hours, worth 60% of the final grade
      • 10-12 tasks

PhD students
  • Additional component for PhD students:
      • select 2 interesting papers/problems related to the course material
      • bring them to me for discussion
      • I will select one of them
      • you will read this paper/solve the problem, write a 2-page report and give a 20 min talk at the last lecture

Objectives
  • understanding fault tolerance
      • faults and their effects (errors, failures)
      • redundancy techniques
      • evaluation of fault-tolerant systems
  • balance
      • concepts, underlying principles
      • applications

Overview
  • Introduction
      • definition of fault tolerance, applications
  • Fundamentals of dependability
      • dependability attributes: reliability, availability, safety
      • dependability impairments: faults, errors, failures
      • dependability means
  • Dependability evaluation techniques
      • common measures: failure rate, MTTF, MTTR
      • reliability block diagrams
      • Markov processes

Overview
  • Redundancy techniques
      • space redundancy
          • hardware redundancy
          • information redundancy
          • software redundancy
      • time redundancy


Introduction to Fault Tolerance

Fault tolerance
  • Fault tolerance is the ability of a system to continue performing its function in spite of faults
      • hardware: broken connection
      • software: bug in program


Easily testable system
  • An easily testable system is one whose ability to work correctly can be verified in a simple manner


Why do we need fault tolerance?
  • It is practically impossible to build a perfect system
      • suppose a component has a reliability of 99.99%
      • a system consisting of 100 non-redundant components will have a reliability of about 99.00% (0.9999^100, assuming independent failures)
      • a system consisting of 10,000 components will have a reliability of 36.79% (0.9999^10000)
  • It is hard to foresee all the factors
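
The figures on this slide can be reproduced by multiplying component reliabilities. The sketch below assumes that components fail independently and that the system works only if every component works (a series system); the function name `series_reliability` is illustrative and does not come from the course material.

```python
def series_reliability(component_reliability: float, n_components: int) -> float:
    """Reliability of a system of n identical, non-redundant components.

    Assumption (not stated on the slide): failures are independent and
    the system works only if all components work, so the individual
    reliabilities multiply.
    """
    return component_reliability ** n_components


# Print the system reliability for the component counts used on the slide.
for n in (1, 100, 10_000):
    print(f"{n:>6} components: {series_reliability(0.9999, n):.2%}")
```

With a component reliability of 0.9999 this gives roughly 99.0% for 100 components and roughly 36.8% for 10,000, which is why adding components without redundancy quickly erodes system reliability.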

Redundancy
  • Redundancy is the provision of functional capabilities that would be unnecessary in a fault-free environment
      • replicated hardware component
      • parity check bit attached to digital data
      • a line of program verifying the correctness of the result
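
As an illustration of the parity check bit mentioned above, here is a minimal sketch of even parity. The function names are hypothetical, and even parity is an assumed convention; the slide does not say which parity scheme is meant.

```python
def even_parity_bit(bits: list[int]) -> int:
    """Return the bit that makes the total number of 1s even."""
    return sum(bits) % 2


def attach_parity(bits: list[int]) -> list[int]:
    """Append the even-parity bit to the data bits."""
    return bits + [even_parity_bit(bits)]


def parity_ok(word: list[int]) -> bool:
    """True if the word (data bits plus parity bit) has an even number of 1s."""
    return sum(word) % 2 == 0


word = attach_parity([1, 0, 1, 1])   # three 1s, so the parity bit is 1
corrupted = word.copy()
corrupted[2] ^= 1                    # simulate a single-bit fault
print(parity_ok(word), parity_ok(corrupted))
```

A single parity bit detects any odd number of flipped bits but cannot locate or correct them, and an even number of flips goes unnoticed. In a fault-free environment the extra bit would be unnecessary, which is what makes it redundancy.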


History
  • early computer systems
      • basic components had very low reliability
      • fault-tolerant techniques were needed to overcome it
          • redundant structures with voting
          • error detection and error correction codes


History
  • early computer systems
      • EDVAC (1949)
          • duplicate ALU and compare results of both
          • continue processing if agreed, else report error
      • Bell Relay Computer (1950)
          • 2 CPUs
          • one unit begins executing the next instruction if the other encounters an error
      • IBM 650, UNIVAC (1955)
          • parity check on data transfers
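
The duplicate-and-compare idea attributed to EDVAC above can be sketched in software. This is only an illustration of the principle, not a model of the actual hardware; `duplicate_and_compare`, `MismatchError`, and the two adder functions are hypothetical names.

```python
from typing import Callable


class MismatchError(Exception):
    """Raised when the two redundant units disagree."""


def duplicate_and_compare(
    unit_a: Callable[[int, int], int],
    unit_b: Callable[[int, int], int],
    x: int,
    y: int,
) -> int:
    """Run the same operation on two units and compare the results.

    If the results agree, processing continues with that result;
    otherwise an error is reported.
    """
    result_a = unit_a(x, y)
    result_b = unit_b(x, y)
    if result_a != result_b:
        raise MismatchError(f"units disagree: {result_a} != {result_b}")
    return result_a


def good_adder(x: int, y: int) -> int:
    return x + y


def faulty_adder(x: int, y: int) -> int:
    # Hypothetical fault: the least significant bit is stuck at 1.
    return (x + y) | 1


print(duplicate_and_compare(good_adder, good_adder, 2, 4))
```

Duplication with comparison detects a fault in one unit but cannot tell which unit is wrong; masking the fault requires at least three units and a voter.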


History
  • Advent of transistors
      • more reliable components
      • led to a temporary decrease in the emphasis on fault-tolerant computing
      • designers thought it was enough to depend on the improved reliability of the transistor to guarantee correct computations


History
  • last decades
      • more critical applications
          • space programs, military applications
          • control of nuclear power stations
          • banking transactions
      • VLSI made the implementation of many redundancy techniques practical and cost-effective
      • faults other than hardware component faults need to be tolerated:
          • transient faults (soft errors) caused by environmental factors
          • software faults


Applications
  • safety-critical applications
      • critical to human safety: aircraft flight control
      • environmental disaster must be avoided: chemical plants, nuclear plants
  • requirements
      • 99.99999% probability to be operational at the end of a 3-hour period
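
To get a feeling for how demanding this requirement is, it can be translated into a failure rate. The sketch below assumes a constant failure rate λ, so that reliability follows R(t) = exp(-λt); this exponential model is an assumption made here for illustration and is not stated on the slide. The function name is hypothetical.

```python
import math


def max_failure_rate(required_reliability: float, period_hours: float) -> float:
    """Largest constant failure rate (per hour) meeting the requirement.

    Assumes R(t) = exp(-lambda * t), hence lambda = -ln(R) / t.
    """
    return -math.log(required_reliability) / period_hours


lam = max_failure_rate(0.9999999, 3.0)   # safety-critical requirement
mttf_hours = 1.0 / lam                   # MTTF = 1 / lambda under this model
print(f"failure rate <= {lam:.3e} per hour, MTTF >= {mttf_hours:.3e} hours")
```

Under this assumption the system may fail no more often than about 3.3 × 10^-8 times per hour, which corresponds to an MTTF of about 3 × 10^7 hours.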


Applications
  • mission-critical applications
      • it is important to complete the mission
      • repair is impossible or prohibitively expensive
      • Pioneer 10 was launched 2 March 1972 and crossed the orbit of Neptune on 13 June 1983
  • requirements
      • 95% probability to be operational at the end of the mission (e.g. 10 years)
      • may be degraded or reconfigured before then (operator interaction possible)


Applications
  • business-critical applications
      • users want to have a high probability of receiving service when it is requested
      • transaction processing (banking, stock exchange or other time-shared systems)
      • ATM: