27
ORNL is managed by UT-Battelle for the US Department of Energy FITL: Extending LLVM for the Transla8on of Fault- Injec8on Direc8ves November 15, 2015 LLVM HPC @ SC15 http://ft.ornl.gov [email protected] Joel E. Denny Seyong Lee Jeffrey S. Vetter

FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

ORNL is managed by UT-Battelle for the US Department of Energy

FITL: Extending LLVM for the Transla8on of Fault-Injec8on Direc8ves

November 15, 2015

LLVM HPC @ SC15

http://ft.ornl.gov [email protected]

Joel E. Denny Seyong Lee

Jeffrey S. Vetter

Page 2: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

2

Outline • Background&Motivation

• Design&Contributions• Fault-InjectionPragmasforC

• FITLExtensionstoLLVM

• CaseStudies

Page 3: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

Background & Mo8va8on

Page 4: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

4

Background: Fault Injec8on • Emulatehardwarefaultstostudyapplicationresiliency• Examplecausesoffaults:

– Cosmicrays,alphaparticles– Thermaleffects– Noise

• Typesoffaults:– Permanentfaults:wedonotmodelthis– Transientfaults:modeledassinglebitflips

•  Impactoftransientfaults:– Benignfault–noimpactonapplication– Crash–obviousfailureofapplication– Silentdatacorruption(SDC)

Page 5: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

5

Mo8va8on: HPC and Resilience • HPCsystemsevolvingtowardexascale:

– Millionsofcomputationalunits,memoryunits,sockets– Newpowerreductionstrategiesneeded– Frequencyofhardwarefaultswillgrowdramatically

• Prediction:– Meantimetofailurewillbetooshortforexistingresiliencysolutions– Checkpoint,restart,hardware-onlysolutionswillnotbesufficient– Resiliencemustbeaddressedinsoftwarestack– Hardwarefaultsexposedtoapplicationlevel

• Needawaytostudyapplicationresilience

Page 6: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

6

Mo8va8on: Fault Injector Tool Design • Faultinjectortoolusecases:

– Studyresiliencyofapplicationstodevelopnewresiliencemechanisms– Unittestinganddebuggingframeworkforresiliency

• Designqualities:– Flexibility:whatkindsofhardwarefaultscenarioscanweemulate?– Accuracy:arethesimulationsrealistic?– Usability:howeasyisittoconfigurestudiesandunderstandresults?–  Implementability:howeasyisittoimplement?

• Designvariables:– Levelofrepresentationatwhichfaultinjectoroperates– Levelofrepresentationatwhichuserconfiguresfaultinjection

Page 7: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

Design & Contribu8ons

Page 8: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

8

Design Quality vs. Level of Representa8on

Hardware Assembly Compiler IR Source

Des

ign

Qua

lity

Level of Representation

LLVM IR

Usability &

Implementability

Accuracy &

Flexibility

Page 9: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

9

Design Quality vs. Level of Representa8on

Hardware Assembly Compiler IR Source

Des

ign

Qua

lity

Fault Injection

Accuracy &

Flexibility Implementability

Hardware Assembly Compiler IR Source

Des

ign

Qua

lity

User Interface

Usability

LLVM IR Directives

Page 10: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

10

Architecture

...

FITL LLVM IR

LLVM IR

Target Binary

OpenARC Other Compilers

FITL Pass

LLVM

C Source +

Fault-Injection Directives

Other Languages +

Fault-Injection Directives

Page 11: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

11

Contribu8ons • Asetofnovelfault-injectionpragmasforC

• FITL(Fault-InjectionToolkitforLLVM):– FITLLLVMIR:metadataandintrinsicsthatspecifyfaultinjection– FITLpass:LLVMpassthatlowersFITLLLVMIRtostandardLLVMIR

• AbstractionsthatfacilitatethetranslationofpragmastoFITL

• Fault-injectionstudies:–  Includescomparisonwithsource-levelfaultinjector

Page 12: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

Fault-Injec8on Pragmas for C

Page 13: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

13

Fault-Injec8on Sites • FundamentalconceptatsourcelevelandLLVMIRlevel:

– Source:sitesaredefinedbydirectives– LLVMIR:siteisabasicunitofcodeinsertedtoperformfaultinjection

• Eachsiteisatriple:– Executionposition:positionincodewheresiteisinserted– Target:memoryorinstructionintowhichafaultmightbeinjected– Condition:whetherfaultisinjected

• Sitesaredefinedstatically,conditionisevaluateddynamically

Page 14: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

14

Pragma: Pinject + Pdata • Precisespecificationofasinglesitetargetingmemory• Executionposition:exactlywhereftinjectappears:immediatelybeforeprintfcall

• Target:arr[i] • Condition:true

void print_array(double *arr, int n) { for (int i = 0; i < n; ++i) { #pragma openarc resilience { #pragma openarc ftinject ftdata(arr[i:1]) printf(“arr[%d] = %e\n”, i, arr[i]); } } }

Page 15: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

15

Pragma: Pregion + Pdata • Randomselectionfromregionofsitesismorerealistic

• Executionpositions:immediatelybeforeeveryinstruction

• Target:arr[i] • Condition:onesiteisrandomlyselected

void print_array(double *arr, int n) { for (int i = 0; i < n; ++i) { #pragma openarc resilience { #pragma openarc ftregion ftdata(arr[i:1]) printf(“arr[%d] = %e\n”, i, arr[i]); } } }

Page 16: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

16

Pragma: Pregion + Pkind • Targetscomputationalunitsinsteadofmemory

• Executionpositions:immediatelybeforeeveryinstructiontakingafloatingpointargument

• Targets:thefloatingpointerarguments

• Condition:onesiteisrandomlyselected void print_array(double *arr, int n) { for (int i = 0; i < n; ++i) { #pragma openarc resilience { #pragma openarc ftregion ftkind(floating_arg) printf(“arr[%d] = %e\n”, i, arr[i]); } } }

Page 17: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

FITL Extensions to LLVM

Page 18: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

18

Basic Block Set • EachpragmaandanyattachedCblockgeneratesasetofLLVMIRinstructions

• CompilerfrontendmustcommunicatetoFITLpasswhichinstructionsbelongtowhichpragma

• Ourchoiceofgranularityisthebasicblock,soeachpragmahasabasicblockset

• Compilerfrontendmightneedtosplitbasicblocks

Page 19: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

19

Entry Intrinsic • AbasicblocksetScanhaveanentryintrinsic:

– CalledsoonaftercontrolflowentersS– ConfiguresfunctionalitywithinS–  Itsargsencodeclausesofassociatedpragma

Pragma Basic Block Set Kind Entry Intrinsic resilience llvm.fitl.domain llvm.fitl.startDomain

ftregion llvm.fitl.region llvm.fitl.startRegion

ftinject llvm.fitl.desert llvm.fitl.inject

Page 20: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

20

Basic Block Set Example #pragma openarc ftregion ftkind(floating_arg) printf("arr[%d] = %e\n", i, arr[i]);

.fitl.region.entryFirstBB: call void @llvm.fitl.startRegion.i64(metadata !4, metadata !"floating_arg”, i8* null, i64 0, i64 0, i64 0) br label %.fitl.region.bodyFirstBB, !llvm.fitl.regionList !7 .fitl.region.bodyFirstBB: %.load3 = load double** %arr.stackedParam %.load4 = load i32* %i %.add = getelementptr double* %.load3, i32 %.load4 %.load5 = load i32* %i %.load6 = load double* %.add %.ret = call i32 (i8*, ...)* @printf(i8* getelementptr inbounds ([14 x i8]* @.str, i32 0, i32 0), i32 %.load5, double %.load6) br label %.fitl.region.nextBB, !llvm.fitl.regionList !7

!4 = metadata !{metadata !"llvm.fitl.region", metadata !4} !7 = metadata !{metadata !"llvm.fitl.regionList", metadata !4}

basic block set list metadata node

basic block set identifier metadata node

Page 21: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

21

FITL Pass •  Insertscodeforfault-injectionsitesspecifiedbyFITLbasicblocksetsandentryintrinsics

• LowersFITLLLVMIRtostandardLLVMIR

• MustbeperformedbeforeotherLLVMpasses

•  BasicBlockSet:– LLVMlibrarycomponentwedeveloped– Readsbasicblocksetmetadata– Findsentryintrinsiccalls– Generallyusefulfordirective-drivenprogramming

...

FITL LLVM IR

LLVM IR

Target Binary

OpenARC Other Compilers

FITL Pass

LLVM

C Source +

Fault-Injection Directives

Other Languages +

Fault-Injection Directives

Page 22: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

Case Studies

Page 23: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

23

Experimental Setup • Studiedresilienceoffourapplications:

– Twokernelbenchmarks:JACOBIandMATMUL– TwoRodiniabenchmarks:BFSandNW

• Eachtestflipsonerandomlyselectedbit,targetiseither:– Userdata– LLVMIRinstructionargumentofspecifictype– LLVMIRinstructionresultofspecifictype

• Eachtestrepeated100x• SequentialexecutiononsingleCPUonmachinethathas:

– Twoquad-coreIntelXeonE5520–  12GBRAM– ScientificLinuxVersion6.5

Page 24: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

24

Results

JACOBI MATMUL

Rodinia BFS Rodinia NW

Page 25: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

25

JACOBI

fault injection into floating point

Page 26: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

26

JACOBI • Faultsinfloatmemory:

– Nocrashes:•  Datadoesn’taffectcontrolflow•  Datadoesn’tincludepointers

– FewSDCs:•  Dataoverwritteneveryiteration•  Roundofferror

• Faultsinfloatcomputation:– Nocrashes(samereasons)– FewSDCsforprofiled(P)– ManySDCsfornon-profiled(L)

Page 27: FITL: Extending LLVM for the Translaon of Fault- Injec8on … · • Case Studies Background & Mova8on. 4 Background: Fault Injec8on • Emulate hardware faults to study application

27

Acknowledgements •  ContributorsandSponsors

–  FutureTechnologiesGroup:http://ft.ornl.gov

–  USDepartmentofEnergyOfficeofScience

•  DOEARESProject:http://ft.ornl.gov/research/ares