Taint Analysis - Nanjing University (seclab.nju.edu.cn/lecture/TaintAnalysis.pdf)


Taint Analysis

Contents

• Pin Tool
  - Introduction
  - Intel Pin Capability
  - How to Instrument
  - How to Pass Parameters
  - Instrumentation Granularity

• Dynamic Taint Analysis
  - Classification of Taint Analysis
  - Basic Concept
  - Introduction
  - Byte or Bit?
  - Shadow Memory
  - Dynamic Taint Analysis

Pin tools


Instrumentation

• A technique that inserts code into a program to collect run-time information or change its behavior

Different Instrumentations

• Source-Code Instrumentation
  - Compiler plugin
    - Inserts code while the source is compiled to binary
    - Highly efficient

• Static Binary Instrumentation
  - Binary rewriter
    - Disassembles and recompiles the binary
    - Difficult to ensure correctness

• Dynamic Binary Instrumentation
  - Dynamic binary instrumentation tool
    - Instruments code just before it runs
    - No need to recompile or re-link
    - Analyzes and modifies code at run time

Dynamic Binary Instrumentation

• Intel Pin
• Valgrind
• QEMU

Intel Pin Capability

• Binary analysis:
  - Trace control flow or data flow
  - Hook functions, signals and system calls
  - Multi-thread support

• Change program behavior:
  - Add/delete instructions, basic blocks and functions
  - Change register values
  - Change control flow
  - Change memory values

Pin Work Flow Demonstration: gzip.exe input.txt

Command: pin.exe -t inscount.dll -- gzip.exe input.txt

inscount.dll is a Pintool that counts the application instructions executed and prints the count at the end (Count 258743109 in this run).

Components: the launcher process (PIN.EXE); the application process, which holds the application code and data, PINVM.DLL (system call dispatcher, event dispatcher, thread dispatcher, decoder, encoder), inscount.dll, PIN.LIB, the code cache and NTDLL.DLL; and the Windows kernel.

• The launcher calls CreateProcess(gzip.exe, input.txt, suspended).
• It records the first application IP with GetContext(&firstAppIp), points the thread at the boot routine with SetContext(BootRoutineIp), and injects the Pin boot routine and its data (firstAppIp, "inscount.dll") into the application with WriteProcessMemory(BootRoutine, BootData).
• The application resumes at the boot routine, which loads PINVM.DLL, loads inscount.dll and runs its main(), then starts PINVM.DLL running (firstAppIp, "inscount.dll").
• Starting at the first application IP, Pin reads a trace from the application code, JITs it while adding instrumentation code from inscount.dll, encodes the jitted trace into the code cache, and executes the jitted code.
• When execution of a trace ends, control calls into PINVM.DLL to JIT the next trace, passing in the application IP of the trace's target.
• The source trace's exit branch is then modified to branch directly to the destination trace.
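The code-cache step of this workflow can be modeled in a few lines of standalone C++. This is only a sketch of the idea, not Pin code: `CodeCache`, `lookupOrJit` and the integer trace ids are hypothetical, and trace linking (the last step) is not modeled. It shows why the JIT cost is paid once per trace rather than once per execution.

```cpp
#include <cstdint>
#include <unordered_map>

// Hypothetical model of a DBI code cache: a trace that starts at a given
// application IP is jitted (decoded, instrumented, encoded) only on its first
// execution; later executions reuse the cached translation.
struct CodeCache {
    std::unordered_map<uint64_t, int> traces;  // app IP -> id of the cached trace
    int jitCount = 0;                          // number of traces jitted so far

    int lookupOrJit(uint64_t appIp) {
        auto it = traces.find(appIp);
        if (it != traces.end()) return it->second;  // hit: execute cached code directly
        int id = jitCount++;                        // miss: the VM jits a new trace
        traces.emplace(appIp, id);
        return id;
    }
};
```

Running the IP sequence 0x400000, 0x400040, 0x400000, 0x400040, 0x400080 through `lookupOrJit` jits only three traces; the repeated IPs hit the cache.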

How to Instrument

Insert callback functions for instructions, basic blocks, functions (routines) and images; for example, instruction instrumentation.
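The split between instrumentation code (run once, when an instruction is first seen) and analysis code (run every time the instruction executes) can be sketched without Pin. In a real Pintool the instrumentation routine is registered with `INS_AddInstrumentFunction` and inserts the analysis call with `INS_InsertCall`; the `Instruction`, `AnalysisFn` and `runInstrumented` names below are hypothetical, and the program is a straight-line list executed a fixed number of times.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for a decoded application instruction.
struct Instruction { uint64_t address; };

// An analysis routine runs every time the instruction executes.
using AnalysisFn = std::function<void()>;
// An instrumentation routine runs once per instruction and may insert analysis calls.
using InstrumentFn = std::function<void(const Instruction&, std::vector<AnalysisFn>&)>;

// Instruments each instruction once, then executes the straight-line program
// `times` times, running the inserted calls before each instruction.
void runInstrumented(const std::vector<Instruction>& program,
                     const InstrumentFn& instrument, int times) {
    std::vector<std::vector<AnalysisFn>> inserted(program.size());
    for (std::size_t i = 0; i < program.size(); ++i)
        instrument(program[i], inserted[i]);           // instrumentation time
    for (int t = 0; t < times; ++t)
        for (std::size_t i = 0; i < program.size(); ++i)
            for (const AnalysisFn& call : inserted[i])
                call();                                // analysis time, before the instruction
}
```

An inscount-style tool passes an instrumentation routine that appends `[&count] { ++count; }` to every instruction: for a 3-instruction program run 5 times, the instrumentation routine runs 3 times and `count` ends at 15.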


How to Pass Parameters

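INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)ifun, IARG_TYPE, IARG, …… IARG_END);

Each argument for the analysis routine ifun is given as an IARG_TYPE, followed by a value (IARG) when that type needs one; IARG_END terminates the list. For example, IARG_ADDRINT is followed by a value, whereas IARG_INST_PTR needs none.

IARG_TYPE:
  - IARG_ADDRINT
  - IARG_PTR
  - IARG_BOOL
  - IARG_UINT32
  - IARG_UINT64
  - IARG_INST_PTR
  - IARG_REG_VALUE
  - IARG_REG_REFERENCE
  - IARG_REG_CONST_REFERENCE
  - ……

IARG:
  - INS_Address(ins)
  - INS_OperandReg(ins, 0)
  - INS_MemoryOperandCount(ins)
  - INS_Valid(ins)
  - ……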
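The idea behind the IARG_TYPE/IARG list (describe each argument at instrumentation time, let the framework produce its value at run time) can be modeled in plain C++. `ArgKind`, `ArgSpec`, `CpuState` and `resolveArgs` are hypothetical names loosely mirroring IARG_INST_PTR, IARG_ADDRINT plus a value, and IARG_REG_VALUE plus a register; this is not how Pin is implemented internally.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical argument kinds, loosely mirroring IARG_INST_PTR,
// IARG_ADDRINT (followed by a value) and IARG_REG_VALUE (followed by a register).
enum class ArgKind { InstPtr, Constant, RegValue };

// What the tool chooses at instrumentation time.
struct ArgSpec {
    ArgKind kind;
    uint64_t constant;  // only used for ArgKind::Constant
    std::string reg;    // only used for ArgKind::RegValue
};

// Hypothetical snapshot of the CPU when the instruction is about to execute.
struct CpuState {
    uint64_t ip = 0;
    std::map<std::string, uint64_t> regs;
};

// What the framework does at run time: turn each spec into the concrete value
// passed to the analysis routine.
std::vector<uint64_t> resolveArgs(const std::vector<ArgSpec>& specs, const CpuState& cpu) {
    std::vector<uint64_t> values;
    for (const ArgSpec& s : specs) {
        switch (s.kind) {
            case ArgKind::InstPtr:  values.push_back(cpu.ip); break;
            case ArgKind::Constant: values.push_back(s.constant); break;
            case ArgKind::RegValue: values.push_back(cpu.regs.at(s.reg)); break;
        }
    }
    return values;
}
```

The design point is that the argument list is fixed once per instruction, while the values (instruction pointer, register contents) are read fresh on every execution.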

Instrumentation Granularity:

• Instruction instrumentation
• Basic block instrumentation
  - A sequence of instructions terminated at a control-flow changing instruction
  - Single entry, single exit
• Trace instrumentation
  - A sequence of basic blocks terminated at an unconditional control-flow changing instruction
  - Single entry, multiple exits
• Routine instrumentation
• Image instrumentation


Compare with Trace and Basic Block
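The basic block and trace definitions differ only in which instructions end a unit, which the following sketch makes concrete. It splits a straight-line instruction list using a hypothetical three-way classification (`Kind`); real Pin may split blocks and traces for additional reasons and discovers code lazily, so treat this as an illustration of the definitions, not of Pin's exact algorithm.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical classification of each instruction for this sketch.
enum class Kind { Plain, CondBranch, UncondBranch };

// Splits a straight-line instruction list into units and returns their sizes.
// Basic blocks (stopAtConditional = true) end at any control-flow instruction;
// traces (stopAtConditional = false) end only at unconditional ones.
std::vector<std::size_t> splitBlocks(const std::vector<Kind>& code, bool stopAtConditional) {
    std::vector<std::size_t> sizes;
    std::size_t current = 0;
    for (Kind k : code) {
        ++current;
        bool ends = (k == Kind::UncondBranch) ||
                    (stopAtConditional && k == Kind::CondBranch);
        if (ends) {
            sizes.push_back(current);
            current = 0;
        }
    }
    if (current > 0) sizes.push_back(current);  // trailing instructions without a terminator
    return sizes;
}
```

For the sequence plain, cond, plain, plain, cond, plain, uncond, this yields three basic blocks of sizes 2, 3 and 2, but a single 7-instruction trace whose two conditional branches act as extra exits.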


Taint Analysis

• Classification of Taint Analysis
• Basic Concept
• Introduction
• Byte or Bit?
• Shadow Memory
• Dynamic Taint Analysis

Classification of Taint Analysis

• Static taint analysis
  - Advantage: static analysis provides better code coverage than dynamic analysis.
  - Disadvantage: it is less accurate than dynamic analysis because it cannot access run-time information; for example, it cannot retrieve register or memory values.

• Dynamic taint analysis
  - Dynamic analysis cannot cover all the code, but its results are more reliable.

Basic Concept

Taint propagation:
  - Taint: if an operation uses the value of some tainted object, say X, to derive a value for another object, say Y, then Y becomes tainted. We say that X tainted Y.


• Taint sources: program locations or memory locations where data of interest enters the system and subsequently gets tagged. For convenience of description, we use user input as the taint source in this course.

• Taint tracking: the process of propagating data tags according to program semantics.

• Taint sinks: program locations or memory locations where checks for tagged data can be made.
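The three roles can be illustrated with a tiny tracker over named variables. This is a hypothetical C++ sketch, not a binary-level tool: `TagTracker`, the tag strings and the variable names are assumptions. Each variable carries a set of tags standing in for the data tags, and tags are unioned because a value derived from several inputs may carry several of them.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical source/tracking/sink model: each variable carries a set of tags.
class TagTracker {
public:
    // Taint source: data of interest enters and gets tagged.
    void source(const std::string& var, const std::string& tag) { tags_[var] = {tag}; }

    // Taint tracking: dst = f(srcs) receives the union of the sources' tags.
    // An empty source list models assigning a constant, which clears the tags.
    void assign(const std::string& dst, const std::vector<std::string>& srcs) {
        std::set<std::string> out;
        for (const std::string& s : srcs) {
            auto it = tags_.find(s);
            if (it != tags_.end()) out.insert(it->second.begin(), it->second.end());
        }
        tags_[dst] = out;
    }

    // Taint sink: report which tags reach this use.
    std::set<std::string> sink(const std::string& var) const {
        auto it = tags_.find(var);
        return it == tags_.end() ? std::set<std::string>{} : it->second;
    }

private:
    std::map<std::string, std::set<std::string>> tags_;
};
```

For instance, `source("a", "user-input")`, `assign("b", {"a"})` and `assign("c", {})` leave `sink("b")` equal to {"user-input"} and `sink("c")` empty.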

Introduction

Taint analysis is used to know, at a given program point, which parts of memory or which registers can be controlled by some data we are interested in, for example user input.

The taint is spread over the execution according to the semantics of each instruction.


For example, see the following code.

In example 1, at the beginning, the variables 'a' and 'b' are not tainted. When the atoi function is called, 'a' becomes tainted. Then 'b' becomes tainted when it is assigned the value of 'a'. Now we know that the argument of the foo2 function can be controlled by the user.


In example 2, when the buffer is allocated via malloc, its content is not tainted. Then, when the allocated area is initialized with user input, we need to taint the bytes 'buffer+2', 'buffer+12' and 'buffer+30'. Later, when one of those bytes is read, we know it can be controlled by the user.
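Example 2 needs byte-granular taint over a heap buffer. The sketch below keeps one shadow flag per byte of the allocation; `BufferShadow` is a hypothetical helper, and the 32-byte allocation size used in the usage note is an assumption, since the example does not state it.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-allocation shadow: one taint flag per byte of the buffer.
class BufferShadow {
public:
    // A freshly malloc'ed buffer starts with no tainted bytes.
    explicit BufferShadow(std::size_t size) : tainted_(size, false) {}

    // Mark one byte as written from user input.
    void taint(std::size_t offset) { tainted_.at(offset) = true; }

    // A read of [offset, offset + len) is user-controlled if any byte in it is tainted.
    bool readIsControlled(std::size_t offset, std::size_t len) const {
        for (std::size_t i = offset; i < offset + len && i < tainted_.size(); ++i)
            if (tainted_[i]) return true;
        return false;
    }

private:
    std::vector<bool> tainted_;
};
```

With a 32-byte buffer and offsets 2, 12 and 30 tainted, a 1-byte read at offset 12 is user-controlled, an 8-byte read at offset 3 is not, and a 4-byte read at offset 28 is.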

Byte or Bit?

One of the problems of taint analysis is determining which method is more accurate, so that the taint is precise. For example, what should we do when a controlled byte is multiplied and the result is stored somewhere in memory? Should we taint the destination variable? See the following code.

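    call  atoi@plt
    mov   edx,eax                  ; edx = x (return value of atoi)
    cmp   eax,0
    jle   next                     ; skip unless x > 0
    cmp   eax,4
    jge   next                     ; skip unless x < 4
    shl   eax,0x3                  ; eax = 8*x
    sub   eax,edx                  ; eax = 8*x - x = 7*x
    mov   DWORD PTR [rbp-0x4],eax  ; num = 7*x
next:
    mov   eax,DWORD PTR [rbp-0x4]  ; return num
    leave
    ret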


In the previous code, we can control only 5 bits of the variable 'num', not the whole integer. So we cannot say that we control the whole variable when it is returned and used somewhere else.


Byte taint analysis asserts that b is tainted. Bit taint analysis asserts that b is not tainted.


So, what should we do: taint bytes, which is easier and lighter, or taint the individual bits controlled by the user? Tainting bytes is easier but less reliable. Tainting bits is harder, and the taint tree is more difficult to manage, but the result is far more precise.

Tainting bytes is enough for most situations.
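The cost/precision trade-off shows up even in simple bit-level propagation rules. In this hypothetical sketch each 32-bit value has a taint mask (bit i set means bit i may depend on user input); only AND/OR with a constant and left shifts are covered, and byte-level tracking is modeled by widening the mask to whole bytes. Arithmetic such as the multiplication above needs carry-aware rules, which is part of what makes bit-level tracking harder.

```cpp
#include <cstdint>

// Hypothetical bit-level shadow for a 32-bit value: bit i of the mask is 1
// when bit i of the value may depend on user input.
using TaintMask = uint32_t;

// y = x & c: bits ANDed with a constant 0 become constant 0, so they lose taint.
TaintMask bitAndConst(TaintMask t, uint32_t c) { return t & c; }

// y = x | c: bits ORed with a constant 1 become constant 1, so they lose taint.
TaintMask bitOrConst(TaintMask t, uint32_t c) { return t & ~c; }

// y = x << k: taint moves with the bits; the k low bits become constant 0.
TaintMask bitShl(TaintMask t, unsigned k) { return k >= 32 ? 0 : t << k; }

// Byte-level view of the same value: a whole byte is tainted if any of its bits is.
TaintMask toByteGranularity(TaintMask t) {
    TaintMask out = 0;
    for (int byte = 0; byte < 4; ++byte) {
        TaintMask m = 0xFFu << (8 * byte);
        if (t & m) out |= m;
    }
    return out;
}
```

For a fully tainted input byte (mask 0xFF) masked with 0x1F, the bit-level result reports exactly 5 controlled bits (0x1F), while the byte-level view still reports the whole low byte (0xFF) as tainted.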

Dynamic Taint Analysis

How do we perform dynamic taint analysis?


To do this, we need a dynamic binary instrumentation (DBI) framework. The purpose of the DBI framework is to add a pre- or post-handler on each instruction. When a handler is called, you can retrieve all the information you want about the instruction or its environment (memory).

We chose Pin, a C++ dynamic binary instrumentation framework written by Intel that works without an intermediate representation (IR).

Shadow Memory

We use shadow memory to mark every address that can be tainted by the original data we are interested in.

• Shadow memory: a technique in which potentially every byte used by a program during its execution has one or more corresponding shadow bytes.

• These shadow bytes are typically invisible to the original program and are used to record information about the original piece of data.

• Shadow memory: we need a mapping
  - Address → abstract state
  - Register → abstract state
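A common way to realize the address → abstract state mapping is a lazily allocated, page-based table, with a small array for registers. The sketch below is hypothetical: the 4 KiB page size, the one-shadow-byte-per-byte layout, the `Reg` enumeration and the encoding (0 = clean, nonzero = tainted) are assumptions, not a description of any particular tool.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>

// Hypothetical register names for the register -> abstract state mapping.
enum class Reg : std::size_t { RAX, RBX, RCX, RDX, RSI, RDI, RBP, RSP, Count };

// One shadow byte per application byte (0 = clean, nonzero = tainted),
// stored in shadow pages that are allocated only when first tainted.
class ShadowMemory {
public:
    uint8_t get(uint64_t addr) const {
        auto it = pages_.find(addr >> kPageBits);
        return it == pages_.end() ? 0 : (*it->second)[addr & kOffsetMask];
    }

    void set(uint64_t addr, uint8_t state) {
        uint64_t key = addr >> kPageBits;
        auto it = pages_.find(key);
        if (it == pages_.end()) {
            if (state == 0) return;  // untouched pages are clean by default
            it = pages_.emplace(key, std::make_unique<Page>()).first;  // zero-initialized page
        }
        (*it->second)[addr & kOffsetMask] = state;
    }

    uint8_t getReg(Reg r) const { return regs_[static_cast<std::size_t>(r)]; }
    void setReg(Reg r, uint8_t state) { regs_[static_cast<std::size_t>(r)] = state; }

private:
    static constexpr uint64_t kPageBits = 12;  // assumption: 4 KiB shadow pages
    static constexpr uint64_t kOffsetMask = (uint64_t{1} << kPageBits) - 1;
    using Page = std::array<uint8_t, (std::size_t{1} << kPageBits)>;

    std::unordered_map<uint64_t, std::unique_ptr<Page>> pages_;
    std::array<uint8_t, static_cast<std::size_t>(Reg::Count)> regs_{};
};
```

Allocating shadow pages lazily keeps the cost proportional to the memory that actually gets tainted, rather than to the whole address space.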


Dynamic Taint Analysis

First, we need to identify all user inputs, such as the environment and syscalls. We taint these inputs, and then we spread or remove the taint when we encounter instructions such as GET/PUT and LOAD/STORE.

• For this first example, we are going to taint the memory area filled by 'read', and we will see a brief overview of the Pin API. For this first test we will:
  - Catch the sys_read syscall.
  - Get its second and third arguments to determine the area to taint.
  - Call a handler when an instruction such as a LOAD or STORE accesses this area.
  - Spread the taint.

Catch the syscalls

When a syscall occurs, we check whether it is read. If so, we save its second and third arguments, which describe our memory area: the second argument is the start address of the memory the syscall writes to, and the third argument is the length of the data to write to memory.
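The syscall step can be sketched independently of Pin. In a Pintool the hook would be registered with `PIN_AddSyscallEntryFunction` and would read the number and arguments with `PIN_GetSyscallNumber` and `PIN_GetSyscallArgument` (argument indices start at 0, so the buffer is index 1 and the length index 2). Here the hook is a plain function, the tainted-byte set is a hypothetical global, and the syscall number 0 for read assumes x86-64 Linux.

```cpp
#include <cstdint>
#include <set>

constexpr uint64_t kSysRead = 0;   // assumption: read is syscall 0 on x86-64 Linux
std::set<uint64_t> taintedBytes;   // hypothetical global: addresses of tainted bytes

// Called at syscall entry with the syscall number and its first three arguments.
// For read(fd, buf, count), taint every byte of [buf, buf + count).
void onSyscallEntry(uint64_t number, const uint64_t args[3]) {
    if (number != kSysRead) return;
    uint64_t start = args[1];  // second argument: buffer the kernel writes into
    uint64_t size = args[2];   // third argument: number of bytes requested
    for (uint64_t i = 0; i < size; ++i) taintedBytes.insert(start + i);
}
```

Tainting the full requested length at entry over-approximates when read returns fewer bytes; a more precise variant taints only the returned number of bytes at syscall exit.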


Catch the LOAD and STORE instructions

Now we need to catch all instructions that read from (LOAD) or write to (STORE) the tainted area. To do that, we add a function that is called each time this area is accessed.
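The check performed by those inserted functions can be sketched as below. In Pin, memory-accessing operands can be identified at instrumentation time (for example with `INS_MemoryOperandIsRead` / `INS_MemoryOperandIsWritten`) and the effective address passed with `IARG_MEMORYREAD_EA` / `IARG_MEMORYWRITE_EA`. Here, hypothetically, each executed access is a `MemEvent` record, and the function only reports accesses that touch tainted bytes.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <vector>

// Hypothetical record of one executed instruction's memory access.
enum class Access { None, Load, Store };
struct MemEvent { uint64_t insAddr; Access access; uint64_t ea; uint64_t size; };

// True if any byte of [ea, ea + size) is in the tainted set.
bool touchesTaint(const std::set<uint64_t>& tainted, uint64_t ea, uint64_t size) {
    auto it = tainted.lower_bound(ea);  // first tainted address >= ea
    return it != tainted.end() && *it < ea + size;
}

// Returns one log line per access that hits the tainted area.
std::vector<std::string> scan(const std::set<uint64_t>& tainted,
                              const std::vector<MemEvent>& trace) {
    std::vector<std::string> hits;
    for (const MemEvent& e : trace) {
        if (e.access == Access::None) continue;  // no memory operand: nothing inserted
        if (!touchesTaint(tainted, e.ea, e.size)) continue;
        hits.push_back(std::string(e.access == Access::Load ? "LOAD " : "STORE ") +
                       std::to_string(e.insAddr));
    }
    return hits;
}
```

With bytes 0x1000 to 0x1003 tainted, a 4-byte load at 0x1002 and a 4-byte store at 0x0FFD are reported, while a 4-byte store at 0x0FFC (ending just before 0x1000) is not.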


Spread the taint

Imagine you LOAD a value from tainted memory into a register, then STORE this register to another memory location. In this case, we need to taint both the register and the new memory location. In the same way, if a constant is STOREd to the tainted memory area, we need to remove the taint, because the user can no longer control this memory location.
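These rules can be sketched with byte-granular memory taint and whole-register taint. This is a hypothetical model, not Pin code: `TaintState` and its methods would be called from the LOAD/STORE handlers, register names are plain strings, and sub-register aliasing (for example eax inside rax) is ignored. As an assumption beyond the rules above, it also untaints a register loaded from clean memory and memory stored from a clean register, since the old value is overwritten in both cases.

```cpp
#include <cstdint>
#include <set>
#include <string>

// Hypothetical taint state: tainted byte addresses and tainted register names.
struct TaintState {
    std::set<uint64_t> mem;
    std::set<std::string> regs;

    bool memTainted(uint64_t addr, uint64_t size) const {
        auto it = mem.lower_bound(addr);
        return it != mem.end() && *it < addr + size;
    }

    // LOAD reg <- [addr, addr + size): the register inherits the memory's taint.
    void load(const std::string& reg, uint64_t addr, uint64_t size) {
        if (memTainted(addr, size)) regs.insert(reg);
        else regs.erase(reg);  // overwritten with clean data
    }

    // STORE [addr, addr + size) <- reg: the memory inherits the register's taint.
    void storeReg(uint64_t addr, uint64_t size, const std::string& reg) {
        bool t = regs.count(reg) != 0;
        for (uint64_t i = 0; i < size; ++i) {
            if (t) mem.insert(addr + i);
            else mem.erase(addr + i);
        }
    }

    // STORE [addr, addr + size) <- constant: the user no longer controls these bytes.
    void storeConst(uint64_t addr, uint64_t size) {
        for (uint64_t i = 0; i < size; ++i) mem.erase(addr + i);
    }
};
```

Loading eax from tainted bytes 0x1000 to 0x1003 and storing it at 0x2000 taints 0x2000 to 0x2003; overwriting 0x1000 with a constant and reloading eax from it clears both.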


Taint Analysis for Security

• Detect overflow of the return address

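Detect Overflow: Return Address

• How do we check whether the return address has been overwritten by an overflow?

• How do we get the esp value that points to the return address?

Check the return address

ret pops the return address from the top of the stack, i.e. from the address held in %esp, into %eip.

Before every return:
    if the memory at %esp is tainted,
        the return address has been overwritten (overflow).

Get esp value

Get the CPU context → get the esp value → instrument → check the esp value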
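The return-address check can be sketched as a function called before each ret with the current stack pointer. In Pin this would typically be inserted before instructions for which `INS_IsRet` is true, passing the stack pointer with `IARG_REG_VALUE, REG_STACK_PTR`. Below, the stack pointer is a plain argument, the tainted bytes are a hypothetical set, and a 4-byte return address (32-bit x86, matching %esp) is assumed.

```cpp
#include <cstdint>
#include <set>

constexpr uint64_t kRetAddrSize = 4;  // assumption: 32-bit x86, 4-byte return address

// Called before every ret: the return address occupies [esp, esp + 4).
// If any of those bytes is tainted, user data has overwritten the return address.
bool returnAddressOverflowed(const std::set<uint64_t>& tainted, uint64_t esp) {
    for (uint64_t i = 0; i < kRetAddrSize; ++i)
        if (tainted.count(esp + i)) return true;
    return false;
}
```

Assuming, for simplicity, a 16-byte stack buffer directly below the return address with no saved frame pointer in between, a read of 16 bytes into the buffer leaves the check false, while a read of 20 bytes also taints the return address and makes it true.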


