Chapter 2: Low-Level Software

Contents
  Low-Level Perspectives
    Low-Level Data Management
      Registers
      The Stack
      Heaps
      Executable Data Sections
    Control Flow
  Assembly Language 101
    Registers
    Flags
    Instruction Format
    Basic Instructions
      Moving Data
      Arithmetic
      Comparing Operands
      Conditional Branches
      Function Calls
      Examples
  A Primer on Compilers and Compilation
    Defining a Compiler
    Compiler Architecture
      Front End
      Intermediate Representations
      Optimizer
      Back End
    Listing Files
    Specific Compilers
  Execution Environments
    Software Execution Environments (Virtual Machines)
      Bytecodes
      Interpreters
      Just-in-Time Compilers
      Reversing Strategies
    Hardware Execution Environments in Modern Processors
      Intel NetBurst
      Micro-Ops
      Pipelines
      Branch Prediction


Low-Level Perspectives

The complexity in reversing arises when we try to build an intuitive link between the high-level concepts described in the previous section and the low-level perspective we get when we look at a program's binary. This section describes how basic programming concepts such as data structures and control flow are represented at the low level.

Low-Level Data Management

One of the most important differences between high-level programming languages and any kind of low-level representation of a program is data management. High-level languages hide a great many details of data management. Different languages hide different amounts of detail, but even plain ANSI C (considered a relatively low-level language among the high-level ones) hides a great deal about how data is managed. For example, consider the following C code:

    int Multiply(int x, int y)
    {
        int z;
        z = x * y;
        return z;
    }

This function looks simple, but it cannot be translated directly into a low-level representation. CPUs rarely have instructions for declaring a variable, or for multiplying two variables and assigning the result to a third. Hardware limitations and performance considerations limit the complexity that a single instruction can handle. Even though IA-32 CPUs support a very large number of instructions, most of them are still very primitive compared with high-level language statements.

A low-level representation of the Multiply function therefore usually has to take care of the following tasks:

1. Save the machine state before running the function code.
2. Allocate memory for the variable z.
3. Load the parameters x and y from memory into registers.
4. Multiply x by y and store the result in a register.
5. Optionally copy the multiplication result back into the memory allocated for z.
6. Restore the machine state to what it was before the function was called.
7. Return to the caller, with z as the return value.

This complexity is the result of having to deal with low-level data management. The following sections introduce the most common low-level data management constructs, such as registers, the stack, and heaps, and how they relate to high-level concepts such as variables and parameters.

Low-level data management differs from the high-level view largely because the CPU executes so much faster than memory can respond. In modern computers the CPU is connected to RAM by a high-speed connection (a bus), but the CPU is still much faster than RAM. When the CPU requests a read or write to memory, the time it takes for the request to reach the memory chip, be processed, and for the response to return is much longer than a single CPU clock cycle. This means the CPU wastes many clock cycles waiting for RAM. For that reason, instructions that operate directly on memory-based operands are slower and are avoided whenever possible.

Registers

To avoid having to access RAM for every instruction, the CPU uses internal memory that can be accessed directly with no performance penalty. The most interesting form of internal memory is the registers.

The drawback of registers is that there are only a few of them. For example, IA-32 processors have only eight 32-bit registers that are truly generic. There are quite a few other registers, but most of them are used for special purposes and cannot always be used. Assembly language code revolves around registers because they are the easiest way for the processor to manage and directly access data. Registers are not used for long-term storage, however; that is the role of RAM. Managing registers, and loading and storing data between RAM and registers, adds to the complexity of assembly language code.

Going back to the Multiply code, most of the complexity revolves around data management. x and y cannot be multiplied directly from memory; the code must first read one of them into a register, and then multiply that register by the other value, which is still in RAM. Another approach is to load both values into two registers and only then multiply them, but that is probably unnecessary.

Registers are also used for somewhat longer-term storage of some values. Because registers are so easy to access, compilers use them to cache values that are used frequently within the scope of a function, and to hold local variables defined in the program's source code.

While reversing, we must work out the nature of the values held in each register. If a register is used over and over and updated throughout a function, that is a sign that the register holds a local variable defined in the source code.
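To connect the seven steps above to real code, here is a minimal sketch in C. The function is the one from the text; the IA-32 listing in the comment is an illustrative assumption about what an unoptimized compiler using stack-passed arguments might emit, not the output of any particular compiler. MultiplySelfCheck is a hypothetical helper added only for this sketch.

```c
#include <assert.h>

/*
 * One plausible, unoptimized IA-32 translation (Intel syntax, arguments
 * passed on the stack). This listing is an assumption for illustration;
 * real compilers and optimization levels produce different code.
 *
 *   push ebp              ; 1. save machine state (caller's frame pointer)
 *   mov  ebp, esp
 *   sub  esp, 4           ; 2. reserve stack space for z
 *   mov  eax, [ebp+8]     ; 3. load x from memory into a register
 *   imul eax, [ebp+12]    ; 4. multiply by y, which is still in memory
 *   mov  [ebp-4], eax     ; 5. copy the product back into z
 *   mov  eax, [ebp-4]     ;    the return value travels in EAX
 *   mov  esp, ebp         ; 6. restore machine state
 *   pop  ebp
 *   ret                   ; 7. return to the caller
 */
int Multiply(int x, int y)
{
    int z;
    z = x * y;
    return z;
}

/* Hypothetical sanity check that a reader can call from main(). */
void MultiplySelfCheck(void)
{
    assert(Multiply(6, 7) == 42);
}
```

With optimization enabled, a compiler would most likely keep z in a register and skip steps 2 and 5 entirely, which is one reason optimized code looks so different from its source.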

The Stack

What happens in step 2, when the program allocates storage space for the variable z? That depends on the compiler. In general, the value is placed either in a register or on the stack. If it is placed in a register, step 4 simply puts the result in the allocated register. But if no register is available, the data must go in RAM. In such cases the variable is placed on the stack.

The stack is an area of program memory used for short-term storage of information by the CPU and the program. It can be thought of as a secondary storage area for short-term information. Registers are used for storing the most immediate data, and the stack is used for storing information for slightly longer. Physically, the stack is just an area in RAM that has been allocated for this purpose. Stacks reside in RAM just like any other data; the distinction is purely logical. The operating system manages several stacks at the same time, each one representing a currently active program or thread. Chapter 3 discusses threads and how stacks are allocated and managed.

Memory for stacks is typically allocated top down: the highest addresses are allocated and used first, and the stack grows toward lower addresses.

Figure 2.1 The stack after several values have been pushed onto it.

Figure 2.2 The stack after several values have been popped.

Steps 1 and 6 are a good example of stack usage. The machine state that needs to be saved is usually the values of the registers. The register values are saved onto the stack and later loaded back from the stack into the corresponding registers.

The stack can be used for several purposes:

- Temporarily saved register values: The stack is frequently used for temporarily saving the value of a register and then restoring that value into the register. This is useful in a variety of situations, for example when a procedure that has just been called needs to use certain registers: it must save their values to make sure it does not corrupt any register used by its caller.
- Local variables: Local variables that do not fit in registers are stored on the stack. The stack also holds variables that must be stored in RAM (there are several reasons why a variable might have to live in RAM, such as when we want to call a function and have it write a value into a local variable defined in the current function). Note that when dealing with local variables, data is not pushed onto and popped off the stack; instead the stack is accessed using offsets, like a data structure. This is demonstrated in the real reversing sessions in Part II.
- Function parameters and return addresses: The stack is used for implementing function calls. In a function call, the caller almost always passes parameters to the callee and is responsible for storing the current instruction pointer so that execution can resume from its current position once the callee completes. The stack is used for storing both the parameters and the instruction pointer for each procedure call.
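The push, pop, and offset-based access described above can be modeled with a small C sketch. ToyStack and its functions are hypothetical names invented for this illustration; the array index stands in for ESP, and a real stack works on byte addresses rather than array slots.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_WORDS 16

/* A toy stack that grows toward lower indexes, like the IA-32 stack
   grows toward lower addresses. */
typedef struct {
    uint32_t mem[STACK_WORDS];
    uint32_t sp;   /* index of the current top; STACK_WORDS means empty */
} ToyStack;

void stack_init(ToyStack *s)
{
    s->sp = STACK_WORDS;   /* start at the highest position */
}

/* PUSH: move the stack pointer down first, then store the value. */
void stack_push(ToyStack *s, uint32_t value)
{
    assert(s->sp > 0);             /* no room left */
    s->sp--;
    s->mem[s->sp] = value;
}

/* POP: read the value at the top, then move the stack pointer back up. */
uint32_t stack_pop(ToyStack *s)
{
    assert(s->sp < STACK_WORDS);   /* nothing to pop */
    return s->mem[s->sp++];
}

/* Local-variable style access: read a slot at an offset from the top
   without pushing or popping anything. */
uint32_t stack_peek(const ToyStack *s, uint32_t offset)
{
    assert(s->sp + offset < STACK_WORDS);
    return s->mem[s->sp + offset];
}
```

Pushing 0x11, 0x22, and 0x33 in that order leaves 0x33 at offset 0 and 0x11 at offset 2, which mirrors how compiled code reaches locals and parameters through fixed offsets rather than by popping.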

Heaps

A heap is a managed memory region that allows dynamic allocation of variable-sized blocks of memory while the program is running. A program requests a block of a specific size and receives a pointer to the newly allocated block (assuming enough memory is available). Heaps are managed either by software libraries that ship with the program or by the operating system.

Heaps are typically used for variable-sized objects used by the program, or for objects that are too large to be placed on the stack. For reversers, locating heaps in memory and properly identifying the code that allocates and frees heap blocks can be helpful, because it contributes to an understanding of the program's data layout. For instance, if you see a call to what you know is a heap allocation routine, you can follow the routine's return value through the program to see what is done with the allocated block. Knowing the exact size of a heap-allocated object (the block size is always passed as a parameter to the heap allocation routine) is another useful clue about the program.

Executable Data Sections

Another area in program memory that is frequently used for storing application data is the executable data section. In high-level languages, this area typically contains either global variables or preinitialized data. Preinitialized data is any kind of constant, hard-coded information that comes with the program. Some preinitialized data is embedded right in the code (such as integer constants), but when there is too much data, the compiler stores it in a special area of the program executable and generates code that references it by address. An example of preinitialized data is any kind of hard-coded string inside a program. Here is an example of such a string:

    char *szWelcome = "This string will be stored in the executable's preinitialized data section";

This C declaration causes the compiler to store the string in the executable's preinitialized data section, regardless of where in the code szWelcome is declared. Even if szWelcome is a local variable declared inside a function, the string is still stored in the preinitialized data section. To access the string, the compiler emits a hard-coded address that points to it. This is easy to spot while reversing a program, because hard-coded memory addresses are rarely used for anything other than pointing into the executable's data section.

The other common case in which data is stored inside an executable's data section is when the program declares a global variable. Global variables provide long-term storage (their value is retained throughout the life of the program) and are accessible from anywhere in the program, hence the term global. In most languages, a global variable is declared outside of any function definition. As with preinitialized data, the compiler must use hard-coded memory addresses to access global variables, which is why they are easy to recognize when reversing a program.

Control Flow

Control flow is one of the areas where source code looks far more user-friendly than its low-level counterpart. Processors and low-level languages do not understand words such as if or while. Looking at the low-level implementation of even a simple control flow statement can be confusing, because the control flow constructs used in low-level languages are quite primitive. The challenge is to convert these primitive constructs back into user-friendly, high-level concepts.

Most high-level conditional statements are too lengthy for low-level languages such as assembly language, so they are broken down into a sequence of operations. The key to understanding these sequences, the correlation between them, and the high-level statements from which they originated is to understand the low-level control flow constructs and how they can be used to represent high-level control flow statements. The details of these low-level constructs are platform- and language-specific; here, control flow statements are discussed in the context of assembly language.

Assembly Language 101

To understand low-level software, you must understand assembly language. Assembly language is the language of reversing, and mastering it is the first step toward becoming a real reverser, because for most programs assembly language is the only available link to the original source code. We focus on IA-32.

Registers

IA-32 registers are referenced in almost every assembly language instruction. IA-32 has eight generic registers: EAX, EBX, ECX, EDX, ESI, EDI, EBP, and ESP. Note that all of these names begin with the letter E, which stands for extended.

Table 2.1 Generic IA-32 Registers

Register | Description
EAX, EBX, EDX | These are all generic registers that can be used for any integer, Boolean, logical, or memory operation.
ECX | Generic; sometimes used as a counter by repetitive instructions that require counting.
ESI/EDI | Generic; frequently used as source/destination pointers in instructions that copy memory (SI = Source Index, DI = Destination Index).
EBP | Can be used as a generic register, but is mostly used as the stack base pointer. Using a base pointer in combination with the stack pointer creates a stack frame. A stack frame can be defined as the current function's stack zone, which resides between the stack pointer (ESP) and the base pointer (EBP). The base pointer usually points to the stack position right after the return address of the current function. Stack frames give quick access to both local variables and the parameters passed to the current function.
ESP | This is the CPU's stack pointer. The stack pointer stores the current position in the stack, so anything pushed onto the stack is placed below this address, and the register is updated accordingly.

Figure 2.3 General-purpose registers in IA-32.

Flags

IA-32 processors have a special register called EFLAGS that contains all kinds of status and system flags. The system flags are used for managing the various processor modes and states, and are not relevant to reverse engineering. The status flags are used by the processor for recording its current logical state, and are updated by many logical and integer instructions in order to record the outcome of their actions. There are also instructions that operate based on the values of these status flags.

In IA-32 code, flags are the basic tool for creating conditional code. There are arithmetic instructions that test operands for certain conditions and set processor flags based on the result. Then there are instructions that read these flags and perform different operations depending on their values. One popular group of instructions that act on flag values is the Jcc (conditional jump) group, which test certain flags (depending on the specific instruction invoked) and jump to a specified code address if the flags are set according to the condition code specified.

Here is an example of how flags are used to create a conditional statement like those in high-level languages. Say we have a variable called bSuccess in a high-level language, and code that checks whether it is false:

    if (bSuccess == FALSE)
        return 0;

How would this line be written in assembly language? In general we cannot test the value of a variable and act on the result in a single instruction; most instructions are too primitive for that. Instead we must test the value of bSuccess (which must first be loaded into a register), set some flags that record whether it is zero or not, and invoke a conditional branch instruction that tests the relevant flags and branches if they indicate that the operand handled by the most recent instruction was zero (this is indicated by the Zero Flag, ZF). Otherwise the processor simply proceeds to the instruction that follows the branch instruction.

Instruction Format

Instructions usually consist of an opcode (operation code) and one or two operands. The opcode is the instruction name, such as MOV, and the operands are the parameters that the instruction receives (some instructions have no operands). Operands represent the data handled by the specific instruction (just like parameters passed to a function), and take one of three basic forms:

- Register name: The name of the register to be read from or written to, such as EAX or EBX.
- Immediate: A constant value embedded right in the code. This often indicates that there was some kind of hard-coded constant in the original program.
- Memory address: When an operand resides in RAM, its memory address is enclosed in brackets [] to indicate that it is a memory address. The address can be a hard-coded immediate that tells the processor the exact address to read from or write to; a register whose value is used as the memory address; or a register plus a constant, where the register represents the base address of some object and the constant represents an offset into that object or an index into an array.

The general format of an instruction looks like this:

    Instruction_name Destination_Operand, Source_Operand

Some instructions take only one operand, and others take none. For example:

Table 2.2 Examples of Typical Instruction Operands and Their Meanings

Operand | Description
EAX | Refers to EAX, either for reading from it or for writing to it.
0x30004040 | An immediate number embedded in the code (like a constant).
[0x4000349e] | An immediate hard-coded memory address; this can be an access to a global variable.

The assembly language syntax described here follows Intel's notation, which is not the only notation used for expressing IA-32 assembly language code. In AT&T notation the source operand usually precedes the destination operand (the reverse of the Intel notation). Register names are prefixed with % (so EAX is written %eax). Memory addresses are denoted using parentheses, so (%ebx) means "the address pointed to by EBX". AT&T notation is mostly used in UNIX-family tools such as the GNU tools, while Intel notation is mostly used in Windows tools. This book uses Intel notation.

Basic Instructions

The following are the most common instructions, which appear throughout a program.

Moving Data

    MOV Destination, Source

MOV is by far the most common instruction. It moves data from the source to the destination (strictly speaking it copies: the source is left unchanged). The destination can be either a memory address (given as an immediate or through a register) or a register. The source can be an immediate, a register, or a memory address. Note that only one of the operands can contain a memory address, never both. This is a general rule in IA-32 instructions: with a few exceptions, most instructions can take only one memory operand.

Arithmetic

The IA-32 instruction set contains six basic integer arithmetic instructions: ADD, SUB, MUL, DIV, IMUL, and IDIV. The following table lists only the most basic form of each instruction. Many of these instructions support multiple configurations, with different sets of operands; only the most common configuration of each is listed.

Instruction | Description
ADD Operand1, Operand2 | Adds two signed or unsigned integers. The result is typically stored in Operand1: Operand1 = Operand1 + Operand2.
SUB Operand1, Operand2 | Subtracts Operand2 from Operand1. Works for both signed and unsigned operands. The result is typically stored in Operand1: Operand1 = Operand1 - Operand2.
MUL Operand | Multiplies the unsigned operand by EAX and stores the result as a 64-bit value in EDX:EAX, meaning that the low 32 bits are stored in EAX and the high 32 bits in EDX.
DIV Operand | Divides the unsigned 64-bit value stored in EDX:EAX by the unsigned operand. Stores the quotient in EAX and the remainder in EDX.
IMUL Operand | Multiplies the signed operand by EAX and stores the result as a 64-bit value in EDX:EAX.
IDIV Operand | Divides the signed 64-bit value stored in EDX:EAX by the signed operand. Stores the quotient in EAX and the remainder in EDX.
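The EDX:EAX convention used by MUL and DIV is easier to follow with concrete numbers. The following is a minimal sketch in C that imitates the unsigned forms with 64-bit arithmetic; RegPair, emulate_mul, and emulate_div are hypothetical names made up for this illustration and are not part of any real API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register pair used only for this illustration. */
typedef struct {
    uint32_t eax;   /* low 32 bits  */
    uint32_t edx;   /* high 32 bits */
} RegPair;

/* MUL operand: EDX:EAX = EAX * operand (unsigned, 64-bit result). */
RegPair emulate_mul(uint32_t eax, uint32_t operand)
{
    uint64_t product = (uint64_t)eax * operand;
    RegPair r;
    r.eax = (uint32_t)product;           /* low half goes to EAX  */
    r.edx = (uint32_t)(product >> 32);   /* high half goes to EDX */
    return r;
}

/* DIV operand: EAX = EDX:EAX / operand, EDX = EDX:EAX % operand. */
RegPair emulate_div(RegPair dividend, uint32_t operand)
{
    uint64_t value = ((uint64_t)dividend.edx << 32) | dividend.eax;
    RegPair r;
    assert(operand != 0);                    /* the real DIV faults here */
    assert(value / operand <= UINT32_MAX);   /* quotient must fit in EAX */
    r.eax = (uint32_t)(value / operand);     /* quotient  */
    r.edx = (uint32_t)(value % operand);     /* remainder */
    return r;
}
```

For example, multiplying 0xFFFFFFFF by 2 gives 0x1FFFFFFFE, which does not fit in 32 bits: EAX receives 0xFFFFFFFE and EDX receives 1. Feeding that pair back into the division by 2 recovers 0xFFFFFFFF with a remainder of 0.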

Comparing Operands

    CMP Operand1, Operand2

CMP compares two operands and records the result of the comparison in the processor's flags. In essence, CMP simply subtracts Operand2 from Operand1 and discards the result, while setting all of the relevant flags to reflect the outcome. For example, if the result of the subtraction is zero, the Zero Flag (ZF) is set, which indicates that the two operands are equal. If the operands are not equal, other flags are used to determine which one is greater.
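How a discarded subtraction can still answer "equal?", "below?", or "less than?" is clearer in code. The sketch below models CMP in C under simplifying assumptions: only four status flags are tracked, and the Flags type and the emulate_cmp and cond_* helpers are hypothetical names invented for this illustration. The flag rules follow the usual definitions for a 32-bit subtraction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    bool zf;   /* zero flag: result was zero           */
    bool cf;   /* carry flag: unsigned borrow occurred */
    bool sf;   /* sign flag: top bit of the result     */
    bool of;   /* overflow flag: signed overflow       */
} Flags;

/* CMP op1, op2: subtract, keep only the flags, discard the result. */
Flags emulate_cmp(uint32_t op1, uint32_t op2)
{
    uint32_t result = op1 - op2;
    Flags f;
    f.zf = (result == 0);
    f.cf = (op1 < op2);
    f.sf = (result >> 31) != 0;
    /* Signed overflow: the operands differ in sign and the result's
       sign differs from that of op1. */
    f.of = (((op1 ^ op2) & (op1 ^ result)) >> 31) != 0;
    assert(f.zf == (op1 == op2));   /* ZF is set exactly when equal */
    return f;
}

/* Conditions tested by a few conditional jump variants. */
bool cond_je(Flags f)  { return f.zf; }           /* equal               */
bool cond_jne(Flags f) { return !f.zf; }          /* not equal           */
bool cond_jb(Flags f)  { return f.cf; }           /* unsigned op1 < op2  */
bool cond_jl(Flags f)  { return f.sf != f.of; }   /* signed op1 < op2    */
```

Comparing 0xFFFFFFFF with 1 shows why several flags are needed: as unsigned numbers the first operand is larger (cond_jb is false), but as signed numbers it is -1 and therefore smaller (cond_jl is true).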

Conditional Branches

Conditional branches are implemented using the Jcc group of instructions. These are instructions that conditionally branch to a specified address, based on certain conditions. Jcc is just a generic name, and there are quite a few variants. Each variant tests a different set of flag values to decide whether or not to perform the branch. The specific variants are discussed in Appendix A.

The basic format of a conditional branch instruction is as follows:

    Jcc TargetCodeAddress

If the specified condition is satisfied, Jcc just updates the instruction pointer to point to TargetCodeAddress (without saving its current value). If the condition is not satisfied, Jcc does nothing, and execution proceeds to the following instruction.

Function Calls

Function calls are implemented using two basic instructions. The CALL instruction calls a function, and the RET instruction returns to the caller. CALL pushes the current contents of the instruction pointer onto the stack (so that it is later possible to return to the caller; at the time CALL executes, the instruction pointer holds the address of the instruction immediately following the CALL) and jumps to the specified address. The function's address can be specified just like any other operand: as an immediate, a register, or a memory address. The general layout of the CALL instruction is:

    CALL FunctionAddress

When a function completes and needs to return to its caller, it usually invokes RET. RET pops the address that CALL pushed onto the stack, loads it into the instruction pointer, and resumes execution from that address. In addition, RET can be instructed to increment ESP by a specified number of bytes after popping the instruction pointer. This is needed for restoring ESP to its original position, as it was before the current function was called and before any parameters were pushed onto the stack. In some calling conventions the caller is responsible for adjusting ESP, which means that in such cases RET is used without any operands, and the caller has to manually increment ESP by the number of bytes pushed as parameters. See Appendix C for details.
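The bookkeeping that CALL and RET perform on ESP and the instruction pointer can be traced with a toy model. The following C sketch is an assumption-laden illustration, not a description of real hardware: ToyCpu and the cpu_* functions are hypothetical names, the stack is a small array, and the length of the CALL instruction is passed in explicitly so that the return address can be computed.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_WORDS 64

typedef struct {
    uint32_t stack[MEM_WORDS];   /* stack memory, one 32-bit slot each  */
    uint32_t esp;                /* byte offset of the top of the stack */
    uint32_t eip;                /* instruction pointer                 */
} ToyCpu;

void cpu_init(ToyCpu *cpu, uint32_t eip)
{
    cpu->esp = MEM_WORDS * 4;    /* empty stack: ESP at the highest offset */
    cpu->eip = eip;
}

void cpu_push(ToyCpu *cpu, uint32_t value)
{
    assert(cpu->esp >= 4);
    cpu->esp -= 4;                        /* the stack grows downward */
    cpu->stack[cpu->esp / 4] = value;
}

uint32_t cpu_pop(ToyCpu *cpu)
{
    uint32_t value;
    assert(cpu->esp + 4 <= MEM_WORDS * 4);
    value = cpu->stack[cpu->esp / 4];
    cpu->esp += 4;
    return value;
}

/* CALL target: push the address of the next instruction, then jump. */
void cpu_call(ToyCpu *cpu, uint32_t target, uint32_t call_length)
{
    cpu_push(cpu, cpu->eip + call_length);
    cpu->eip = target;
}

/* RET n: pop the return address into EIP, then release n parameter bytes.
   Pass 0 to model a convention where the caller cleans up instead. */
void cpu_ret(ToyCpu *cpu, uint32_t param_bytes)
{
    cpu->eip = cpu_pop(cpu);
    assert(cpu->esp + param_bytes <= MEM_WORDS * 4);
    cpu->esp += param_bytes;
}
```

Pushing two 4-byte parameters, calling, and then returning with cpu_ret(&cpu, 8) leaves ESP exactly where it started. With cpu_ret(&cpu, 0) the parameter bytes are still on the stack after the return, and the caller has to add them back to ESP itself.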

Examples

Let's look at a few snippets of assembly code to make sure the basic concepts are clear. Here is the first example:

    cmp ebx, 0xf020
    jnz 10026509

The first instruction is CMP, which compares the current value of EBX with the constant 0xf020 (61,472 in decimal). CMP sets certain flags to reflect the outcome of the comparison. JNZ is a version of Jcc (the conditional branch). It branches if the zero flag (ZF) is not set, which is why the instruction is called JNZ (jump if not zero). This means that the instruction jumps to the specified code address if the two operands of the comparison are not equal. JNZ is also called JNE (jump if not equal). JNE and JNZ are two mnemonics for the same instruction; they share the same opcode in machine language.

The next example moves data and performs some arithmetic:

    mov edi, [ecx+0x5b0]
    mov ebx, [ecx+0x5b4]
    imul edi, ebx

The first MOV instruction reads data from a memory address into the EDI register. The brackets indicate a memory access, and the specific address to read from is the expression inside the brackets. In this case, MOV takes the value of ECX, adds 0x5b0 to it, and uses the result as a memory address. The instruction reads 4 bytes from that address and writes them into EDI. We know that 4 bytes are going to be read because of the register specified as the destination operand. If the instruction referenced DI instead of EDI, we would know that only 2 bytes are going to be read. EDI is a full 32-bit register (see Figure 2.3).

The next instruction reads data from another memory address, this time [ecx+0x5b4], into the EBX register. We can easily deduce that ECX points to some kind of data structure; 0x5b0 and 0x5b4 are offsets to members within that data structure. If this were a real program, you would probably want to find out more about this data structure. You could trace back in the code to see where ECX is loaded with its current value; that would tell you where the structure's address was obtained, and might shed some light on the nature of the data structure. Techniques for investigating data structures are demonstrated in the reversing examples throughout this book.

The final instruction in this sequence is IMUL (signed multiply). IMUL has several forms, but when it is specified with two operands, as it is here, it means that the first operand is multiplied by the second and the result is written into the first: EDI = EDI * EBX.

Looking at these three lines as a whole, we can get a good idea of their purpose. They take two different members of the same data structure (whose address is in ECX) and multiply them. Because IMUL is used, these members are probably signed integers, apparently 32 bits long (this is a hint rather than proof, since compilers also use the two-operand IMUL for unsigned values when only the low 32 bits of the product are needed). Not too bad for three lines of assembly language code!

For the last example, let's see what an average function call sequence looks like in IA-32:

    push eax
    push edi
    push ebx
    push esi
    push dword ptr [esp+0x24]
    call 0x10026eeb

Five values are placed on the stack using PUSH. The first four are taken from registers. The fifth is taken from the memory address [esp+0x24]. In most cases this is a stack address (ESP is the stack pointer), which suggests that this address is either a parameter that was passed to the current function or a local variable. To determine exactly what the address represents, you would need to look at the whole function and examine how it uses the stack. This technique is demonstrated in Chapter 5.
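Working backward from the MOV/MOV/IMUL sequence above, a reverser might write down a guess at the original C. The sketch below is one such guess and nothing more: GuessedStruct, its member names, and the padding array are all invented for illustration; only the offsets 0x5b0 and 0x5b4 and the multiplication come from the disassembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reconstruction of the structure that ECX points to. */
typedef struct {
    uint8_t unknown[0x5b0];   /* members we know nothing about yet */
    int32_t first;            /* read by  mov edi, [ecx+0x5b0]     */
    int32_t second;           /* read by  mov ebx, [ecx+0x5b4]     */
} GuessedStruct;

/* obj plays the role of ECX; the locals are named after the registers. */
int32_t multiply_members(const GuessedStruct *obj)
{
    int32_t edi;
    int32_t ebx;
    assert(obj != NULL);
    edi = obj->first;     /* mov  edi, [ecx+0x5b0] */
    ebx = obj->second;    /* mov  ebx, [ecx+0x5b4] */
    return edi * ebx;     /* imul edi, ebx         */
}
```

The offsetof macro from stddef.h can confirm that a guessed layout matches the disassembly: offsetof(GuessedStruct, first) should equal 0x5b0 and offsetof(GuessedStruct, second) should equal 0x5b4. As more of the program is reversed, the unknown array is gradually replaced with named members.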

A Primer on Compilers and Compilation

It is probably safe to say that 99 percent of modern software is implemented in high-level languages and goes through some sort of compiler before being shipped to customers. Most of the reversing situations you will encounter therefore include the challenge of deciphering the back-end output of one compiler or another. For that reason, understanding compilers and how they work is helpful. Think of it as "know your enemy": it helps you understand and cope with the difficulties of deciphering compiler-generated code.

Compiler-generated code can be hard to read. Sometimes it is so different from the original source code that it becomes difficult to recognize the programmer's original intent. The same happens with arithmetic expressions: they are often rearranged so that they execute more efficiently, and the end result is an arithmetic sequence that is very difficult to decipher. Understanding the processes undertaken by compilers and the way they "perceive" the final code helps in deciphering their output.

The following sections discuss compilers, how they work, and the different stages that take place inside a typical compiler. Reversers must truly know their systems, and it is impossible to truly understand a system without understanding how its software is created and built.

Compilers are extremely complex programs that combine several fields of computer science research and can contain millions of lines of code. Here we only scratch the surface. Readers who want to dig deeper can consult:

[Cooper] Keith D. Cooper and Linda Torczon. Engineering a Compiler. Morgan Kaufmann.

Defining a Compiler

At its most basic level, a compiler is a program that translates a program from one representation into another. In most cases the input is a text file containing code that complies with the specification of some high-level language. The output is a low-level representation of the same program. Such a low-level form is usually meant to be read by hardware or software, and only rarely by people. A compiler transforms programs from their high-level, human-readable form into a lower-level, machine-readable form.

During the translation process, compilers usually go through numerous optimization or improvement steps that take advantage of the compiler's understanding of the program and employ various algorithms to improve the code's efficiency. These optimizations have a side effect: they severely degrade the readability of the code, simply because compiler-generated code is not meant for human consumption.

Compiler Architecture

A compiler usually consists of three basic components. The front end is responsible for deciphering the original program text and for ensuring that its syntax is correct and complies with the language specification. The optimizer improves the program in one way or another while preserving its original meaning. Finally, the back end is responsible for generating the platform-specific binary from the optimized code emitted by the optimizer.

Front End

The compilation process begins at the front end, which includes several steps that analyze the high-level language source code. Compilation usually starts with a process called lexical analysis, or scanning, in which the compiler goes over the source file and scans the text for the tokens in it. Tokens are the textual symbols that make up the code, so that in a line such as:

    if (Remainder != 0)

the symbols if, (, Remainder, and != are all tokens. While scanning for tokens, the lexical analyzer confirms that the tokens produce a legal "sentence" according to the rules of the language (in many compilers this grammar check is actually the job of a separate parsing stage that follows scanning). For example, it might check that the token if is followed by a ( token, which is a requirement in some languages. Along with each word, the analyzer stores the meaning of that word within the specific context. This can be thought of as a very simple version of how humans break up sentences in natural languages. A sentence is divided into several logical parts, and words only have real meaning when placed in context. Similarly, lexical analysis involves confirming the validity of each token within the current context. If a token appears unexpectedly in the current context, the compiler reports an error.

A compiler's front end is probably the component least relevant to reversers, because it is primarily a conversion step that rarely changes the program's meaning in any way; it merely verifies that the program is valid and converts it to the compiler's intermediate representation.
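The scanning step described above can be sketched in a few lines of C. This is a deliberately tiny scanner written for illustration under stated assumptions: it recognizes only identifiers, decimal numbers, a handful of two-character comparison operators, and single-character punctuation, and the name scan_tokens and the two size limits are hypothetical. A production lexer is usually generated from a grammar and handles far more cases.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

#define MAX_TOKENS    32
#define MAX_TOKEN_LEN 32

/* Splits src into tokens and returns how many were found.
   Tokens longer than MAX_TOKEN_LEN - 1 characters are truncated. */
int scan_tokens(const char *src, char tokens[][MAX_TOKEN_LEN], int max_tokens)
{
    int count = 0;
    const char *p = src;

    assert(src != NULL && tokens != NULL);

    while (*p != '\0' && count < max_tokens) {
        size_t len = 0;
        size_t copy;

        if (isspace((unsigned char)*p)) {   /* whitespace separates tokens */
            p++;
            continue;
        }
        if (isalpha((unsigned char)*p) || *p == '_') {
            /* identifier or keyword, such as if or Remainder */
            while (isalnum((unsigned char)p[len]) || p[len] == '_')
                len++;
        } else if (isdigit((unsigned char)*p)) {
            /* decimal number */
            while (isdigit((unsigned char)p[len]))
                len++;
        } else if ((*p == '!' || *p == '=' || *p == '<' || *p == '>')
                   && p[1] == '=') {
            len = 2;                        /* !=, ==, <=, >= */
        } else {
            len = 1;                        /* single-character symbol */
        }

        copy = (len < MAX_TOKEN_LEN) ? len : MAX_TOKEN_LEN - 1;
        memcpy(tokens[count], p, copy);
        tokens[count][copy] = '\0';
        count++;
        p += len;
    }
    return count;
}
```

Given the line from the text, the scanner yields six tokens: if, (, Remainder, !=, 0, and ). Whitespace between them makes no difference to the result.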

Intermediate Representations

A compiler's main role is to translate code from one representation into another. In the process, the compiler must generate its own representation of the code. This intermediate representation (or internal representation) is useful for detecting code errors, improving the code, and ultimately generating machine code.

Properly choosing the intermediate representation of code in a compiler is one of the compiler designer's most important design decisions. The layout depends heavily on what kind of high-level language the compiler takes as input and what kind of object code the compiler emits. Some intermediate representations are very close to the high-level language and retain much of the program's original structure. Such information can be useful if advanced improvements and optimizations are to be performed on the code. Other compilers use intermediate representations that are very close to assembly code. Such representations strip away much of the high-level structure embedded in the original code, and are suitable for compiler designs that focus more on the low-level details of the code. Finally, it is not uncommon for a compiler to use two or more intermediate representations, one for each stage of the compilation process.

Optimizer

The optimization capability of compilers is the main reason reversers need to understand compilers. Optimizers employ a wide variety of techniques for improving the efficiency of the code. The two primary goals of an optimizer are usually either to generate the highest-performance code possible or to generate the smallest possible binaries. Most compilers try to achieve both goals as far as possible.

Optimizations that take place in the optimizer are typically not processor-specific. Regardless of the specific optimizations performed, an optimizer must always preserve the exact meaning of the original program and must not change its behavior.

Code Structure

Optimizers frequently modify the structure of the code to make it more efficient while preserving its meaning. For example, loops are often partially or fully unrolled. Unrolling a loop means that instead of repeating the same chunk of code using a jump instruction, the code is simply duplicated so that the processor executes it multiple times. This makes the binary larger, but avoids having to manage a counter and invoke conditional branches (which are fairly inefficient; see the section on CPU pipelines in this chapter). It is also possible to partially unroll a loop, reducing the number of iterations by performing more than one iteration in each cycle of the loop.

When examining switch statements, the compiler can determine the most efficient approach for locating the correct case at runtime. This can be a direct table, in which the individual blocks are accessed using the operand, or one of several tree-based search approaches.

Loops can be rearranged to make them more efficient. The most common kind of loop is the pretested loop, in which the condition is tested before the loop body is executed. The problem with this kind of loop is that it requires an extra unconditional jump at the end of the loop body to jump back to the beginning of the loop (a posttested loop has only a single conditional branch instruction at the end of the loop, which makes it more efficient). Because of this, compilers commonly convert pretested loops to posttested loops. In some cases this conversion requires inserting an if statement before the beginning of the loop, so that the loop body is not entered if the loop condition is not satisfied.

Code structure optimizations are discussed in more detail in Appendix A.

Redundancy Elimination

Programmers frequently produce redundant code, such as repeating the same calculation more than once, or assigning a value to a variable without ever using it. Optimizers have algorithms that search for such redundancies and eliminate them.

For example, programmers often leave static expressions inside loops, which is wasteful because there is no need to compute them over and over. Optimizers can recognize such statements and move them outside the loop to improve the code's efficiency.

Optimizers can also rearrange pointer arithmetic in order to efficiently calculate the address of an item within an array or data structure, and cache the result so that the calculation is not repeated if the item needs to be accessed again later.
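Three of the transformations described above (partial unrolling, pretested-to-posttested conversion, and moving a static expression out of a loop) can be written out by hand to show that the meaning is preserved. The following C sketch is a hand-made illustration, not actual compiler output; both function names are hypothetical, and the unroll factor of four is an arbitrary choice.

```c
#include <assert.h>
#include <stddef.h>

/* The loop as a programmer might write it: pretested, one element per
   iteration, with the static expression (scale * 2) inside the loop. */
long sum_scaled_plain(const int *data, size_t n, int scale)
{
    long total = 0;
    size_t i;
    assert(data != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += (long)data[i] * (scale * 2);
    return total;
}

/* A hand-written equivalent of what an optimizer might produce. */
long sum_scaled_transformed(const int *data, size_t n, int scale)
{
    long total = 0;
    size_t i = 0;
    const int factor = scale * 2;    /* static expression moved out */
    assert(data != NULL || n == 0);

    if (n >= 4) {                    /* guard inserted before the loop */
        do {                         /* posttested, unrolled four times */
            total += (long)data[i]     * factor;
            total += (long)data[i + 1] * factor;
            total += (long)data[i + 2] * factor;
            total += (long)data[i + 3] * factor;
            i += 4;
        } while (i + 4 <= n);
    }
    while (i < n) {                  /* leftover elements, at most three */
        total += (long)data[i] * factor;
        i++;
    }
    return total;
}
```

Both functions return the same value for any input, but the second would disassemble into something much longer and less obviously related to the original three-line loop, which is exactly the readability cost described in the text.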

Back End

A compiler's back end, also called the code generator, is responsible for generating target-specific code from the intermediate code generated and processed in the earlier phases of the compilation process. This is where the intermediate representation "meets" the target-specific language, which is usually some kind of low-level assembly language.

Because the code generator is responsible for selecting the specific assembly language instructions, it is the only component that has enough information to apply platform-specific optimizations. This phase contributes a good deal to making compiler-generated assembly language code difficult to read.

The following are the three most important stages of the code generation process:

- Instruction selection: This is where the code from the intermediate representation is translated into platform-specific instructions. The selection of each individual instruction is very important to overall program performance and requires the compiler to be aware of the various properties of each instruction.
- Register allocation: The intermediate representation often assumes an unlimited number of registers, so every local variable can be placed in a register. The target processor has only a handful of registers, so the compiler must decide which variables are placed in registers and which are placed on the stack.
- Instruction scheduling: Because most modern processors can handle multiple instructions at once, data dependencies between individual instructions become an issue. If an instruction performs an operation and stores the result in a register, reading from that register in the instructions that immediately follow causes a delay, because the result of the earlier operation might not be available yet. For this reason the code generator employs platform-specific instruction scheduling algorithms that reorder instructions to achieve the highest possible level of parallelism. The end result is interleaved code, in which two instruction sequences dealing with two separate things are woven together into one sequence of instructions. Such sequences appear in many of the reversing sessions in this book.

Listing Files

A listing file is a compiler-generated text file that contains the assembly language code produced by the compiler. The same information can be obtained by disassembling the binaries, but a listing file is more convenient when you need to see which assembly lines correspond to which part of the original source code. Listing files are not strictly a reversing tool; they are more of a research tool, used for studying the behavior of a specific compiler by feeding it different code and observing the output through the listing file.

Most compilers support the generation of listing files during the compilation process. For some compilers, such as GCC, this is a standard part of the compilation process, because the compiler does not directly generate an object file; instead it generates an assembly language file that is then processed by an assembler. With such compilers you can see this file simply by making sure the compiler does not delete it after the assembler is done. With other compilers, such as the Microsoft and Intel compilers, the listing file is an optional feature that must be enabled on the command line.
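As a concrete way to try this, the small C file below can be fed to a compiler with listing output enabled. The file name listing_demo.c and the Remainder function are hypothetical and exist only for this sketch; the command lines in the comment use options found in current GCC and Microsoft toolchains and may differ for the older compiler versions this book refers to.

```c
#include <assert.h>

/*
 * listing_demo.c (hypothetical file name)
 *
 * Commands that produce an assembly listing for this file:
 *
 *   gcc -S -O2 listing_demo.c
 *       stops after compilation and writes listing_demo.s
 *   gcc -S -O2 -masm=intel listing_demo.c
 *       same, using Intel syntax (x86 targets only)
 *   gcc -c -O2 -save-temps listing_demo.c
 *       builds the object file but keeps the intermediate .s file
 *   cl /c /O2 /FAs listing_demo.c
 *       writes listing_demo.asm with source lines interleaved
 */
int Remainder(int dividend, int divisor)
{
    assert(divisor != 0);   /* division by zero is undefined in C */
    return dividend % divisor;
}
```

Compiling the same file at different optimization levels and comparing the resulting listings is a quick way to see how much the optimizer and the back end reshape even a one-line function.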

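Returning to the instruction scheduling stage of the back end, the following sketch shows how interleaved code can arise. It is a toy greedy scheduler, not the algorithm of any real compiler. The instruction names, register names, and latencies are invented, the machine is assumed to issue one instruction per cycle, and every register is assumed to be written only once.

```python
def schedule(instrs):
    """Greedy list scheduling for a machine that issues one instruction per cycle.

    Each instruction is (name, dest_register, source_registers, latency).
    Assumes every register is written at most once, so only true
    (read-after-write) dependencies need to be tracked.
    """
    produced = {dest for _, dest, _, _ in instrs}
    ready_at = {}              # register -> cycle at which its value is available
    pending = list(instrs)
    order = []
    cycle = 0

    def available(reg):
        if reg not in produced:
            return True        # live-in value, ready from the start
        return reg in ready_at and ready_at[reg] <= cycle

    while pending:
        for ins in pending:
            name, dest, srcs, latency = ins
            if all(available(s) for s in srcs):
                ready_at[dest] = cycle + latency
                order.append(name)
                pending.remove(ins)
                break          # one instruction issued this cycle
        cycle += 1             # if nothing was ready, this cycle is a stall
    return order, max(ready_at.values(), default=0)


# Two independent tasks, written one after the other in the source.
program = [
    ("load a", "r1", [], 3),
    ("add a", "r2", ["r1"], 1),
    ("load b", "r3", [], 3),
    ("add b", "r4", ["r3"], 1),
]
order, finish = schedule(program)
print(order)    # ['load a', 'load b', 'add a', 'add b']
print(finish)   # 5
```

The two tasks end up woven together: the second load is issued while the first one is still in flight. Issued strictly in program order under the same model, the last result would only arrive at cycle 8.
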
Specific Compilers

The code samples in this book were produced with one of the following three compilers:

- GCC and G++ version 3.3.1: GCC has a powerful optimization engine comparable to those of the other two compilers, but its IA-32 code generator is weaker, because it has to generate code for so many different processors. The IA-32 code it produces is less efficient than that of the other popular IA-32 compilers. From a reverser's point of view this is actually an advantage, because its output is usually somewhat easier to read than code from the Microsoft and Intel compilers.
- Microsoft C/C++ Optimizing Compiler version 13.10.3077: This is the most popular compiler for the Windows platform and ships with Visual Studio. The version used in this book is the one shipped with Visual C++ .NET 2003.
- Intel C++ Compiler version 8.0: This compiler was developed primarily to maximize the performance of Intel's IA-32 processors. Its optimizer is on par with those of the other two compilers, but the back end is where it shines. Intel focuses on generating highly optimized IA-32 code that takes advantage of the specific architecture of Intel's platforms.

Execution Environments

An execution environment is the component that actually runs a program. It can be a CPU or a software environment such as a virtual machine. Execution environments are especially important to reversers because their architecture often affects how a program is generated and compiled, which in turn affects the readability of the code and the reversing process.

There are two basic kinds of execution environment: virtual machines and microprocessors.

Software Execution Environment (Virtual Machine)

Some software development platforms do not produce executable machine code that runs directly on a processor. Instead they generate an intermediate representation of the program, called bytecode. The bytecode is later read by a special program on the user's machine, which executes it on the local processor. That program is called a virtual machine. Virtual machines are always processor-specific: a given virtual machine implementation runs only on a specific platform. However, many bytecode formats have virtual machine implementations for several platforms, which allows the same bytecode program to run on different platforms.

Two popular virtual machine architectures are the JVM (Java Virtual Machine) and the CLR (Common Language Runtime).

Programs that run on a virtual machine have several significant advantages over native programs that run directly on the underlying hardware:

- Platform isolation: Because the program reaches the end user in a generic representation that is not machine-specific, it can in theory run on any computer that has a suitable execution environment. The software vendor does not have to worry about platform compatibility (at least in theory): the execution environment stands between the program and the system and encapsulates every platform-specific aspect.
- Enhanced functionality: A program running in a virtual machine often benefits from advanced features that are rarely found in silicon processors. One example is garbage collection, an automated system that tracks resource usage and releases memory objects once they are no longer in use. Another is runtime type safety: because the virtual machine has accurate information about the data types in the running program, it can verify that type safety is maintained throughout the program. Some virtual machines also track memory accesses to make sure they are legal. Because the virtual machine knows the exact length of every memory block and can track how the block is used throughout the program, it can easily detect cases where the program tries to read or write beyond the end of a memory block, and so on.
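
The memory-access checks mentioned in the last item can be sketched as follows. This is a simplified model, assuming a virtual machine that routes every access through a wrapper object. The class name `ManagedBlock` and its methods are made up for this example and do not correspond to the JVM or the CLR.

```python
class ManagedBlock:
    """A memory block as a virtual machine might track it.

    The VM knows the exact length of the block, so every read and write
    can be checked before it touches the underlying storage.
    """

    def __init__(self, size):
        self._data = bytearray(size)

    def _check(self, offset):
        if not 0 <= offset < len(self._data):
            raise IndexError(
                "offset %d is outside a block of %d bytes" % (offset, len(self._data))
            )

    def read(self, offset):
        self._check(offset)
        return self._data[offset]

    def write(self, offset, value):
        self._check(offset)
        self._data[offset] = value


block = ManagedBlock(8)
block.write(3, 7)
print(block.read(3))        # 7

try:
    block.write(8, 1)       # one byte past the end of the block
except IndexError as err:
    print("rejected:", err)
```

A native program performing the same out-of-bounds write would usually just corrupt whatever happens to sit after the block, which is why this kind of check is easy for a virtual machine and hard for a bare processor.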

Bytecode

Each virtual machine usually has its own bytecode format. Bytecode is essentially a low-level language similar to an assembly language such as IA-32 assembly. The difference lies in how the binary code is executed. In a conventional binary program every instruction is decoded and executed by the hardware. With bytecode, the virtual machine decodes the program binaries itself. This allows tight control over everything the program does: because every executed instruction has to pass through the virtual machine, the virtual machine can monitor and control any operation the program performs.

Interpreters

The original approach to implementing virtual machines was to use interpreters. An interpreter is a program that reads the bytecode, decodes each instruction, and "executes" it in a virtual environment implemented in software. It is important to understand that not only are these instructions not executed directly on the host processor, but the bytecode program's access to data is also managed by the interpreter. A bytecode program cannot access CPU registers directly. Any "register" accessed by the bytecode has to be mapped to memory by the interpreter.

Interpreters have one major weakness: performance. Because each instruction is decoded and executed separately by a program that runs on the real CPU, the program runs far slower than it would directly on the CPU. The reason becomes clear when you look at the amount of work an interpreter has to do to execute a single high-level bytecode instruction.

For each instruction, the interpreter must jump to a special function or code area that deals with it, determine the operands involved, and modify the system state to reflect the change. Even the best interpreter implementation turns each bytecode instruction into dozens of instructions on the real CPU. Interpreted programs therefore run much slower than their compiled counterparts.

Just-in-Time Compilers

Current virtual machine implementations avoid interpreters because of the performance problem and use a just-in-time (JIT) compiler instead. This is a different approach to running bytecode programs, without the performance penalty associated with interpreters. The idea is to take snippets of the bytecode program at run time and compile them into the native processor's machine language before running them. These snippets are then executed natively on the host's CPU. This is usually an ongoing process: chunks of bytecode are compiled on demand, whenever they are needed.

Reversing Strategies

Reversing a bytecode program is often a completely different experience from reversing a conventional native program. First of all, bytecode is usually far more detailed than the corresponding native machine code. Microsoft .NET executables, for example, contain detailed data type information called metadata. Metadata provides information about classes, function parameters, local variable types, and much more.

Having this information completely changes the reversing experience, because it brings us much closer to the original high-level representation of the program. In fact, this information makes it possible to build effective decompilers that reconstruct a far more readable high-level representation. This holds for both Java and .NET, and it makes it hard for software vendors to protect their programs against reversing. The solution in most cases is to use obfuscators, programs that try to strip as much sensitive information from the executable as possible.

The reverser has two options: either use a decompiler to reconstruct a high-level representation of the program, or learn the program's native low-level language, read the code, and try to work out the program's design and purpose.
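
To get a feel for how much information bytecode can retain, CPython can serve as a stand-in, since Python functions are also compiled to bytecode for a virtual machine. This is only an analogy: Python code objects are not .NET metadata or Java class files, and the function `order_total` is made up for this example.

```python
def order_total(price, quantity):
    subtotal = price * quantity
    return subtotal


code = order_total.__code__     # the compiled bytecode object of the function

print(code.co_name)             # order_total
print(code.co_argcount)         # 2
print(code.co_varnames)         # ('price', 'quantity', 'subtotal')
```

The function name, the number of parameters, and the names of parameters and local variables all survive compilation. A native binary built without debug symbols typically keeps none of these, and that difference is what makes decompilers for bytecode platforms so effective.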

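Going back to the interpreter model described earlier in this section, the decode-and-dispatch cycle can be sketched with a toy stack machine. The instruction set (`PUSH`, `ADD`, `MUL`, `LOAD`, `STORE`, `HALT`) is invented for this example and is far simpler than JVM or CLR bytecode.

```python
def run(program):
    """Interpret a list of (opcode, operand) pairs for a tiny stack machine."""
    stack = []
    registers = [0] * 4       # virtual "registers" are just a list in memory
    pc = 0                    # virtual program counter

    while pc < len(program):
        opcode, operand = program[pc]     # fetch and decode
        pc += 1
        # Dispatch: every bytecode instruction costs several host operations.
        if opcode == "PUSH":
            stack.append(operand)
        elif opcode == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif opcode == "STORE":
            registers[operand] = stack.pop()
        elif opcode == "LOAD":
            stack.append(registers[operand])
        elif opcode == "HALT":
            break
        else:
            raise ValueError("unknown opcode: %r" % (opcode,))
    return registers


program = [
    ("PUSH", 2), ("PUSH", 3), ("ADD", None), ("STORE", 0),   # r0 = 2 + 3
    ("LOAD", 0), ("PUSH", 4), ("MUL", None), ("STORE", 1),   # r1 = r0 * 4
    ("HALT", None),
]
print(run(program))     # [5, 20, 0, 0]
```

Note that the bytecode never touches a real CPU register: `registers` is ordinary memory owned by the interpreter, and each bytecode instruction goes through a fetch, a comparison chain, and several list operations on the host.
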
Hardware Execution Environments in Modern Processors

This book focuses on reversing native IA-32 programs, so it is worth looking at how code is executed on these processors.

In the past, a processor's runtime consisted of nothing more than an endless loop of reading an instruction from memory, decoding it, and triggering the right circuit to perform the operation specified by the machine code. Execution was entirely serial.

Today's processors have to support parallel computation, and the general strategy is to try to execute several instructions at the same time. Problems arise when one instruction depends on information produced by another. In that case the instructions must run in their original order to preserve the behavior of the code.

Because of this restriction, modern compilers use a wide range of techniques to generate code that runs as efficiently as possible on modern processors. This has a strong effect on the readability of disassembled code during reversing. Understanding the reasons behind these optimization techniques makes it possible to decipher code that has been optimized in this way.

The following sections discuss the general architecture of modern IA-32 processors and how they achieve parallelism and high instruction throughput.

Intel NetBurst

The Intel NetBurst microarchitecture was, at the time the book was written, the current execution environment for many IA-32 processors. Understanding this architecture explains the reasoning behind the optimization guidelines used by almost every IA-32 code generator.

Micro-Ops (µ-ops)

IA-32 processors use microcode to implement each instruction. Microcode is essentially another layer of programming that lives inside the processor. The processor core itself is far more primitive and can only execute very simple operations (at a very high speed). To implement the relatively complex IA-32 instructions, the processor has a microcode ROM that contains the microcode sequence for every instruction in the instruction set.

Constantly fetching instruction microcode from the ROM can create a performance bottleneck, so IA-32 processors use an execution trace cache that caches the microcode of the most frequently executed instructions.

Pipelines

Basically, a CPU pipeline is like a factory assembly line for decoding and executing program instructions. An instruction enters the pipeline and is broken down into several low-level tasks that the processor can take care of.

In NetBurst processors the pipeline has three main stages:

1. Front end: Decodes each instruction and produces a sequence of micro-ops that represent it. The micro-ops are then fed into the out-of-order core.
2. Out-of-order core: Receives the sequence of micro-ops from the front end and reorders them based on the availability of the processor's various resources. The idea is to use the available resources as aggressively as possible to achieve parallelism. How well this works depends heavily on the original code fed to the front end. The core issues multiple micro-ops per clock cycle.
3. Retirement section: Ensures that the original order of the instructions in the program is preserved when the results of the out-of-order execution are applied.
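
The role of the retirement section can be illustrated with a very small model. The sketch assumes that we already know the cycle in which each micro-op finishes executing (listed in program order) and that any number of micro-ops may retire in the same cycle. The function name `retire_cycles` and the numbers are invented, and the model says nothing about the real NetBurst hardware.

```python
def retire_cycles(completion_cycles):
    """Return the cycle in which each micro-op can retire.

    completion_cycles lists, in program order, the cycle in which each
    micro-op finishes executing. A micro-op retires only once it has
    finished and every earlier micro-op has retired.
    """
    retired = []
    last = 0
    for done in completion_cycles:
        last = max(last, done)      # cannot retire before its predecessors
        retired.append(last)
    return retired


# The second and third micro-ops finish early but must wait for the first.
completion = [5, 2, 3, 9, 4]
print(retire_cycles(completion))    # [5, 5, 5, 9, 9]
```

Execution order is free to vary, but results become visible strictly in program order, which is what keeps out-of-order execution invisible to the program.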

Branch Prediction

A processor with a deep pipeline must always know which instruction will be executed next. Normally the processor simply fetches the next instruction from memory whenever the pipeline has room for it, but what happens when the code contains a conditional branch?

Conditional branches are a problem because the next instruction has to be fetched while the outcome of the branch is still unknown. One option is to wait until it is known whether the branch is taken. This hurts performance, because the processor only runs at full capacity when the pipeline is full. Refilling the pipeline takes a number of clock cycles that depends on the length of the pipeline and on other factors.

The solution is to try to predict the outcome of each conditional branch in advance. Based on the prediction, the processor fills the pipeline either with the instructions that immediately follow the branch instruction (when the branch is predicted as not taken) or with the instructions at the branch's target address (when it is predicted as taken). Every misprediction is costly, because the entire pipeline has to be emptied.

The general prediction strategy is that backward branches, which jump to an earlier instruction, are assumed to be taken. Such branches are typically used in loops, where every iteration performs one jump and only the very last one does not jump back. Forward branches (typically used in if statements) are assumed to be not taken. To improve its prediction ability, an IA-32 processor uses a branch trace buffer (BTB) that records the outcomes of the most recently processed branch instructions. When the processor encounters a branch, it looks it up in the BTB. If an entry is found, the processor uses that information to predict the branch.
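
The prediction rules just described can be sketched as a small simulation. This is a simplified model, assuming that the BTB stores only the most recent outcome per branch address. Real predictors are more elaborate, and the addresses and function names are invented for this example.

```python
def predict(branch_addr, target_addr, btb):
    """Return True if the branch is predicted as taken."""
    if branch_addr in btb:
        return btb[branch_addr]         # reuse the most recent recorded outcome
    # Static rule: backward branches taken, forward branches not taken.
    return target_addr < branch_addr


def count_mispredictions(branch_addr, target_addr, outcomes):
    """Run one branch through a sequence of actual outcomes."""
    btb = {}
    misses = 0
    for taken in outcomes:
        if predict(branch_addr, target_addr, btb) != taken:
            misses += 1
        btb[branch_addr] = taken        # record what actually happened
    return misses


# A loop branch at 0x120 jumping back to 0x100: taken 9 times, then falls through.
loop_misses = count_mispredictions(0x120, 0x100, [True] * 9 + [False])
print(loop_misses)      # 1

# A forward branch (an "if") that is actually taken twice, then not taken.
if_misses = count_mispredictions(0x100, 0x140, [True, True, False])
print(if_misses)        # 2
```

The loop branch is mispredicted only on the final iteration. The forward branch is mispredicted on its first execution, because the static rule guesses "not taken", and again when its behavior changes.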