74
PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 1 Part 2: Diablo Data Structures Goals Linker data structures Internal representation Construction of graphs Concrete Data Structures Manipulation Dynamic Members

Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

  • Upload
    others

  • View
    9

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 1

Part 2:Diablo Data Structures

● Goals● Linker data structures● Internal representation● Construction of graphs● Concrete Data Structures● Manipulation● Dynamic Members

Page 2: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2

Data Structures: Goal

Retargetable

Reliable

Extensible

Easy to manipulate

➔ abstract architecture specific details

✔ CFG✗ SSA

►model control flow conservatively◄precise = not too conservative

➔ easy to augment basic data structures with extra data

Page 3: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 3

Transition in small steps

Link Graph

Direct Control Flow andRelocatable Address Graph

Augmented Whole ProgramControl Flow Graph

InterproceduralControl Flow Graph

Input Data Structures

Coarse-grained graph,enables linking and unused section removal

Fine-grained graph, enables unreachable code and data removal

= DCFRAG + special edgesEnables (flow)analysis of the program

Makes analysis of the program more easy

Page 4: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 4

Part 2:Diablo Data Structures

● Goals● Linker data structures● Internal representation● Construction of graphs● Concrete Data Structures● Manipulation● Dynamic Members

Page 5: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 5

Linker Data Structures: our input

● Most are simply an abstract representation of data used by a linker:– archives (containers of relocatable objects)– relocatable objects (container of sections)– section (containers of data)

● Interesting structures for Diablo:– relocs (relocation information)– symbols

Page 6: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 6

● Represent every entity that can need relocating – have an address and a size– can be used in symbols and relocs– can be changed by relocs

● e.g. sections are relocatable objects

Relocatable objects

Page 7: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 7

● labeled address expression– use the value of relocatable objects to calculate an

address– example:

● symbol “offset_between_section_a_and_b_plus_10” ● code = “R01 R00 – A00 +$”

● used during symbol resolution– order

● > means it overwrites other symbols– can create data (e.g. bss)

Symbols

section a section b 10offset in a offset in b

Page 8: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 8

Relocations

● use the address of relocatable objects and symbols● to compute an address ● put it in the desired encoding● write it somewhere in a relocatable object● checks if relocation was successful (no overflow)

Page 9: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 9

CHECK(not used)COMPUTE ENCODE

Relocations

● Example● for an instruction “load immediate pc relative” that

– loads the address of (an offset in) a relocatable object– minus the address of a symbol – plus some value (addend) – and stores this address pc relative (instruction encoding)– but automatically increases with the pc when executed

– “R00 S00 – A00 +” “\\” “P- l*w” “\\” “s0000” “$”

Page 10: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 10

● Example● for an instruction “load immediate pc relative” that

– loads the address of (an offset in) a relocatable object– minus the address of a symbol – plus some value (addend) – and stores this address pc relative (instruction encoding)– but automatically increases with the pc when executed

– “R00 S00 – A00 +” “\\” “P- l*w” “\\” “s0000” “$”

Relocations

Page 11: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 11

● Example● for an instruction “load immediate pc relative” that

– loads the address of (an offset in) a relocatable object– minus the address of a symbol – plus some value (addend) – and stores this address pc relative (instruction encoding)– but automatically increases with the pc when executed

– “R00 S00 – A00 +” “\\” “P- l*w” “\\” “s0000” “$”

Relocations

Page 12: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 12

● Example● for an instruction “load immediate pc relative” that

– loads the address of (an offset in) a relocatable object– minus the address of a symbol – plus some value (addend) – and stores this address pc relative (instruction encoding)– but automatically increases with the pc when executed

– “R00 S00 – A00 +” “\\” “P- l*w” “\\” “s0000” “$”

Relocations

Page 13: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 13

● Example● for an instruction “load immediate pc relative” that

– loads the address of (an offset in) a relocatable object– minus the address of a symbol – plus some value (addend) – and stores this address pc relative (instruction encoding)– but automatically increases with the pc when executed

– “R00 S00 – A00 +” “\\” “P- l*w” “\\” “s0000” “$”

Relocations

Page 14: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 14

● Example● for an instruction “load immediate pc relative” that

– loads the address of (an offset in) a relocatable object– minus the address of a symbol – plus some value (addend) – and stores this address pc relative (instruction encoding)– but automatically increases with the pc when executed

– “R00 S00 – A00 +” “\\” “P- l*w” “\\” “s0000” “$”

Relocations

Page 15: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 15

● Example● for an instruction “load immediate pc relative” that

– loads the address of (an offset in) a relocatable object– minus the address of a symbol – plus some value (addend) – and stores this address pc relative (instruction encoding)– but automatically increases with the pc when executed

– “R00 S00 – A00 +” “\\” “P- l*w” “\\” “s0000” “$”

Relocations

MARKERS

Page 16: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 16

TO FROM

Relocations

● Example● for an instruction “load immediate pc relative” that

– loads the address of (an offset in) a relocatable object– minus the address of a symbol – plus some value (addend) – and stores this address pc relative (instruction encoding)– but automatically increases with the pc when executed

– “R00 S00 – A00 +” “\\” “P- l*w” “\\” “s0000” “$”

section a symbol 42offset in a section c offset in c

section b offset in b

Page 17: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 17

bad_pointer: code = S00A00+...

Relocation subtleties

#include <stdio.h>

void v1() {

printf("v1\n");}void v2() {

printf("v2\n");}void (*pointer)() = v1;int *bad_pointer = ((int *) v2) + 1;int main(int argc, char ** argv) {

pointer();((void (*)()) (bad_pointer -1))();return 0;

}

v1 code = R00

.text 0

pointer: code = S00 A00+...

0

1

v2 code = R00

.text 20

Page 18: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 18

Relocation subtleties

.text code = R00

.text 0

bad_pointer: code = S00A00+...

pointer: code = S00 A00+...

0

21

#include <stdio.h>

void v1() {

printf("v1\n");}void v2() {

printf("v2\n");}void (*pointer)() = v1;int *bad_pointer = ((int *) v2) + 1;int main(int argc, char ** argv) {

pointer();((void (*)()) (bad_pointer -1))();return 0;

}

Page 19: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 19

Relocation subtleties

.text code = R00

.text 0x0

bad_pointer: code = S00A00+...

pointer: code = S00 A00+...

0

21

#include <stdio.h>

void v1() {

printf("v1\n");}void v2() {

printf("v2\n");}void (*pointer)() = v1;int *bad_pointer = ((int *) v2) + 1;int main(int argc, char ** argv) {

pointer();((void (*)()) (bad_pointer -1))();return 0;

}RELOCATIONS & SYMBOLSSHOULD NOT BE RELAXED

Page 20: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 20

Part 2:Diablo Data Structures

● Goals● Linker data structures● Internal representation● Construction of graphs● Concrete Data Structures● Manipulation● Dynamic Members

Page 21: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 21

Object file b

Object files

Relocation:from code

to sym aptrS00\l*w\s000$

Relocation:from datato sym a

S00\l*w\s000$

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Symbol aptr:order 10to data

R00$

Code

Symbol a:order 0

to undefined R00$

Page 22: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 22

Object file b

Graph representation

Relocation:from code

to sym aptrS00\l*w\s000$

Relocation:from datato sym a

S00\l*w\s000$

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Symbol aptr:order 10to data

R00$

CodeSymbol a:

order 0to undefined

R00$

Page 23: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 23

Part 2:Diablo Data Structures

● Goals● Linker data structures● Internal representation● Construction of graphs● Concrete Data Structures● Manipulation● Dynamic Members

Page 24: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 24

EXE

Reading information

Entry

MAP

Page 25: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 25

EXE

Reading information

Entry

MAP

.text 0x8048120 0x5e4d40 .text 0x8048120 0x24 /usr/lib/crt1.o 0x8048120 _start .text 0x8048144 0x22 /usr/lib/crti.o *fill* 0x8048166 .text 0x8048170 0xc4 /usr/lib/crtbeginT.o *fill* 0x8048234 .text 0x8048240 0x56f app_procs.o 0x8048300 app_run 0x80482a0 app_abort 0x80482e0 app_exit 0x8048240 app_libs_init *fill* 0x80487af .text 0x80487b0 0xf33 main.o 0x80487b0 main

Page 26: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 26

Object file b

EXE

Object file a

Reading information

Relocation:from code

to sym aptrS00\l*w\s000$

Relocation:from datato sym a

S00\l*w\s000$

Relocation:from code

to sym arrayS00\l*w\s000$

Entry

Data

Symbol a:order 10to code

R00$

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Symbol aptr:order 10to data

R00$

Symbol array:order 10to data

R00$

Code

Code

Symbol a:order 0

to undefined R00$

MAP

Page 27: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 27

Object file b

EXE

Object file a

Reading information

Relocation:from code

to sym aptrS00\l*w\s000$

Relocation:from datato sym a

S00\l*w\s000$

Relocation:from code

to sym arrayS00\l*w\s000$

Entry

Data

Symbol a:order 10to code

R00$

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Symbol aptr:order 10to data

R00$

Symbol array:order 10to data

R00$

Code

Code

Symbol a:order 0

to undefined R00$

MAP

Page 28: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 28

Reading information

Relocation:from code

to sym aptrS00\l*w\s000$

Relocation:from datato sym a

S00\l*w\s000$

Relocation:from code

to sym arrayS00\l*w\s000$

Entry

Data

Symbol a:order 10to code

R00$

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Symbol aptr:order 10to data

R00$

Symbol array:order 10to data

R00$

Code

Code

Symbol a:order 0

to undefined R00$

Page 29: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 29

Symbol Resolution

Relocation:from code

to sym aptrS00\l*w\s000$

Relocation:from datato sym a

S00\l*w\s000$

Relocation:from code

to sym arrayS00\l*w\s000$

Entry

Data

Symbol a:order 10to code

R00$

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Symbol aptr:order 10to data

R00$

Symbol array:order 10to data

R00$

Code

Code

Symbol a:order 0

to undefined R00$

Page 30: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 30

Symbol Resolution

Relocation:from code

to sym aptrS00\l*w\s000$

Relocation:from datato sym a

S00\l*w\s000$

Relocation:from code

to sym arrayS00\l*w\s000$

Entry

Data

Symbol a:order 10to code

R00$

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Symbol aptr:order 10to data

R00$

Symbol array:order 10to data

R00$

Code

Code

Page 31: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 31

Linkgraph

Relocation:from code

to dataR00\l*w\s000$

Relocation:from datato code

R00\l*w\s000$

Relocation:from code

to dataR00\l*w\s000$

Entry

Data

Symbol a:order 10to code

R00$

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Symbol aptr:order 10to data

R00$

Symbol array:order 10to data

R00$

Code

Code

Page 32: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 32

Code

Code

Symbol a:order 10to code

R00$

Relocation:from code

to dataR00\l*w\s000$

Linkgraph

Relocation:from code

to dataR00\l*w\s000$

Relocation:from datato code

R00\l*w\s000$

Entry

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Data

Page 33: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 33

Uses of the linkgraph

✔ Linking (placing sections, relocating)✔ Apply linker optimizations (remove unused

sections)

✗ fine-grained transformations

➔ Need for a more fine-grained graph: DCFRAG Direct Control Flow and Relocatable Addresses

Page 34: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 34

Code

Code

Symbol a:order 10to code

R00$

Relocation:from code

to dataR00\l*w\s000$

Linkgraph

Relocation:from code

to dataR00\l*w\s000$

Relocation:from datato code

R00\l*w\s000$

Entry

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Data

Page 35: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 35

Symbol a:order 10to code

R00$

Relocation:from code

to dataR00\l*w\s000$

Disassembler

Relocation:from code

to dataR00\l*w\s000$

Relocation:from datato code

R00\l*w\s000$

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Entry

Bbl

Bbl

Bbl

Bbl

Bbl

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Data

Page 36: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 36

Disassembler

● Disassemble

– Architecture independent analyses and optimizations work on architecture independent part or use callbacks

Architecture independent- instruction type (jump, cmp, ...)- conditional?- register lists (used, defined)

Instruction

Architecture dependent - backend decides- opcode- regs

Page 37: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 37

Detect basic blocks

● Disassemble● Split sections into basic blocks

Linked list of instructionsType (for special block)

Basic Block

Page 38: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 38

Symbol a:order 10to code

R00$

Relocation:from code

to dataR00\l*w\s000$

Detect basic blocks

Relocation:from code

to dataR00\l*w\s000$

Relocation:from datato code

R00\l*w\s000$

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Entry

Bbl

Bbl

Bbl

Bbl

Bbl

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Data

Page 39: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 39

Symbol a:order 10to code

R00$

Relocation:from code

to dataR00\l*w\s000$

Detect basic blocks

Relocation:from code

to dataR00\l*w\s000$

Relocation:from datato code

R00\l*w\s000$

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Entry

Bbl

Bbl

Bbl

Bbl

Bbl

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Data

Page 40: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 40

Detect basic blocks

Symbol a:order 10to code

R00$

Relocation:from code

to dataR00\l*w\s000$

Relocation:from code

to dataR00\l*w\s000$

Relocation:from datato code

R00\l*w\s000$

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Entry

Bbl

Bbl

Bbl

Bbl

Bbl

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Data

Page 41: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 41

Detect basic blocks

Symbol a:order 10to code

R00$

Relocation:from code

to dataR00\l*w\s000$

Relocation:from code

to dataR00\l*w\s000$

Relocation:from datato code

R00\l*w\s000$

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Entry

Bbl

Bbl

Bbl

Bbl

Bbl

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Data

Page 42: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 42

Detect basic blocks

Symbol a:order 10to code

R00$

Relocation:from code

to dataR00\l*w\s000$

Relocation:from code

to dataR00\l*w\s000$

Relocation:from datato code

R00\l*w\s000$

Bbl

Bbl

BblBbl

Bbl

Bbl

Bbl

Bbl

Entry

Bbl

Bbl

Bbl

Bbl

Bbl

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Data

Page 43: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 43

Detect basic blocks

Symbol a:order 10to code

R00$

Relocation:from code

to dataR00\l*w\s000$

Relocation:from code

to dataR00\l*w\s000$

Relocation:from datato code

R00\l*w\s000$

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Entry

Bbl

Bbl Bbl

Bbl

Bbl

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Data

Page 44: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 44

Add edges

● Disassemble● Split sections into basic blocks

– Targets direct jumps + successors conditional jumps

– To's of relocs– analyze switches (computed) to find switch targets

● Add direct control flow edges and switch edges

Connector for two basic blocksType (jump, ft, call, return, ...)

Edge

Page 45: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 45

Add edges

Symbol a:order 10to code

R00$

Relocation:from code

to dataR00\l*w\s000$

Relocation:from code

to dataR00\l*w\s000$

Relocation:from datato code

R00\l*w\s000$

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Entry

Bbl

Bbl Bbl

Bbl

Bbl

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Data

Page 46: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 46

Symbol a:order 10to code

R00$

Relocation:from code

to dataR00\l*w\s000$

Add edges: direct jumps

Relocation:from code

to dataR00\l*w\s000$

Relocation:from datato code

R00\l*w\s000$

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Entry

Bbl

Bbl Bbl

Bbl

Bbl

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Data

Page 47: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 47

Symbol a:order 10to code

R00$

Relocation:from code

to dataR00\l*w\s000$

Add edges: fall-through paths

Relocation:from code

to dataR00\l*w\s000$

Relocation:from datato code

R00\l*w\s000$

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Bbl

Entry

Bbl

Bbl Bbl

Bbl

Bbl

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Data

Page 48: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 48

Symbol a:order 10to code

R00$

Relocation:from code

to dataR00\l*w\s000$

Add edges: system calls

Relocation:from code

to dataR00\l*w\s000$

Relocation:from datato code

R00\l*w\s000$

Bbl

Bbl

Bbl

Bbl

Bbl

Sys

Bbl

Bbl

Bbl

Entry

Bbl

Bbl Bbl

Bbl

Bbl

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Data

Page 49: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 49

Partition the code into functions

- Name- list of basic blocks- register lists (used, defined, ...)

Function

Page 50: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 50

Symbol a:order 10to code

R00$

Relocation:from code

to dataR00\l*w\s000$

Partition the code into functions

Relocation:from code

to dataR00\l*w\s000$

Relocation:from datato code

R00\l*w\s000$

Bbl

Bbl

Bbl

Bbl

Bbl

Sys

Bbl

Bbl

Bbl

Entry

Bbl

Bbl Bbl

Bbl

Bbl

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Data

Page 51: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 51

Function a

Partition the code into functions

Function _start

Function syscall hell

Function program_entry

Function b

Relocation:from code

to dataR00\l*w\s000$

Relocation:from datato code

R00\l*w\s000$

Bbl

Bbl

Bbl

Return

Bbl

Bbl

Sys

Bbl

Bbl

Bbl

Entry

Data

Relocation:from code

to dataR00\l*w\s000$

Bbl

Bbl Bbl

Return

Bbl

Bbl

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Data

Symbol a:order 10to code

R00$

Page 52: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 52

Function calleeFunction caller

Function Calls

BblBbl

Bbl

Bbl

Return

Bbl

Return

Page 53: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 53

Function calleeFunction caller

Interprocedural Goto's

BblBbl

Bbl

Return

Bbl

Return

Page 54: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 54

Function a

DCFRAG

Function _start

Function syscall hell

Function program_entry

Function b

Relocation:from code

to dataR00\l*w\s000$

Relocation:from datato code

R00\l*w\s000$

Bbl

Bbl

Bbl

Return

Bbl

Bbl

Sys

Bbl

Bbl

Bbl

Entry

Data

Relocation:from code

to dataR00\l*w\s000$

Bbl

Bbl Bbl

Return

Bbl

Bbl

Symbol b:order 10to code

R00$

Symbol _start:order 10to code

R00$

Data

Data

Symbol a:order 10to code

R00$

Page 55: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 55

Function a

DCFRAG

Function _start

Function syscall hell

Function program_entry

Function b

Relocation:from code

to dataR00\l*w\s000$

Relocation:from datato code

R00\l*w\s000$

Bbl

Bbl

Bbl

Return

Bbl

Bbl

Sys

Bbl

Bbl

Bbl

Entry

Data

Relocation:from code

to dataR00\l*w\s000$

Bbl

Bbl Bbl

Return

Bbl

Bbl

Data

Data

Page 56: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 56

Uses of the DCFRAG

✔ Fine grained removal of unused code and data

✗ Dataflow

➔ We need a graph for data flow analyses (ICFG)

Page 57: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 57

Function a

DCFRAG

Function _start

Function syscall hell

Function program_entry

Function b

Relocation:from code

to dataR00\l*w\s000$

Relocation:from datato code

R00\l*w\s000$

Bbl

Bbl

Bbl

Return

Bbl

Bbl

Sys

Bbl

Bbl

Bbl

Entry

Data

Relocation:from code

to dataR00\l*w\s000$

Bbl

Bbl Bbl

Return

Bbl

Bbl

Data

Data

Page 58: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 58

Relocation:from code

to dataR00\l*w\s000$

Function a

Augmented Whole-Program CFG

Function _start

Function syscall hell

Function program_entry

Function hell

Function b

Relocation:from code

to dataR00\l*w\s000$

Relocation:from datato code

R00\l*w\s000$

Bbl

Bbl

Bbl

Return

Hell

Bbl

Bbl

Sys

Bbl

Bbl

Bbl

Entry

Bbl

Bbl Bbl

Return

Bbl

Bbl

Return

Data

Data

Page 59: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 59

Function a

ICFG

Function _start

Function syscall hell

Function program_entry

Function hell

Function b

Bbl

Bbl

Bbl

Return

Hell

Bbl

Bbl

Sys

Bbl

Bbl

Bbl

Entry

Bbl

Bbl Bbl

Return

Bbl

Bbl

Return

Page 60: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 60

Uses of the ICFG

✔ Good for analysis and transformations

✗ Bad for writing out the program

➔ Use the combined ICFG + DCFRAG = AWPCFG

Page 61: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 61

Relocation:from code

to dataR00\l*w\s000$

Function a

AWPCFG

Function _start

Function syscall hell

Function program_entry

Function hell

Function b

Relocation:from code

to dataR00\l*w\s000$

Relocation:from datato code

R00\l*w\s000$

Bbl

Bbl

Bbl

Return

Hell

Bbl

Bbl

Sys

Bbl

Bbl

Bbl

Entry

Bbl

Bbl Bbl

Return

Bbl

Bbl

Return

Data

Data

Page 62: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 62

Part 2:Diablo Data Structures

● Goals● Linker data structures● Internal representation● Construction of graphs● Concrete Data Structures● Manipulation● Dynamic Members

Page 63: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 63

Concrete Data Structures

t_relocatable

t_symbol

t_reloc_ref

t_symbol_table

t_reloct_reloc_table

t_ins

t_bbl

t_section t_cfg

t_object

t_arm_ins

t_i386_ins

t_symbol_ref

t_cfg_edge

t_ppc_ins

t_function

Page 64: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 64

Accessing fields

● With getters and setters

● t_bbl * head = CFG_EDGE_HEAD(cfg_edge)

● ARM_INS_SET_REGA(arm_ins, reg)

● Reason: Dynamic Members

Page 65: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 65

Iterating the ICFG

● t_cfg * cfg; t_function * fun; t_bbl * bbl; t_ins * ins; t_cfg_edge * edge;

● CFG_FOREACH_FUNCTION(cfg, fun)– FUNCTION_FOREACH_BBL(fun, bbl)

● CFG_FOREACH_BBL(cfg, bbl)● BBL_FOREACH_INS(bbl, ins)● BBL_FOREACH_SUCC_EDGE(bbl, edge)● BBL_FOREACH_PRED_EDGE(bbl, edge)

Page 66: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 66

Part 2:Diablo Data Structures

● Goals● Linker data structures● Internal representation● Construction of graphs● Concrete Data Structures● Manipulation● Dynamic Members

Page 67: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 67

Primitive Transformations

● Relocations– Add: RelocTableAddRelocToRelocatable– Remove: RelocTableRemoveReloc– Modify: RelocSetFrom, RelocSetToRelocatable

● Sections– Create: SectionCreateForObject

● Functions– Create: FunctionMake– Remove: FunctionKill

Page 68: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 68

Primitive Transformations

● Basic blocks– Create: BblNew– Remove: BblKill– Duplicate: BblDup– Split: BblSplitBlock

● ICFG edges– Create: CfgEdgeCreate– Remove: CfgEdgeKill

● Instructions– Create: InsNewForBbl– Remove: InsKill

Page 69: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 69

On the DCFRAG

● SECTION_REFED_BY(sec)● SECTION_REFERS_TO(sec)

● BBL_REFED_BY(bbl)● BBL_REFED_BY_SYM(bbl)

● INS_REFERS_TO(ins)

Page 70: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 70

Consistency of the AWPCFG

● Manipulate one view, what happens on other?

● Diablo tries to keep things consistent– Kill reloc, and its ICFG-edge is also killed– Kill ins, and to-relocs are also killed– Remove a section, and all to-relocs are also killed

● Makes sure that you do the proper thing– Try to kill an object with refers_to relocs, and it fatals

Page 71: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 71

Part 2:Diablo Data Structures

● Goals● Linker data structures● Internal representation● Construction of graphs● Concrete Data Structures● Manipulation● Dynamic Members

Page 72: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 72

Dynamic Members

● Diablo needs to be extensible

● Many analyses compute different kinds of information on very large data sets

● We cannot include all of them in the main libraries, or even store all information together

● Dynamic members augment basic data structures with members that can be allocated on the fly

Page 73: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 73

Dynamic Members

● Example: member for reachability

– t_bool BBL_REACHABLE(t_bbl)– BBL_SET_REACHABLE(t_bbl, t_bool)– BblInitReachable(t_cfg *)

● Allocates space for this field and calls init callback for each bbl

– BblFiniReachable(t_cfg *)● Calls fini callback and deallocates space

Page 74: Part 2: Diablo Data Structures · PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 2 Data Structures: Goal Retargetable Reliable Extensible Easy to manipulate abstract architecture

PLDI 06 Tutorial - Binary Rewriting with Diablo - part 2 74

Dynamic Members

To instantiate the member:

DYNAMIC_MEMBER( bbl, /* data structure to extend */ t_cfg *, /* manager type */ bbl_reachable_array, /* array to hold members */ t_bool, /* the type of the member */ reachable, /* lowercase name */ REACHABLE, /* UPPERCASE name */ Reachable, /* CamelCase name*/ CFG_FOREACH_BBL, /* iterator */ BblReachableInitCb, /* init callback*/ BblReachableFiniCb, /* fini callback */ BblReachableDupCb, /* dup callback */);