THEORY OF COMPILATION Lecture 10 – Code Generation Eran Yahav 1 Reference: Dragon 8. MCD 4.2.4


THEORY OF COMPILATION
Lecture 10 – Code Generation

Eran Yahav

Reference: Dragon 8. MCD 4.2.4


You are here

[Figure: compiler pipeline. Source text (txt) -> Lexical Analysis -> Syntax Analysis (Parsing) -> Semantic Analysis -> Intermediate Representation (IR) -> Code Generation -> Executable code (exe)]


Last Week: Runtime Part II
  Nested procedures
  Object layout
  Inheritance
  Multiple inheritance


Today
  Runtime checks
  Garbage collection
  Generating assembly code


Runtime checks

Generate code that checks for attempted illegal operations
  Null pointer check
    MoveField, MoveArray, ArrayLength, VirtualCall
    Reference arguments to library functions should not be null
  Array bounds check
  Array allocation size check
  Division by zero
  …
If a check fails, jump to error handler code that prints a message and gracefully exits the program


Null pointer check

# null pointer check
cmp $0,%eax
je labelNPE

labelNPE:
  push $strNPE    # error message
  call __println
  push $1         # error code
  call __exit

Single generated handler for entire program


Array bounds check

# array bounds check
mov -4(%eax),%ebx    # ebx = length
mov $0,%ecx          # ecx = index
cmp %ecx,%ebx
jle labelABE         # ebx <= ecx ?
cmp $0,%ecx
jl labelABE          # ecx < 0 ?

labelABE:
  push $strABE    # error message
  call __println
  push $1         # error code
  call __exit

Single generated handler for entire program


Array allocation size check

# array size check
cmp $0,%eax       # eax == array size
jle labelASE      # eax <= 0 ?

labelASE:
  push $strASE    # error message
  call __println
  push $1         # error code
  call __exit

Single generated handler for entire program
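
The three checks share one pattern: a compare, a conditional jump, and one handler per error kind that is emitted once for the whole program. A minimal Python sketch of a generator that emits this pattern as text is shown here. The function names (emit_null_check, emit_bounds_check, emit_size_check, emit_handlers, emit_program) are hypothetical; only the instruction sequences come from the slides.

```python
# Hypothetical helper names; only the instruction sequences come from the slides.

def emit_null_check(reg="%eax"):
    """Jump to the shared handler when reg holds a null (zero) reference."""
    return [f"cmp $0,{reg}", "je labelNPE"]


def emit_bounds_check(array_reg="%eax", index_reg="%ecx", length_reg="%ebx"):
    """Assumes, as on the slide, that the length sits 4 bytes before element 0."""
    return [
        f"mov -4({array_reg}),{length_reg}",
        f"cmp {index_reg},{length_reg}",
        "jle labelABE",                  # length <= index ?
        f"cmp $0,{index_reg}",
        "jl labelABE",                   # index < 0 ?
    ]


def emit_size_check(size_reg="%eax"):
    """Jump to the shared handler when the requested array size is <= 0."""
    return [f"cmp $0,{size_reg}", "jle labelASE"]


def emit_handlers():
    """One handler per error kind, emitted once for the entire program."""
    lines = []
    for label, message in (("labelNPE", "strNPE"),
                           ("labelABE", "strABE"),
                           ("labelASE", "strASE")):
        lines += [f"{label}:", f"push ${message}", "call __println",
                  "push $1", "call __exit"]
    return lines


def emit_program(body):
    # Assumption: the body ends with its own return or exit, so control
    # never falls through into the handlers.
    return body + emit_handlers()
```

However many checks the body contains, each handler label appears only once in the output, which is the point of the "single generated handler" note.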


Automatic Memory Management
  automatically free memory when it is no longer needed
  not limited to OO programs; we show it here because it is prevalent in OO languages such as Java
    also in functional languages
  approximate reasoning about object liveness
    use reachability to approximate liveness
    assume reachable objects are live
    non-reachable objects are dead
  Three classical garbage collection techniques
    reference counting
    mark and sweep
    copying


GC using Reference Counting
  add a reference-count field to every object
    how many references point to it
  when (rc==0) the object is non-reachable
    non-reachable => dead
    can be collected (deallocated)


Managing Reference Counts

Each object has a reference count o.RC
A newly allocated object o gets o.RC = 1
  why?

write-barrier for reference updates
  update(x,old,new) {
    old.RC--;
    new.RC++;
    if (old.RC == 0) collect(old);
  }

collect(old) will decrement RC for all children and recursively collect objects whose RC reached 0.
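
A minimal Python sketch of the write barrier and collect follows. The class Obj, the fields dictionary, the freed flag, and the drop helper are assumptions made for illustration and are not part of the slides. The sketch starts RC at 1 on the assumption that the count stands for the reference the allocator hands back, and it increments the new target before decrementing the old one so that storing the same object twice is safe.

```python
class Obj:
    """Heap object with an explicit reference count (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.rc = 1        # assumed: counts the reference returned by allocation
        self.fields = {}   # field name -> Obj or None
        self.freed = False


def collect(obj):
    """Free obj, then decrement and recursively collect its children."""
    obj.freed = True
    for child in obj.fields.values():
        if child is not None:
            child.rc -= 1
            if child.rc == 0:
                collect(child)
    obj.fields.clear()


def update(holder, field, new):
    """Write barrier for the assignment holder.field = new."""
    old = holder.fields.get(field)
    if new is not None:
        new.rc += 1
    holder.fields[field] = new
    if old is not None:
        old.rc -= 1
        if old.rc == 0:
            collect(old)


def drop(obj):
    """A root reference (for example a local variable) goes away."""
    obj.rc -= 1
    if obj.rc == 0:
        collect(obj)


# Two objects that point at each other keep rc == 1 after both roots
# are dropped, so reference counting alone never frees them.
c, d = Obj("c"), Obj("d")
update(c, "next", d)
update(d, "next", c)
drop(c)
drop(d)
```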


Cycles!
  cannot identify non-reachable cycles
    reference counts for nodes on the cycle will never decrement to 0
  several approaches for dealing with cycles
    ignore
    periodically invoke a tracing algorithm to collect cycles
    specialized algorithms for collecting cycles


GC Using Mark & Sweep

Marking phase
  mark roots
  trace all objects transitively reachable from roots
  mark every traversed object
Sweep phase
  scan all objects in the heap
  collect all unmarked objects


GC Using Mark & Sweep

mark_sweep() {
  for Ptr in Roots
    mark(Ptr)
  sweep()
}

mark(Obj) {
  if mark_bit(Obj) == unmarked {
    mark_bit(Obj) = marked
    for C in Children(Obj)
      mark(C)
  }
}

sweep() {
  p = Heap_bottom
  while (p < Heap_top)
    if (mark_bit(p) == unmarked) then free(p)
    else mark_bit(p) = unmarked;
    p = p + size(p)
}
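
The pseudocode can be made runnable with a small Python sketch. Here the heap is modeled as a plain list of objects rather than a contiguous address range, and the names Node, children, and marked are assumptions for illustration. Marking uses an explicit stack instead of recursion, which visits the same objects.

```python
class Node:
    """Heap object with a mark bit (illustrative only)."""

    def __init__(self, name):
        self.name = name
        self.children = []
        self.marked = False


def mark(obj):
    """Mark everything transitively reachable from obj."""
    stack = [obj]
    while stack:
        current = stack.pop()
        if not current.marked:
            current.marked = True
            stack.extend(current.children)


def sweep(heap):
    """Return the surviving objects and reset their mark bits."""
    live = []
    for obj in heap:
        if obj.marked:
            obj.marked = False   # ready for the next collection
            live.append(obj)
        # unmarked objects are simply not kept, which models free(p)
    return live


def mark_sweep(roots, heap):
    for root in roots:
        mark(root)
    return sweep(heap)


# a <-> c are reachable from the root; d <-> e form an unreachable cycle.
a, b, c, d, e = (Node(n) for n in "abcde")
a.children, c.children = [c], [a]
d.children, e.children = [e], [d]
survivors = mark_sweep([a], [a, b, c, d, e])
```

Unlike reference counting, the unreachable cycle d <-> e is collected, because it is never marked.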


Copying GC
  partition the heap into two parts: old space, new space
  GC
    copy all reachable objects from old space to new space
    swap roles of old/new space


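Example

[Figure: heap split into old space and new space, before collection. Roots point into old space, which holds objects A, B, C, D, and E.]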


Example

[Figure: the same heap after copying. Old space still holds A, B, C, D, and E; copies of A and C now appear in new space.]
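
A minimal Python sketch of a copying collector follows, assuming a forwarding pointer in each old-space object and a scan pointer over new space (in the style of Cheney's algorithm, which the slides do not name). The class Cell, its fields list, and the object graph in the usage are hypothetical; the slides only show that A and C survive.

```python
class Cell:
    """Heap object with a forwarding pointer (illustrative only)."""

    def __init__(self, name, fields=None):
        self.name = name
        self.fields = fields if fields is not None else []
        self.forward = None   # set once the object has been copied


def copy_collect(roots):
    """Copy everything reachable from roots; return (new_roots, new_space)."""
    new_space = []

    def forward(obj):
        # Copy obj on first visit; later visits reuse the same copy.
        if obj.forward is None:
            clone = Cell(obj.name, list(obj.fields))  # fields still point to old space
            obj.forward = clone
            new_space.append(clone)
        return obj.forward

    new_roots = [forward(root) for root in roots]
    scan = 0
    while scan < len(new_space):
        # Redirect the fields of each copied object to new-space copies.
        cell = new_space[scan]
        cell.fields = [forward(f) for f in cell.fields]
        scan += 1
    return new_roots, new_space


# Assumed graph: A <-> C reachable from the roots, B -> D -> E unreachable.
A, B, C, D, E = (Cell(n) for n in "ABCDE")
A.fields, C.fields = [C], [A]
B.fields, D.fields = [D], [E]
new_roots, new_space = copy_collect([A])
```

After the call, new_space plays the role of the heap and the old objects can be discarded wholesale, which corresponds to swapping the roles of the two spaces.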


Summary

How objects are organized in memory

Automatic management of memory

Coming up… Generating assembly code


target languages

[Figure: IR + Symbol Table -> Code Gen. -> one of three targets: absolute machine code, relative machine code, or assembly]


From IR to ASM: Challenges
  mapping IR to ASM operations
    what instruction(s) should be used to implement an IR operation?
    how do we translate code sequences
  call/return of routines
    managing activation records
  memory allocation
  register allocation
  optimizations


Intel IA-32 Assembly

Going from Assembly to Binary…
  Assembling
  Linking
AT&T syntax vs. Intel syntax
We will use AT&T syntax
  matches GNU assembler (GAS)


IA-32 Registers

Eight 32-bit general-purpose registers
  EAX – accumulator for operands and result data. Used to return value from function calls.
  EBX – pointer to data. Often used as array-base address
  ECX – counter for string and loop operations
  EDX – I/O pointer (GP for us)
  ESI – GP and source pointer for string operations
  EDI – GP and destination pointer for string operations
  EBP – stack frame (base) pointer
  ESP – stack pointer
EFLAGS register
EIP (instruction pointer) register
Six 16-bit segment registers
… (ignore the rest for our purposes)


Not all registers are born equal

EAX
  Required operand of MUL, IMUL, DIV and IDIV instructions
  Contains the result of these operations
EDX
  Stores remainder of a DIV or IDIV instruction (EAX stores quotient)
ESI, EDI
  ESI – required source pointer for string instructions
  EDI – required destination pointer for string instructions
Destination registers of arithmetic operations
  EAX, EBX, ECX, EDX
EBP – stack frame (base) pointer
ESP – stack pointer


IA-32 Addressing Modes

Machine instructions take zero or more operands
Source operand
  Immediate
  Register
  Memory location
  (I/O port)
Destination operand
  Register
  Memory location
  (I/O port)


Immediate and Register Operands

Immediate
  Value specified in the instruction itself
  GAS syntax – immediate values preceded by $
  add $4, %esp
Register
  Register name is used
  GAS syntax – register names preceded by %
  mov %esp,%ebp


Memory and Base Displacement Operands

Memory operands
  Value at given address
  GAS syntax – parentheses
  mov (%eax), %eax
Base displacement
  Value at computed address
  Address computed out of base register, index register, scale factor, displacement
  offset = base + (index*scale) + displacement
  Syntax: disp(base,index,scale)
  movl $42, 2(%eax)
  movl $42, 1(%eax,%ecx,4)


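Base Displacement Addressing

mov (%ecx,%ebx,4), %eax

[Figure: "Array Base Reference". An array of 4-byte elements whose base address is in %ecx; with %ebx = 3, the operand (%ecx,%ebx,4) selects the element 12 bytes past the base.]

%ecx = base
%ebx = 3
offset = base + (index*scale) + displacement
offset = base + (3*4) + 0 = base + 12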
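
The address arithmetic can be checked with a few lines of Python. The function name effective_address and the concrete base address 0x1000 are assumptions for illustration; the formula is the one on the slide, and the restriction of the scale factor to 1, 2, 4, or 8 and the wrap to 32 bits reflect IA-32.

```python
def effective_address(base=0, index=0, scale=1, disp=0):
    """Address denoted by the AT&T operand disp(base,index,scale)."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("IA-32 scale factor must be 1, 2, 4, or 8")
    # Addresses are 32 bits wide, so the sum wraps around.
    return (base + index * scale + disp) & 0xFFFFFFFF


# (%ecx,%ebx,4) with %ecx = 0x1000 (assumed base) and %ebx = 3
element_address = effective_address(base=0x1000, index=3, scale=4)
```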


How do we generate the code?
  break the IR into basic blocks
  a basic block is a sequence of instructions with
    single entry (to first instruction), no jumps to the middle of the block
    single exit (last instruction)
    code executes as a sequence from first instruction to last instruction without any jumps
  edge from one basic block B1 to another block B2 when the last statement of B1 may jump to B2


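Example

[Figure: control flow graph with blocks labeled B1 to B4; the two edges leaving B1 are labeled True and False. The blocks contain the following code.]

t1 := 4 * i
t2 := a [ t1 ]
if t2 <= 20 goto B3

t5 := t2 * t4
t6 := prod + t5
prod := t6
goto B4

t7 := i + 1
i := t2
goto B5

t3 := 4 * i
t4 := b [ t3 ]
goto B4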


creating basic blocks

Input: A sequence of three-address statements
Output: A list of basic blocks with each three-address statement in exactly one block
Method
  Determine the set of leaders (first statement of a block)
    The first statement is a leader
    Any statement that is the target of a conditional or unconditional jump is a leader
    Any statement that immediately follows a goto or conditional jump statement is a leader
  For each leader, its basic block consists of the leader and all statements up to but not including the next leader or the end of the program
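
The method above can be sketched in Python. To keep the sketch short, a program is described only by its length and by a map from each jump statement to its target, both using 1-based statement numbers; this representation and the names find_leaders and basic_blocks are assumptions for illustration. The usage runs it on a 17-statement program whose jumps are 9 -> 3, 11 -> 2, and 17 -> 13.

```python
def find_leaders(jumps, n):
    """Leaders of an n-statement program.

    jumps maps the number of each goto or conditional jump statement
    to the number of its target statement (both 1-based).
    """
    if n == 0:
        return []
    leaders = {1}                      # the first statement
    for source, target in jumps.items():
        leaders.add(target)            # target of a jump
        if source + 1 <= n:
            leaders.add(source + 1)    # statement following a jump
    return sorted(leaders)


def basic_blocks(jumps, n):
    """Each block runs from a leader up to, not including, the next leader."""
    leaders = find_leaders(jumps, n)
    ends = leaders[1:] + [n + 1]
    return [list(range(start, end)) for start, end in zip(leaders, ends)]


blocks = basic_blocks({9: 3, 11: 2, 17: 13}, 17)
```

Every statement lands in exactly one block, as the Output line requires.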


control flow graph

A directed graph G=(V,E)
nodes V = basic blocks
edges E = control flow
  (B1,B2) ∈ E when control from B1 flows to B2

B1:
  prod := 0
  i := 1

B2:
  t1 := 4 * i
  t2 := a [ t1 ]
  t3 := 4 * i
  t4 := b [ t3 ]
  t5 := t2 * t4
  t6 := prod + t5
  prod := t6
  t7 := i + 1
  i := t7
  if i <= 20 goto B2


example

source
  for i from 1 to 10 do
    for j from 1 to 10 do
      a[i, j] = 0.0;
  for i from 1 to 10 do
    a[i, i] = 1.0;

IR
  1) i = 1
  2) j = 1
  3) t1 = 10*i
  4) t2 = t1 + j
  5) t3 = 8*t2
  6) t4 = t3-88
  7) a[t4] = 0.0
  8) j = j + 1
  9) if j <= 10 goto (3)
  10) i = i+1
  11) if i <= 10 goto (2)
  12) i = 1
  13) t5 = i-1
  14) t6 = 88*t5
  15) a[t6] = 1.0
  16) i = i+1
  17) if i <= 10 goto (13)

CFG
  B1:
    i = 1
  B2:
    j = 1
  B3:
    t1 = 10*i
    t2 = t1 + j
    t3 = 8*t2
    t4 = t3-88
    a[t4] = 0.0
    j = j + 1
    if j <= 10 goto B3
  B4:
    i = i+1
    if i <= 10 goto B2
  B5:
    i = 1
  B6:
    t5 = i-1
    t6 = 88*t5
    a[t6] = 1.0
    i = i+1
    if i <= 10 goto B6


Variable Liveness

A statement x = y + z
  defines x
  uses y and z
A variable x is live at a program point if its value is used at a later point

(showing state after the statement)
  y = 42         x undef, y live, z undef
  z = 73         x undef, y live, z live
  x = y + z      x is live, y dead, z dead
  print(x);      x is dead, y dead, z dead


Computing Liveness Information
  between basic blocks – dataflow analysis (next lecture)
  within a single basic block?
    idea
      use symbol table to record next-use information
      scan basic block backwards
      update next-use for each variable


Computing Liveness Information

INPUT: A basic block B of three-address statements. The symbol table initially shows all non-temporary variables in B as being live on exit.
OUTPUT: At each statement i: x = y + z in B, liveness and next-use information of x, y, and z at i.

Start at the last statement in B and scan backwards
At each statement i: x = y + z in B, we do the following:
  1. Attach to i the information currently found in the symbol table regarding the next use and liveness of x, y, and z.
  2. In the symbol table, set x to "not live" and "no next use."
  3. In the symbol table, set y and z to "live" and the next uses of y and z to i


Computing Liveness Information

Start at the last statement in B and scan backwards
At each statement i: x = y + z in B, we do the following:
  1. Attach to i the information currently found in the symbol table regarding the next use and liveness of x, y, and z.
  2. In the symbol table, set x to "not live" and "no next use."
  3. In the symbol table, set y and z to "live" and the next uses of y and z to i

can we change the order between 2 and 3?

  x = 1
  y = x + 3
  z = x * 3
  x = x * z
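
A minimal Python sketch of the backward scan, run on the four-statement block above, suggests an answer to the question. Each statement is written as (defined variable, list of used variables) with constants omitted, statements are numbered from 0, and the symbol table maps a variable to a (live, next use) pair; this representation, the function name next_use_info, and the swapped flag are assumptions for illustration.

```python
def next_use_info(block, live_on_exit, swapped=False):
    """Attach liveness and next-use information to every statement.

    block: list of (x, operands) for statements x = operands[0] op ...
    swapped=True performs step 3 before step 2, to show why order matters.
    """
    dead = (False, None)                       # "not live", "no next use"
    table = {v: (True, None) for v in live_on_exit}
    info = [None] * len(block)
    for i in range(len(block) - 1, -1, -1):
        x, operands = block[i]
        # Step 1: attach what the symbol table currently says.
        info[i] = {v: table.get(v, dead) for v in (x, *operands)}

        def kill():                            # step 2
            table[x] = dead

        def use():                             # step 3
            for v in operands:
                table[v] = (True, i)

        steps = (use, kill) if swapped else (kill, use)
        for step in steps:
            step()
    return info


block = [("x", []), ("y", ["x"]), ("z", ["x"]), ("x", ["x", "z"])]
info = next_use_info(block, live_on_exit={"x", "y", "z"})
wrong = next_use_info(block, live_on_exit={"x", "y", "z"}, swapped=True)
```

In the last statement x is both defined and used. With the original order, x is recorded as live with next use 3 at the statement z = x * 3; with the steps swapped, the kill overwrites the use and x is wrongly reported as not live there.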


common-subexpression elimination

before:
  a = b + c
  b = a - d
  c = b + c
  d = a - d

after:
  a = b + c
  b = a - d
  c = b + c
  d = b


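DAG Representation of Basic Blocks

a = b + c
b = a - d
c = b + c
d = a - d

[Figure: DAG with leaves b0, c0, d0. Node + (b0, c0) is labeled a; node - (a, d0) is labeled b,d; node + (the b,d node, c0) is labeled c.]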


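DAG Representation of Basic Blocks

a = b + c
b = b - d
c = c + d
e = b + c

[Figure: DAG with leaves b0, c0, d0. Node + (b0, c0) is labeled a; node - (b0, d0) is labeled b; node + (c0, d0) is labeled c; node + (the b node, the c node) is labeled e.]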
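
A minimal Python sketch of building such a DAG for one basic block follows. Statements are given as (dest, op, left, right) tuples; the function name build_dag and the returned structure are assumptions for illustration. An interior node is reused when the same operator is applied to the same two child nodes, which is how a common subexpression shows up. The sketch does not treat b + c and c + b as equal.

```python
def build_dag(block):
    """Return (nodes, current, reused) for a block of (dest, op, left, right).

    nodes:   list of ("leaf", name) or (op, left_id, right_id)
    current: variable -> id of the node that holds its latest value
    reused:  destinations whose value was already computed by an existing node
    """
    nodes = []
    interior = {}   # (op, left_id, right_id) -> node id
    current = {}
    reused = []

    def node_for(var):
        # First reference to a variable creates a leaf for its initial value.
        if var not in current:
            nodes.append(("leaf", var + "0"))
            current[var] = len(nodes) - 1
        return current[var]

    for dest, op, left, right in block:
        key = (op, node_for(left), node_for(right))
        if key in interior:
            reused.append(dest)
        else:
            nodes.append(key)
            interior[key] = len(nodes) - 1
        current[dest] = interior[key]
    return nodes, current, reused


first = [("a", "+", "b", "c"), ("b", "-", "a", "d"),
         ("c", "+", "b", "c"), ("d", "-", "a", "d")]
second = [("a", "+", "b", "c"), ("b", "-", "b", "d"),
          ("c", "+", "c", "d"), ("e", "+", "b", "c")]
nodes1, current1, reused1 = build_dag(first)
nodes2, current2, reused2 = build_dag(second)
```

For the first block, d ends up on the same node as b, so d = a - d can become d = b. For the second block, e = b + c gets its own node because b and c have been redefined since a = b + c.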


algebraic identities

before:
  a = x^2
  b = x*2
  c = x/2
  d = 1*x

after:
  a = x*x
  b = x+x
  c = x*0.5
  d = x


coming up next

register allocation


The End