CIS 341: COMPILERS Lecture 22

cis341/current/lectures/lec22.pdf


Page 1:

CIS 341: COMPILERS Lecture 22

Page 2:

Announcements

• HW5: Oat v. 2.0
  – records, function pointers, type checking, array-bounds checks, etc.
  – typechecker & safety
  – Due: TOMORROW, Friday, April 17th

• HW6: Analysis & Optimizations
  – Alias analysis, constant propagation, dead code elimination, register allocation
  – Available soon
  – Due: Wednesday, April 29th

Zdancewic CIS 341: Compilers 2

Page 3:

CODE ANALYSIS

Zdancewic CIS 341: Compilers 3

Page 4:

Motivating Code Analyses
• There are lots of things that might influence the safety/applicability of an optimization
  – What algorithms and data structures can help?
• How do you know what is a loop?
• How do you know an expression is invariant?
• How do you know if an expression has no side effects?
• How do you keep track of where a variable is defined?
• How do you know where a variable is used?
• How do you know if two reference values may be aliases of one another?

CIS 341: Compilers 4

Page 5:

Moving Towards Register Allocation
• The OAT compiler currently generates as many temporary variables as it needs
  – These are the %uids you should be very familiar with by now.
• Current compilation strategy:
  – Each %uid maps to a stack location.
  – This yields programs with many loads/stores to memory.
  – Very inefficient.
• Ideally, we'd like to map as many %uids as possible into registers.
  – Eliminate the use of the alloca instruction?
  – At most 16 registers are available on 64-bit X86.
  – %rsp and %rbp are reserved, and some registers have special semantics, so really only 10 or 12 are available.
  – This means that a register must hold more than one slot.
• When is this safe?

CIS 341: Compilers 5

Page 6:

Liveness
• Observation: %uid1 and %uid2 can be assigned to the same register if their values will not be needed at the same time.
  – What does it mean for a %uid to be "needed"?
  – Ans: its contents will be used as a source operand in a later instruction.
• Such a variable is called "live".
• Two variables can share the same register if they are not live at the same time.

CIS 341: Compilers 6

Page 7:

Scope vs. Liveness
• We can already get some coarse liveness information from variable scoping.
• Consider the following OAT program:

int f(int x) {
  var a = 0;
  if (x > 0) {
    var b = x * x;
    a = b + b;
  }
  var c = a * x;
  return c;
}

• Note that due to OAT's scoping rules, variables b and c can never be live at the same time.
  – c's scope is disjoint from b's scope
• So, we could assign b and c to the same alloca'ed slot and potentially to the same register.

CIS 341: Compilers 7

Page 8:

But Scope is too Coarse
• Consider this program:

int f(int x) {       // x is live
  int a = x + 2;     // a and x are live
  int b = a * a;     // b and x are live
  int c = b + x;     // c is live
  return c;
}

• The scopes of a, b, c, x all overlap – they're all in scope at the end of the block.
• But a, b, c are never live at the same time.
  – So they can share the same stack slot / register

CIS 341: Compilers 8

Page 9:

Live Variable Analysis
• A variable v is live at a program point if v is defined before the program point and used after it.
• Liveness is defined in terms of where variables are defined and where variables are used.
• Liveness analysis: compute the live variables between each statement.
  – May be conservative (i.e. it may claim a variable is live when it isn't) because that is a safe approximation.
  – To be useful, it should be more precise than simple scoping rules.
• Liveness analysis is one example of dataflow analysis
  – Other examples: Available Expressions, Reaching Definitions, Constant-Propagation Analysis, …

CIS 341: Compilers 9

Page 10:

Control-flow Graphs Revisited
• For the purposes of dataflow analysis, we use the control-flow graph (CFG) intermediate form.
• Recall that a basic block is a sequence of instructions such that:
  – There is a distinguished, labeled entry point (no jumps into the middle of a basic block)
  – There is a (possibly empty) sequence of non-control-flow instructions
  – The block ends with a single control-flow instruction (jump, conditional branch, return, etc.)
• A control-flow graph:
  – Nodes are basic blocks
  – There is an edge from B1 to B2 if the control-flow instruction of B1 might jump to the entry label of B2
  – There are no "dangling" edges – there is a block for every jump target.

CIS 341: Compilers 10

Note: the following slides are intentionally a bit ambiguous about the exact nature of the code in the control-flow graphs:
  • at the x86 assembly level
  • at an "imperative" C-like source level
  • at the LLVM IR level
Each setting applies the same general idea, but the exact details will differ.
  • LLVM IR doesn't have "imperative" update of %uid temporaries. (The SSA structure of the LLVM IR (by design!) makes some of these analyses simpler.)

Page 11:

Dataflow over CFGs
• For precision, it is helpful to think of the "fall through" between sequential instructions as an edge of the control-flow graph too.
  – Different implementation tradeoffs in practice…

CIS 341: Compilers 11

[Figure: a basic-block CFG (Move; Binop; If; Unop; Jump) next to the corresponding "exploded" CFG, in which each instruction (Instr) is its own node with in-edges and out-edges, connected by explicit fall-through edges.]

Page 12:

Liveness is Associated with Edges

[Figure: an instruction node Instr with live sets annotated on its edges, e.g. Live: a, b on an in-edge and Live: b, d, e on an out-edge.]

• This is useful so that the same register can be used for different temporaries in the same statement.
• Example: a = b + 1
• Compiles to:

       Live: b
  Mov a, b
       Live: a
  Add a, 1
       Live: a (maybe)

  Register allocate: a → eax, b → eax

  Mov eax, eax
  Add eax, 1

CIS 341: Compilers 12

Page 13:

Uses and Definitions
• Every instruction/statement uses some set of variables
  – i.e. reads from them
• Every instruction/statement defines some set of variables
  – i.e. writes to them
• For a node/statement s, define:
  – use[s] : set of variables used by s
  – def[s] : set of variables defined by s
• Examples:
  – a = b + c    use[s] = {b,c}   def[s] = {a}
  – a = a + 1    use[s] = {a}     def[s] = {a}

CIS 341: Compilers 13

Page 14:

Liveness, Formally
• A variable v is live on edge e if there is:
  – a node n in the CFG such that use[n] contains v, and
  – a directed path from e to n such that for every statement s' on the path, def[s'] does not contain v
• The first clause says that v will be used on some path starting from edge e.
• The second clause says that v won't be redefined on that path before the use.
• Questions:
  – How to compute this efficiently?
  – How to use this information (e.g. for register allocation)?
  – How does the choice of IR affect this?
    (e.g. LLVM IR uses SSA, so it doesn't allow redefinition ⇒ this simplifies liveness analysis)

CIS 341: Compilers 14

Page 15:

Simple, Inefficient Algorithm
• "A variable v is live on an edge e if there is a node n in the CFG using it and a directed path from e to n passing through no def of v."
• Backtracking algorithm:
  – For each variable v…
  – Try all paths from each use of v, tracing backwards through the control-flow graph until either v is defined or a previously visited node has been reached.
  – Mark the variable v live across each edge traversed.
• Inefficient because it explores the same paths many times (for different uses and different variables)

CIS 341: Compilers 15

Page 16:

Dataflow Analysis
• Idea: compute liveness information for all variables simultaneously.
  – Keep track of sets of information about each node
• Approach: define equations that must be satisfied by any liveness determination.
  – Equations based on "obvious" constraints.
• Solve the equations by iteratively converging on a solution.
  – Start with a "rough" approximation to the answer
  – Refine the answer at each iteration
  – Keep going until no more refinement is possible: a fixpoint has been reached
• This is an instance of a general framework for computing program properties: dataflow analysis

CIS 341: Compilers 16

Page 17:

Dataflow Value Sets for Liveness
• Nodes are program statements, so:
  – use[n] : set of variables used by n
  – def[n] : set of variables defined by n
  – in[n]  : set of variables live on entry to n
  – out[n] : set of variables live on exit from n
• Associate in[n] and out[n] with the "collected" information about incoming/outgoing edges
• For liveness: what constraints are there among these sets?
• Clearly: in[n] ⊇ use[n]
• What other constraints?

CIS 341: Compilers 17

[Figure: a node n with its incoming edges collected into in[n] and its outgoing edges collected into out[n].]

Page 18:

Other Dataflow Constraints
• We have: in[n] ⊇ use[n]
  – "A variable must be live on entry to n if it is used by n"
• Also: in[n] ⊇ out[n] - def[n]
  – "If a variable is live on exit from n, and n doesn't define it, it is live on entry to n"
  – Note: here '-' means "set difference"
• And: out[n] ⊇ in[n'] if n' ∈ succ[n]
  – "If a variable is live on entry to a successor node of n, it must be live on exit from n."

CIS 341: Compilers 18

[Figure: a node n with in[n] on its entry and out[n] on its exit.]

Page 19:

Iterative Dataflow Analysis
• Find a solution to those constraints by starting from a rough guess.
  – Start with: in[n] = Ø and out[n] = Ø
• The guesses don't satisfy the constraints:
  – in[n] ⊇ use[n]
  – in[n] ⊇ out[n] - def[n]
  – out[n] ⊇ in[n'] if n' ∈ succ[n]
• Idea: iteratively re-compute in[n] and out[n] where forced to by the constraints.
  – Each iteration will add variables to the sets in[n] and out[n]
    (i.e. the live variable sets will increase monotonically)
• We stop when in[n] and out[n] satisfy these equations (which are derived from the constraints above):
  – in[n] = use[n] ∪ (out[n] - def[n])
  – out[n] = ∪_{n'∈succ[n]} in[n']

CIS 341: Compilers 19

Page 20:

Complete Liveness Analysis Algorithm

for all n: in[n] := Ø, out[n] := Ø
repeat until no change in 'in' and 'out'
  for all n
    out[n] := ∪_{n'∈succ[n]} in[n']
    in[n] := use[n] ∪ (out[n] - def[n])
  end
end

• Finds a fixpoint of the in and out equations.
  – The algorithm is guaranteed to terminate… Why?
• Why do we start with Ø?

CIS 341: Compilers 20
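The pseudocode above can be transcribed almost directly into Python. This is a minimal sketch; the dictionary-based CFG encoding is an assumption made for illustration, with node numbering chosen to match the example on the following slides.

```python
def liveness(nodes, succ, use, defs):
    """Iterate the liveness equations until no in/out set changes."""
    live_in = {n: set() for n in nodes}
    live_out = {n: set() for n in nodes}
    changed = True
    while changed:                               # repeat until no change
        changed = False
        for n in nodes:
            # out[n] := union of in[n'] over successors n'
            out_n = set().union(*(live_in[s] for s in succ[n])) if succ[n] else set()
            # in[n] := use[n] ∪ (out[n] - def[n])
            in_n = use[n] | (out_n - defs[n])
            if (in_n, out_n) != (live_in[n], live_out[n]):
                changed = True
            live_in[n], live_out[n] = in_n, out_n
    return live_in, live_out

# Node numbering follows the example CFG on the next slides:
# e = 1; while (x > 0) { z = e*e; y = e*x; x = x-1;
#                        if (x & 1) { e = z; } else { e = y; } }
# return x;
nodes = list(range(1, 10))
succ = {1: [2], 2: [3, 4], 3: [5], 4: [], 5: [6], 6: [7],
        7: [8, 9], 8: [2], 9: [2]}
use  = {1: set(), 2: {'x'}, 3: {'e'}, 4: {'x'}, 5: {'e', 'x'},
        6: {'x'}, 7: {'x'}, 8: {'z'}, 9: {'y'}}
defs = {1: {'e'}, 2: set(), 3: {'z'}, 4: set(), 5: {'y'},
        6: {'x'}, 7: set(), 8: {'e'}, 9: {'e'}}

live_in, live_out = liveness(nodes, succ, use, defs)
print(sorted(live_in[5]))   # → ['e', 'x', 'z']
```

Running it reproduces the fixpoint reached on the example slides (e.g. in[5] = {e, x, z}). Starting from Ø and only ever adding elements is what guarantees termination.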

Page 21:

Example Liveness Analysis
• Example flow graph for this program:

e = 1;
while (x > 0) {
  z = e * e;
  y = e * x;
  x = x - 1;
  if (x & 1) {
    e = z;
  } else {
    e = y;
  }
}
return x;

[CFG, one statement per node:
  1: e = 1        def: e   use: –
  2: if x > 0     def: –   use: x
  3: z = e * e    def: z   use: e
  4: ret x        def: –   use: x
  5: y = e * x    def: y   use: e,x
  6: x = x - 1    def: x   use: x
  7: if (x & 1)   def: –   use: x
  8: e = z        def: e   use: z
  9: e = y        def: e   use: y
All in: and out: annotations start out empty.]

CIS 341: Compilers

Page 22:

Example Liveness Analysis
Each iteration, update:
  out[n] := ∪_{n'∈succ[n]} in[n']
  in[n] := use[n] ∪ (out[n] - def[n])

• Iteration 1 (showing only updates that make a change):
  in[2] = x     in[3] = e     in[4] = x
  in[5] = e,x   in[6] = x     in[7] = x
  in[8] = z     in[9] = y

[CFG from the previous slide, annotated with these in: sets; all out: sets are still empty.]

CIS 341: Compilers

Page 23:

Example Liveness Analysis
Each iteration, update:
  out[n] := ∪_{n'∈succ[n]} in[n']
  in[n] := use[n] ∪ (out[n] - def[n])

• Iteration 2:
  out[1] = x     in[1] = x
  out[2] = e,x   in[2] = e,x
  out[3] = e,x   in[3] = e,x
  out[5] = x
  out[6] = x
  out[7] = y,z   in[7] = x,y,z
  out[8] = x     in[8] = x,z
  out[9] = x     in[9] = x,y

[CFG annotated with the updated in:/out: sets.]

CIS 341: Compilers

Page 24:

Example Liveness Analysis
Each iteration, update:
  out[n] := ∪_{n'∈succ[n]} in[n']
  in[n] := use[n] ∪ (out[n] - def[n])

• Iteration 3:
  out[1] = e,x
  out[6] = x,y,z   in[6] = x,y,z
  out[7] = x,y,z
  out[8] = e,x
  out[9] = e,x

[CFG annotated with the updated in:/out: sets.]

CIS 341: Compilers

Page 25:

Example Liveness Analysis
Each iteration, update:
  out[n] := ∪_{n'∈succ[n]} in[n']
  in[n] := use[n] ∪ (out[n] - def[n])

• Iteration 4:
  out[5] = x,y,z   in[5] = e,x,z

[CFG annotated with the updated in:/out: sets.]

CIS 341: Compilers

Page 26:

Example Liveness Analysis
Each iteration, update:
  out[n] := ∪_{n'∈succ[n]} in[n']
  in[n] := use[n] ∪ (out[n] - def[n])

• Iteration 5:
  out[3] = e,x,z

Done! The final (fixpoint) annotations:

  node  statement     in:      out:
  1     e = 1         x        e,x
  2     if x > 0      e,x      e,x
  3     z = e * e     e,x      e,x,z
  4     ret x         x        –
  5     y = e * x     e,x,z    x,y,z
  6     x = x - 1     x,y,z    x,y,z
  7     if (x & 1)    x,y,z    x,y,z
  8     e = z         x,z      e,x
  9     e = y         x,y      e,x

CIS 341: Compilers

Page 27:

Improving the Algorithm
• Can we do better?
• Observe: the only way information propagates from one node to another is via out[n] := ∪_{n'∈succ[n]} in[n']
  – This is the only rule that involves more than one node
• If a node's successors haven't changed, then the node itself won't change.
• Idea for an improved version of the algorithm:
  – Keep track of which nodes' successors have changed

CIS 341: Compilers 27

Page 28:

A Worklist Algorithm
• Use a FIFO queue of nodes that might need to be updated.

for all n: in[n] := Ø, out[n] := Ø
w = new queue with all nodes
repeat until w is empty
  let n = w.pop()                     // pull a node off the queue
  old_in = in[n]                      // remember old in[n]
  out[n] := ∪_{n'∈succ[n]} in[n']
  in[n] := use[n] ∪ (out[n] - def[n])
  if (old_in != in[n])                // if in[n] has changed
    for all m in pred[n], w.push(m)   // add to worklist
end

CIS 341: Compilers 28
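A Python sketch of the worklist variant, using collections.deque as the FIFO queue. The CFG encoding is the same assumed dictionary form as before, reusing the earlier 9-node liveness example; only nodes whose successors' in-sets changed are revisited.

```python
from collections import deque

def liveness_worklist(nodes, succ, pred, use, defs):
    """Worklist liveness: revisit a node's predecessors only when in[n] changes."""
    live_in = {n: set() for n in nodes}
    live_out = {n: set() for n in nodes}
    w = deque(nodes)                      # queue seeded with all nodes
    while w:
        n = w.popleft()
        old_in = live_in[n]               # remember old in[n]
        live_out[n] = set().union(*(live_in[s] for s in succ[n])) if succ[n] else set()
        live_in[n] = use[n] | (live_out[n] - defs[n])
        if old_in != live_in[n]:          # in[n] changed: re-enqueue predecessors
            w.extend(pred[n])
    return live_in, live_out

# Same example CFG as the earlier slides (node numbering assumed).
nodes = list(range(1, 10))
succ = {1: [2], 2: [3, 4], 3: [5], 4: [], 5: [6], 6: [7],
        7: [8, 9], 8: [2], 9: [2]}
pred = {n: [m for m in succ if n in succ[m]] for n in succ}
use  = {1: set(), 2: {'x'}, 3: {'e'}, 4: {'x'}, 5: {'e', 'x'},
        6: {'x'}, 7: {'x'}, 8: {'z'}, 9: {'y'}}
defs = {1: {'e'}, 2: set(), 3: {'z'}, 4: set(), 5: {'y'},
        6: {'x'}, 7: set(), 8: {'e'}, 9: {'e'}}

live_in, live_out = liveness_worklist(nodes, succ, pred, use, defs)
print(sorted(live_in[5]))   # → ['e', 'x', 'z']
```

It computes the same fixpoint as the naive sweep, but avoids re-processing nodes whose inputs are unchanged. Only in[n] needs to trigger re-enqueueing, since out[n] is never read by other nodes.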

Page 29:

OTHER DATAFLOW ANALYSES

Zdancewic CIS 341: Compilers 29

Page 30:

Generalizing Dataflow Analyses
• The kind of iterative constraint solving used for liveness analysis applies to other kinds of analyses as well.
  – Reaching definitions analysis
  – Available expressions analysis
  – Alias analysis
  – Constant propagation
  – These analyses follow the same 3-step approach as for liveness.
• To see these as instances of the same kind of algorithm, the next few examples work over a canonical intermediate instruction representation called quadruples.
  – Allows easy definition of def[n] and use[n]
  – A slightly "looser" variant of LLVM's IR that doesn't require static single assignment – i.e. it has mutable local variables
  – We will use LLVM-IR-like syntax

CIS 341: Compilers 30

Page 31:

Def / Use for SSA
• Instructions n:

  instruction           def[n]   use[n]       description
  a = op b c            {a}      {b,c}        arithmetic
  a = load b            {a}      {b}          load
  store a, b            Ø        {b}          store
  a = alloca t          {a}      Ø            alloca
  a = bitcast b to u    {a}      {b}          bitcast
  a = gep b [c,d,…]     {a}      {b,c,d,…}    getelementptr
  a = f(b1,…,bn)        {a}      {b1,…,bn}    call w/ return
  f(b1,…,bn)            Ø        {b1,…,bn}    void call (no return)

• Terminators:

  br L                  Ø        Ø            jump
  br a L1 L2            Ø        {a}          conditional branch
  return a              Ø        {a}          return

CIS 341: Compilers 31

Page 32:

REACHING DEFINITIONS

Zdancewic CIS 341: Compilers 33

Page 33:

Reaching Definition Analysis
• Question: what uses in a program does a given variable definition reach?
• This analysis is used for constant propagation & copy propagation.
  – If only one definition reaches a particular use, we can replace the use by the definition (for constant propagation).
  – Copy propagation additionally requires that the copied value still has its same value – computed using an available expressions analysis (next)
• Input: Quadruple CFG
• Output: in[n] (resp. out[n]) is the set of nodes defining some variable such that the definition may reach the beginning (resp. end) of node n

CIS 341: Compilers 34

Page 34:

Example of Reaching Definitions
• Results of computing reaching definitions on this simple straight-line CFG:

  1: b = a + 2                      out[1]: {1}
  2: c = b * b      in[2]: {1}      out[2]: {1,2}
  3: b = c + 1      in[3]: {1,2}    out[3]: {2,3}
  4: return b * a   in[4]: {2,3}

CIS 341: Compilers 35

Page 35:

Reaching Definitions Step 1
• Define the sets of interest for the analysis
• Let defs[a] be the set of nodes that define the variable a
• Define gen[n] and kill[n] as follows:

  Quadruple forms n:   gen[n]   kill[n]
  a = b op c           {n}      defs[a] - {n}
  a = load b           {n}      defs[a] - {n}
  store b, a           Ø        Ø
  a = f(b1,…,bn)       {n}      defs[a] - {n}
  f(b1,…,bn)           Ø        Ø
  br L                 Ø        Ø
  br a L1 L2           Ø        Ø
  return a             Ø        Ø

CIS 341: Compilers 36

Page 36:

Reaching Definitions Step 2
• Define the constraints that a reaching definitions solution must satisfy.
• out[n] ⊇ gen[n]
  – "The definitions that reach the end of a node at least include the definitions generated by the node"
• in[n] ⊇ out[n'] if n' is in pred[n]
  – "The definitions that reach the beginning of a node include those that reach the exit of any predecessor"
• out[n] ∪ kill[n] ⊇ in[n]
  – "The definitions that come in to a node either reach the end of the node or are killed by it."
  – Equivalently: out[n] ⊇ in[n] - kill[n]

CIS 341: Compilers 37

Page 37:

Reaching Definitions Step 3
• Convert the constraints to iterated update equations:
  – in[n] := ∪_{n'∈pred[n]} out[n']
  – out[n] := gen[n] ∪ (in[n] - kill[n])
• Algorithm: initialize in[n] and out[n] to Ø
  – Iterate the update equations until a fixed point is reached
• The algorithm terminates because in[n] and out[n] increase only monotonically
  – At most to a maximum set that includes all variables in the program
• The algorithm is precise because it finds the smallest sets that satisfy the constraints.

CIS 341: Compilers 38
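Steps 1–3 can be sketched in Python on the 4-node straight-line example from the earlier reaching-definitions slide (1: b = a + 2; 2: c = b * b; 3: b = c + 1; 4: return b * a). The gen/kill encoding below follows the Step 1 table; the dictionary CFG representation is an assumption for illustration.

```python
def reaching_definitions(nodes, pred, gen, kill):
    """Forward analysis: in[n] = ∪ out[preds], out[n] = gen[n] ∪ (in[n] - kill[n])."""
    rd_in = {n: set() for n in nodes}
    rd_out = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            in_n = set().union(*(rd_out[p] for p in pred[n])) if pred[n] else set()
            out_n = gen[n] | (in_n - kill[n])
            if (in_n, out_n) != (rd_in[n], rd_out[n]):
                changed = True
            rd_in[n], rd_out[n] = in_n, out_n
    return rd_in, rd_out

# 1: b = a + 2;  2: c = b * b;  3: b = c + 1;  4: return b * a
nodes = [1, 2, 3, 4]
pred = {1: [], 2: [1], 3: [2], 4: [3]}
defs_b = {1, 3}                                  # defs[b]: nodes that define b
gen  = {1: {1}, 2: {2}, 3: {3}, 4: set()}
kill = {1: defs_b - {1}, 2: set(), 3: defs_b - {3}, 4: set()}

rd_in, rd_out = reaching_definitions(nodes, pred, gen, kill)
print(sorted(rd_out[3]), sorted(rd_in[4]))   # → [2, 3] [2, 3]
```

The result matches the slide: node 3 kills node 1's definition of b, so only definitions {2, 3} reach the return.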

Page 38:

AVAILABLE EXPRESSIONS

Zdancewic CIS 341: Compilers 39

Page 39:

Available Expressions
• Idea: we want to perform common subexpression elimination:

  a = x + 1        a = x + 1
  …           ⇒    …
  b = x + 1        b = a

• This transformation is safe if x + 1 computes the same value at both places (i.e. x hasn't been assigned).
  – "x + 1" is an available expression
• Dataflow values:
  – in[n] = set of nodes whose values are available on entry to n
  – out[n] = set of nodes whose values are available on exit of n

CIS 341: Compilers 40

Page 40:

Available Expressions Step 1
• Define the sets of values
• Define gen[n] and kill[n] as follows:

  Quadruple forms n:   gen[n]           kill[n]
  a = b op c           {n} - kill[n]    uses[a]
  a = load b           {n} - kill[n]    uses[a]
  store b, a           Ø                uses[ [x] ] (for all x that may equal a)
  br L                 Ø                Ø
  br a L1 L2           Ø                Ø
  a = f(b1,…,bn)       Ø                uses[a] ∪ uses[ [x] ] (for all x)
  f(b1,…,bn)           Ø                uses[ [x] ] (for all x)
  return a             Ø                Ø

Note the need for "may alias" information…
Note that functions are assumed to be impure…

CIS 341: Compilers 41

Page 41:

Available Expressions Step 2
• Define the constraints that an available expressions solution must satisfy.
• out[n] ⊇ gen[n]
  – "The expressions made available by n reach the end of the node"
• in[n] ⊆ out[n'] if n' is in pred[n]
  – "The expressions available at the beginning of a node include those that reach the exit of every predecessor"
• out[n] ∪ kill[n] ⊇ in[n]
  – "The expressions available on entry either reach the end of the node or are killed by it."
  – Equivalently: out[n] ⊇ in[n] - kill[n]

Note the similarities and differences with the constraints for "reaching definitions".

CIS 341: Compilers 42

Page 42:

Available Expressions Step 3
• Convert the constraints to iterated update equations:
  – in[n] := ∩_{n'∈pred[n]} out[n']
  – out[n] := gen[n] ∪ (in[n] - kill[n])
• Algorithm: initialize in[n] and out[n] to the set of all nodes
  – Iterate the update equations until a fixed point is reached
• The algorithm terminates because in[n] and out[n] decrease only monotonically
  – At most to a minimum of the empty set
• The algorithm is precise because it finds the largest sets that satisfy the constraints.

CIS 341: Compilers 43
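A sketch of the available-expressions iteration in Python, on a tiny straight-line example invented for illustration (1: a = x + 1; 2: b = x + 1; 3: x = a * b). Per the Step 1 table, kill[n] is the set of nodes whose expressions use the variable defined at n, and gen[n] = {n} - kill[n]; the entry node is assumed to have nothing available on entry.

```python
def available_expressions(nodes, pred, gen, kill):
    """Forward analysis with meet = intersection; initialize to all nodes."""
    all_nodes = set(nodes)
    ae_in = {n: all_nodes.copy() for n in nodes}
    ae_out = {n: all_nodes.copy() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            # entry node: empty-intersection convention, nothing available
            in_n = set.intersection(*(ae_out[p] for p in pred[n])) if pred[n] else set()
            out_n = gen[n] | (in_n - kill[n])
            if (in_n, out_n) != (ae_in[n], ae_out[n]):
                changed = True
            ae_in[n], ae_out[n] = in_n, out_n
    return ae_in, ae_out

# 1: a = x + 1;  2: b = x + 1;  3: x = a * b
nodes = [1, 2, 3]
pred = {1: [], 2: [1], 3: [2]}
uses = {'a': {3}, 'b': {3}, 'x': {1, 2}}      # uses[v]: nodes mentioning v
kill = {1: uses['a'], 2: uses['b'], 3: uses['x']}
gen  = {n: {n} - kill[n] for n in nodes}

ae_in, ae_out = available_expressions(nodes, pred, gen, kill)
print(sorted(ae_in[2]))   # → [1]
```

Node 1's expression x + 1 is available on entry to node 2, so common subexpression elimination may rewrite b = x + 1 to b = a; node 3 redefines x and kills both.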

Page 43:

GENERAL DATAFLOW ANALYSIS

Zdancewic CIS 341: Compilers 44

Page 44:

Comparing Dataflow Analyses
• Look at the update equations in the inner loop of the analyses
• Liveness: (backward)
  – Let gen[n] = use[n] and kill[n] = def[n]
  – out[n] := ∪_{n'∈succ[n]} in[n']
  – in[n] := gen[n] ∪ (out[n] - kill[n])
• Reaching Definitions: (forward)
  – in[n] := ∪_{n'∈pred[n]} out[n']
  – out[n] := gen[n] ∪ (in[n] - kill[n])
• Available Expressions: (forward)
  – in[n] := ∩_{n'∈pred[n]} out[n']
  – out[n] := gen[n] ∪ (in[n] - kill[n])

CIS 341: Compilers 45

Page 45:

Common Features
• All of these analyses have a domain over which they solve constraints.
  – For liveness, the domain is sets of variables
  – For reaching defns. and available exprs., the domain is sets of nodes
• Each analysis has a notion of gen[n] and kill[n]
  – Used to explain how information propagates across a node.
• Each analysis propagates information either forward or backward
  – Forward: in[n] defined in terms of predecessor nodes' out[]
  – Backward: out[n] defined in terms of successor nodes' in[]
• Each analysis has a way of aggregating information
  – Liveness & reaching definitions take union (∪)
  – Available expressions uses intersection (∩)
  – Union expresses a property that holds for some path (existential)
  – Intersection expresses a property that holds for all paths (universal)

CIS 341: Compilers 46

Page 46:

(Forward) Dataflow Analysis Framework
A forward dataflow analysis can be characterized by:
1. A domain of dataflow values L
   – e.g. L = the powerset of all variables
   – Think of ℓ ∈ L as a property; then "x ∈ ℓ" means "x has the property"
2. For each node n, a flow function Fn : L → L
   – So far we've seen Fn(ℓ) = gen[n] ∪ (ℓ - kill[n])
   – So: out[n] = Fn(in[n])
   – "If ℓ is a property that holds before the node n, then Fn(ℓ) holds after n"
3. A combining operator ⨅
   – "If we know either ℓ1 or ℓ2 holds on entry to node n, we know at most ℓ1 ⨅ ℓ2"
   – in[n] := ⨅_{n'∈pred[n]} out[n']

CIS 341: Compilers 47

[Figure: a node n transforming an incoming value ℓ into Fn(ℓ); a node n with two incoming values ℓ1 and ℓ2 combined into ℓ1 ⨅ ℓ2.]

Page 47:

Generic Iterative (Forward) Analysis

for all n: in[n] := ⟙, out[n] := ⟙
repeat until no change
  for all n
    in[n] := ⨅_{n'∈pred[n]} out[n']
    out[n] := Fn(in[n])
  end
end

• Here, ⟙ ∈ L ("top") represents having the "maximum" amount of information.
  – Having "more" information enables more optimizations
  – The "maximum" amount could be inconsistent with the constraints.
  – Iteration refines the answer, eliminating inconsistencies

CIS 341: Compilers 48
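The generic loop above can be written once and instantiated per analysis by supplying the flow functions, the meet, and ⟙. This Python sketch instantiates it for reaching definitions (where ⟙ = Ø and the meet is union), reusing the 4-node example from the reaching-definitions slides; all encodings are assumptions for illustration.

```python
def forward_analysis(nodes, pred, flow, meet, top, entry_in):
    """Generic forward dataflow: iterate out[n] = Fn(⨅ out[preds]) to a fixpoint."""
    out = {n: top() for n in nodes}
    while True:
        changed = False
        for n in nodes:
            in_n = meet([out[p] for p in pred[n]]) if pred[n] else entry_in()
            new_out = flow(n, in_n)                 # out[n] = Fn(in[n])
            if new_out != out[n]:
                out[n], changed = new_out, True
        if not changed:
            return out

# Instantiation: reaching definitions on
# 1: b = a + 2;  2: c = b * b;  3: b = c + 1;  4: return b * a
nodes = [1, 2, 3, 4]
pred = {1: [], 2: [1], 3: [2], 4: [3]}
gen  = {1: {1}, 2: {2}, 3: {3}, 4: set()}
kill = {1: {3}, 2: set(), 3: {1}, 4: set()}

out = forward_analysis(
    nodes, pred,
    flow=lambda n, l: gen[n] | (l - kill[n]),   # Fn(ℓ) = gen[n] ∪ (ℓ - kill[n])
    meet=lambda ls: set().union(*ls),           # ⨅ is ∪ for reaching definitions
    top=set,                                    # ⟙ is Ø: smaller sets are more informative
    entry_in=set)

print({n: sorted(out[n]) for n in nodes})
```

Swapping in `meet=lambda ls: set.intersection(*ls)` and a ⟙ of all nodes would give available expressions with no change to the driver loop, which is the point of the framework.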

Page 48:

Structure of L

• The domain has structure that reflects the “amount” of information contained in each dataflow value.
• Some dataflow values are more informative than others:
– Write ℓ1 ⊑ ℓ2 whenever ℓ2 provides at least as much information as ℓ1.
– The dataflow value ℓ2 is “better” for enabling optimizations.
• Example 1: for liveness analysis, smaller sets of variables are more informative.
– Having smaller sets of variables live across an edge means that there are fewer conflicts for register allocation assignments.
– So: ℓ1 ⊑ ℓ2 if and only if ℓ1 ⊇ ℓ2
• Example 2: for available expressions analysis, larger sets of nodes are more informative.
– Having a larger set of nodes (equivalently, expressions) available means that there is more opportunity for common subexpression elimination.
– So: ℓ1 ⊑ ℓ2 if and only if ℓ1 ⊆ ℓ2


L as a Partial Order

• L is a partial order defined by the ordering relation ⊑.
• A partial order is a set equipped with such an ordering; some of the elements might be incomparable.
– That is, there might be ℓ1, ℓ2 ∈ L such that neither ℓ1 ⊑ ℓ2 nor ℓ2 ⊑ ℓ1
• Properties of a partial order:
– Reflexivity: ℓ ⊑ ℓ
– Transitivity: ℓ1 ⊑ ℓ2 and ℓ2 ⊑ ℓ3 implies ℓ1 ⊑ ℓ3
– Anti-symmetry: ℓ1 ⊑ ℓ2 and ℓ2 ⊑ ℓ1 implies ℓ1 = ℓ2
• Examples:
– Integers ordered by ≤
– Types ordered by <:
– Sets ordered by ⊆ or ⊇


Subsets of {a,b,c} ordered by ⊆

(Hasse diagram: {a,b,c} = ⟙ at the top; {a,b}, {a,c}, {b,c} in the middle; {a}, {b}, {c} below them; { } = ⟘ at the bottom. ℓ1 ⊑ ℓ2 whenever there is an upward path from ℓ1 to ℓ2.)

order ⊑ is ⊆   meet ⨅ is ∩   join ⨆ is ∪

Partial order presented as a Hasse diagram. Height is 3.


Meets and Joins

• The combining operator ⨅ is called the “meet” operation.
• It constructs the greatest lower bound:
– ℓ1 ⨅ ℓ2 ⊑ ℓ1 and ℓ1 ⨅ ℓ2 ⊑ ℓ2   “the meet is a lower bound”
– If ℓ ⊑ ℓ1 and ℓ ⊑ ℓ2 then ℓ ⊑ ℓ1 ⨅ ℓ2   “there is no greater lower bound”
• Dually, the ⨆ operator is called the “join” operation.
• It constructs the least upper bound:
– ℓ1 ⊑ ℓ1 ⨆ ℓ2 and ℓ2 ⊑ ℓ1 ⨆ ℓ2   “the join is an upper bound”
– If ℓ1 ⊑ ℓ and ℓ2 ⊑ ℓ then ℓ1 ⨆ ℓ2 ⊑ ℓ   “there is no smaller upper bound”
• A partial order that has all meets and joins is called a lattice.
– If it has just meets, it’s called a meet semi-lattice.
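These laws can be checked by brute force on the powerset lattice from the previous slide. The following is an illustrative Python sketch (not course code) verifying that ∩ really is the greatest lower bound when ⊑ is ⊆:

```python
from itertools import combinations

def powerset(s):
    # all subsets of s, as frozensets
    s = sorted(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

L = powerset({"a", "b", "c"})

# For every pair: l1 & l2 is a lower bound, and no lower bound is greater.
is_glb = all(
    (l1 & l2) <= l1 and (l1 & l2) <= l2
    and all(l <= (l1 & l2) for l in L if l <= l1 and l <= l2)
    for l1 in L for l2 in L
)
```

The dual check with `|` and `>=` would verify that ∪ is the least upper bound.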


Another Way to Describe the Algorithm

• The algorithm repeatedly computes, for each node n:
  out[n] := Fn(in[n])
• Equivalently: out[n] := Fn(⨅n’∈pred[n] out[n’])
– By definition of in[n]
• We can write this as a simultaneous update of the vector of out[n] values:
– let xn = out[n]
– let X = (x1, x2, … , xn), a vector of points in L
– F(X) = (F1(⨅j∈pred[1] out[j]), F2(⨅j∈pred[2] out[j]), …, Fn(⨅j∈pred[n] out[j]))
• Any solution to the constraints is a fixpoint X of F
– i.e. F(X) = X


Iteration Computes Fixpoints

• Let X0 = (⟙, ⟙, …, ⟙)
• Each loop through the algorithm applies F to the old vector:
  X1 = F(X0)
  X2 = F(X1)
  …
• Fk+1(X) = F(Fk(X))
• A fixpoint is reached when Fk(X) = Fk+1(X)
– That’s when the algorithm stops.
• Wanted: a maximal fixpoint
– Because that one is more informative/useful for performing optimizations


Monotonicity & Termination

• Each flow function Fn maps lattice elements to lattice elements; to be sensible, it should be monotonic.
• F : L → L is monotonic iff: ℓ1 ⊑ ℓ2 implies that F(ℓ1) ⊑ F(ℓ2)
– Intuitively: “If you have more information entering a node, then you have more information leaving the node.”
• Monotonicity lifts point-wise to the vector function F : Ln → Ln
– vector (x1, x2, … , xn) ⊑ (y1, y2, … , yn) iff xi ⊑ yi for each i
• Note that F is consistent: F(X0) ⊑ X0
– So each iteration moves at least one step down the lattice (for some component of the vector)
– … ⊑ F(F(X0)) ⊑ F(X0) ⊑ X0
• Therefore, the number of steps needed to reach a fixpoint is at most the height H of L times the number of nodes: O(Hn)


Building Lattices?

• Information about individual nodes or variables can be lifted pointwise:
– If L is a lattice, then so is { f : X → L } where f ⊑ g if and only if f(x) ⊑ g(x) for all x ∊ X.
• Like types, the dataflow lattices are static approximations to the dynamic behavior:
– Could pick a lattice based on subtyping. (Diagram: Any above Int and Bool; Int above Neg, Zero, and Pos; Bool above True and False; each edge is a <: relation.)
– Or other information. (Diagram: Aliased above Unaliased.)
• Points in the lattice are sometimes called dataflow “facts”


“Classic” Constant Propagation

• Constant propagation can be formulated as a dataflow analysis.
• Idea: propagate and fold integer constants in one pass:

  x = 1;          x = 1;
  y = 5 + x;  ⇒  y = 6;
  z = y * y;      z = 36;

• Information about a single variable:
– Variable is never defined.
– Variable has a single, constant value.
– Variable is assigned multiple values.


Domains for Constant Propagation

• We can make a constant propagation lattice L for one variable like this:

  ⟙ = multiple values
  …, -3, -2, -1, 0, 1, 2, 3, …   (the integer constants, pairwise incomparable)
  ⟘ = never defined

• To accommodate multiple variables, we take the product lattice, with one element per variable.
– Assuming there are three variables x, y, and z, the elements of the product lattice are of the form (ℓx, ℓy, ℓz).
– Alternatively, think of the product domain as a context that maps variable names to their “abstract interpretations”
• What are “meet” and “join” in this product lattice?
• What is the height of the product lattice?


Flow Functions

• Consider the node x = y op z
• F(ℓx, ℓy, ℓz) = ?
• F(ℓx, ⟙, ℓz) = (⟙, ⟙, ℓz) and F(ℓx, ℓy, ⟙) = (⟙, ℓy, ⟙)
  “If either input might have multiple values, the result of the operation might too.”
• F(ℓx, ⟘, ℓz) = (⟘, ⟘, ℓz) and F(ℓx, ℓy, ⟘) = (⟘, ℓy, ⟘)
  “If either input is undefined, the result of the operation is too.”
• F(ℓx, i, j) = (i op j, i, j)
  “If the inputs are known constants, calculate the output statically.”
• Flow functions for the other nodes are easy…
• Monotonic?
• Distributes over meets?
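The three cases can be sketched directly. In this hedged Python sketch, `TOP` and `BOT` stand for ⟙ and ⟘ (illustrative names, not from the course code), and checking the ⟘ case first is one convention for mixed ⟘/⟙ inputs:

```python
import operator

TOP, BOT = "TOP", "BOT"      # ⟙ = multiple values, ⟘ = never defined

def flow_binop(op, ly, lz):
    # abstract transfer for x = y op z on the flat constant lattice
    if BOT in (ly, lz):
        return BOT           # if either input is undefined, so is the result
    if TOP in (ly, lz):
        return TOP           # if either input may vary, so may the result
    return op(ly, lz)        # both are known constants: fold statically
```

For example, y = 5 and z = 1 fold to x = 6 under `operator.add`, while an unknown input yields an unknown result.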


QUALITY OF DATAFLOW ANALYSIS SOLUTIONS


Best Possible Solution

• Suppose we have a control-flow graph in which a path p1, starting from the root node (the entry point of the function), traverses the nodes n0, n1, n2, … nk.
• The best possible information along the path p1 is:
  ℓp1 = Fnk(…Fn2(Fn1(Fn0(⟙)))…)
• The best solution at the output is some ℓ ⊑ ℓp for all paths p.
• Meet-over-paths (MOP) solution: ⨅p∈paths_to[n] ℓp

(Example CFG: node 1: e = 1; node 2: if x > 0; node 3: e = y * 5 and node 4: e = y * 3 on the two branches; both rejoin at node 5: e = y * x.)

Best answer at node 5 is: F5(F3(F2(F1(⟙)))) ⨅ F5(F4(F2(F1(⟙))))


What about quality of the iterative solution?

• Does the iterative solution out[n] = Fn(⨅n’∈pred[n] out[n’]) compute the MOP solution ⨅p∈paths_to[n] ℓp?
• Answer: Yes, if the flow functions distribute over ⨅
– Distributive means: ⨅i Fn(ℓi) = Fn(⨅i ℓi)
– The proof is a bit tricky & beyond the scope of this class. (Difficulty: loops in the control flow graph might mean there are infinitely many paths…)
• Not all analyses give the MOP solution
– They are more conservative.


Reaching Definitions is MOP

• Fn[x] = gen[n] ∪ (x - kill[n])
• Does Fn distribute over meet ⨅ = ∪?

  Fn[x ⨅ y] = gen[n] ∪ ((x ∪ y) - kill[n])
            = gen[n] ∪ ((x - kill[n]) ∪ (y - kill[n]))
            = (gen[n] ∪ (x - kill[n])) ∪ (gen[n] ∪ (y - kill[n]))
            = Fn[x] ∪ Fn[y]
            = Fn[x] ⨅ Fn[y]

• Therefore: Reaching Definitions with iterative analysis always terminates with the MOP (i.e. best) solution.
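The distributivity step can also be confirmed by exhaustive checking over a small universe of definitions; a throwaway Python sketch (the particular `GEN`/`KILL` sets are chosen arbitrarily for illustration):

```python
from itertools import combinations

GEN, KILL = {0, 3}, {1, 4}          # an arbitrary gen/kill pair over defs {0..4}

def F(s):
    # the reaching-definitions flow function
    return GEN | (s - KILL)

U = range(5)
subsets = [set(c) for r in range(6) for c in combinations(U, r)]
# check F(x ∪ y) = F(x) ∪ F(y) for every pair of dataflow values
distributive = all(F(x | y) == F(x) | F(y) for x in subsets for y in subsets)
```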


Constprop Iterative Solution

(Example CFG with dataflow values for (x, y, z): the entry node if x > 0 starts at (⟘, ⟘, ⟘); the left branch runs y = 1 then z = 2, giving (⟘, 1, ⟘) then (⟘, 1, 2); the right branch runs y = 2 then z = 1, giving (⟘, 2, ⟘) then (⟘, 2, 1); the branches rejoin at x = y + z.)

At the join: (⟘, 1, 2) ⨅ (⟘, 2, 1) = (⟘, ⟙, ⟙)
After x = y + z: (⟙, ⟙, ⟙) is the iterative solution.


MOP Solution ≠ Iterative Solution

(Same CFG: if x > 0; left branch y = 1 then z = 2; right branch y = 2 then z = 1; rejoining at x = y + z.)

Applying the flow functions along each path first and then taking the meet:
(3, 1, 2) ⨅ (3, 2, 1) = (3, ⟙, ⟙), the MOP solution.
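The gap between the two solutions is exactly a failure of distributivity: for constant propagation, F(ℓ1 ⨅ ℓ2) can be strictly less informative than F(ℓ1) ⨅ F(ℓ2). A small Python sketch of the slide’s example (`TOP` stands for ⟙; ⟘ is ignored here for brevity):

```python
TOP = "TOP"

def meet(a, b):
    # meet of two constants on the flat lattice
    return a if a == b else TOP

def add_flow(ly, lz):
    # abstract x = y + z
    return TOP if TOP in (ly, lz) else ly + lz

# facts for (y, z): the left path gives (1, 2), the right path gives (2, 1)
mop = meet(add_flow(1, 2), add_flow(2, 1))    # fold along each path, then meet
it  = add_flow(meet(1, 2), meet(2, 1))        # meet first, then fold
```

The MOP answer discovers x = 3, while the iterative order loses that fact.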


Why not compute the MOP Solution?

• If MOP is better than the iterative analysis, why not compute it instead?
– Answer: there are exponentially many paths (even in a graph without loops)
• O(n) nodes and O(n) edges, but O(2^n) paths*
– At each branch there is a choice of 2 directions

* Incidentally, a similar idea can be used to force ML / Haskell type inference to need to construct a type that is exponentially big in the size of the program!


Dataflow Analysis: Summary

• Many dataflow analyses fit into a common framework.
• Key idea: iterative solution of a system of equations over a lattice of constraints.
– Iteration terminates if the flow functions are monotonic.
– The solution is equivalent to the meet-over-paths answer if the flow functions distribute over meet (⨅).
• Dataflow analyses as presented work for an “imperative” intermediate representation.
– The values of temporary variables are updated (“mutated”) during evaluation.
– Such mutation complicates calculations.
– SSA = “Static Single Assignment” eliminates this problem by introducing more temporaries, each one assigned to only once.
– Next up: converting to SSA, finding loops and dominators in CFGs


IMPLEMENTATION


See HW6: Dataflow Analysis


REGISTER ALLOCATION


Register Allocation Problem

• Given: an IR program that uses an unbounded number of temporaries
– e.g. the uids of our LLVM programs
• Find: a mapping from temporaries to machine registers such that
– program semantics is preserved (i.e. the behavior is the same)
– register usage is maximized
– moves between registers are minimized
– calling conventions / architecture requirements are obeyed
• Stack Spilling
– If there are k registers available and m > k temporaries are live at the same time, then not all of them will fit into registers.
– So: "spill" the excess temporaries to the stack.


Linear-Scan Register Allocation

A simple, greedy register-allocation strategy:

1. Compute liveness information: live(x)
– recall: live(x) is the set of uids that are live on entry to x's definition
2. Let pal be the set of usable registers
– usually reserve a couple for spill code [our implementation uses rax, rcx]
3. Maintain a "layout" uid_loc that maps uids to locations
– locations include registers and stack slots n, starting at n = 0
4. Scan through the program. For each instruction that defines a uid x:
– used = {r | reg r = uid_loc(y) s.t. y ∈ live(x)}
– available = pal - used
– If available is empty:  // no registers available, spill
    uid_loc(x) := slot n ; n = n + 1
– Otherwise, pick r in available:  // choose an available register
    uid_loc(x) := reg r
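The scan in step 4 can be sketched as follows. This is an illustrative Python sketch (the HW6 layout types differ); locations are modeled as `("reg", r)` or `("slot", n)` pairs:

```python
def linear_scan(defs, live, pal):
    # defs: uids in the order their definitions are scanned
    # live[x]: uids live on entry to x's definition; pal: usable registers
    uid_loc, slot = {}, 0
    for x in defs:
        used = {uid_loc[y] for y in live[x]
                if y in uid_loc and uid_loc[y][0] == "reg"}
        available = [r for r in pal if ("reg", r) not in used]
        if available:
            uid_loc[x] = ("reg", available[0])   # choose an available register
        else:
            uid_loc[x] = ("slot", slot)          # no register available: spill
            slot += 1
    return uid_loc

loc = linear_scan(["a", "b", "c"],
                  {"a": set(), "b": {"a"}, "c": {"a", "b"}},
                  pal=["r8", "r9"])
```

With only two registers and both live at c's definition, c spills to slot 0.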


For HW6

• HW6 implements two naive register-allocation strategies:
– no_reg_layout: spill all registers
– simple_layout: use registers, but without taking liveness into account
• Your job: do "better" than these.
• Quality metric:
– registers other than rbp count positively
– rbp counts negatively (it is used for spilling)
– shorter code is better
• Linear-scan register allocation should suffice
– but… can we do better?


GRAPH COLORING


Register Allocation

• Basic process:
1. Compute liveness information for each temporary.
2. Create an interference graph:
– Nodes are temporary variables.
– There is an edge between nodes n and m if n is live at the same time as m.
3. Try to color the graph.
– Each color corresponds to a register.
4. In case step 3 fails, "spill" a temporary to the stack and repeat the whole process.
5. Rewrite the program to use registers.


Interference Graphs

• Nodes of the graph are %uids
• Edges connect variables that interfere with each other
– Two variables interfere if their live ranges intersect (i.e. there is an edge in the control-flow graph across which they are both live).
• Register assignment is a graph coloring.
– A graph coloring assigns each node in the graph a color (register).
– Any two nodes connected by an edge must have different colors.
• Example:

  // live = {%a}
  %b1 = add i32 %a, 2
  // live = {%a,%b1}
  %c = mult i32 %b1, %b1
  // live = {%a,%c}
  %b2 = add i32 %c, 1
  // live = {%a,%b2}
  %ans = mult i32 %b2, %a
  // live = {%ans}
  return %ans;

Interference graph: %a interferes with each of %b1, %c, and %b2; %ans has no interference edges. A 2-coloring of the graph (red = r8, yellow = r9) gives %a one register and %b1, %c, %b2 the other.
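Building the interference edges from per-point liveness sets is a direct pairwise product; an illustrative Python sketch using the example's live sets:

```python
def interference_edges(live_sets):
    # live_sets: for each program point, the set of uids live there
    edges = set()
    for s in live_sets:
        for a in s:
            for b in s:
                if a < b:                  # record each unordered pair once
                    edges.add((a, b))
    return edges

edges = interference_edges([{"%a"}, {"%a", "%b1"}, {"%a", "%c"},
                            {"%a", "%b2"}, {"%ans"}])
```

This recovers the example's graph: %a conflicts with %b1, %b2, and %c, while %b1 and %c never overlap (so they may share a register).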


Register Allocation Questions

• Can we efficiently find a k-coloring of the graph whenever possible?
– Answer: in general the problem is NP-complete (it requires search)
– But we can do an efficient approximation using heuristics.
• How do we assign registers to colors?
– If we do this in a smart way, we can eliminate redundant MOV instructions.
• What do we do when there aren't enough colors/registers?
– We have to use stack space, but how do we do this effectively?


Coloring a Graph: Kempe's Algorithm

• Kempe [1879] provides this algorithm for K-coloring a graph.
• It's a recursive algorithm that works in three steps:
• Step 1: Find a node with degree < K and cut it out of the graph.
– Remove the node and its edges.
– This is called simplifying the graph.
• Step 2: Recursively K-color the remaining subgraph.
• Step 3: When the remaining graph is colored, there must be at least one free color available for the deleted node (since its degree was < K). Pick such a color.
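The three steps above translate into a short recursive sketch (illustrative only; a production allocator would use worklists rather than recursion):

```python
def kempe(nodes, adj, K):
    # Returns a node -> color map (colors 0..K-1), or None if the graph
    # simplifies to one where every remaining node has degree >= K.
    if not nodes:
        return {}
    # Step 1: find a node with degree < K in the remaining subgraph
    low = next((n for n in nodes if len(adj[n] & nodes) < K), None)
    if low is None:
        return None                        # stuck: potential spill
    # Step 2: recursively K-color the rest
    colors = kempe(nodes - {low}, adj, K)
    if colors is None:
        return None
    # Step 3: a free color must exist, since low had < K neighbors
    used = {colors[m] for m in adj[low] if m in colors}
    colors[low] = next(c for c in range(K) if c not in used)
    return colors

# a triangle a-b-c with a pendant node d attached to c
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
colors = kempe(frozenset(adj), adj, K=3)
```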


Example: 3-color this Graph

(Figure: recursing down the simplified graphs.)


Example: 3-color this Graph

(Figure: assigning colors on the way back up.)


Failure of the Algorithm

• If the graph cannot be colored, it will simplify to a graph where every node has at least K neighbors.
– This can happen even when the graph is K-colorable!
– This is a symptom of NP-hardness (it requires search).
• Example: when trying to 3-color this graph. (Figure: a graph on which simplification gets stuck.)


Spilling

• Idea: if we can't K-color the graph, we need to store one temporary variable on the stack.
• Which variable to spill?
– Pick one that isn't used very frequently
– Pick one that isn't used in a (deeply nested) loop
– Pick one that has high interference (since removing it will make the graph easier to color)
• In practice: some weighted combination of these criteria
• When coloring:
– Mark the node as spilled
– Remove it from the graph
– Keep recursively coloring


Spilling, Pictorially

• Select a node to spill
• Mark it (X) and remove it from the graph
• Continue coloring


Optimistic Coloring

• Sometimes it is possible to color a node marked for spilling.
– If we get "lucky" with the choices of colors made earlier.
• Example: when 2-coloring this graph, the node marked X for spilling can still be colored once its neighbors' colors are known.
• So: on the way down, mark for spilling, but don't actually spill…


Accessing Spilled Registers

• If optimistic coloring fails, we need to generate code to move the spilled temporary to & from memory.
• Option 1: Reserve registers specifically for moving to/from memory.
– Con: Need at least two registers (one for each source operand of an instruction), so this decreases the total # of available registers by 2.
– Pro: Only need to color the graph once.
– Not good on X86 (especially 32-bit) because there are too few registers & too many constraints on how they can be used.
• Option 2: Rewrite the program to use a new temporary variable, with explicit moves to/from memory.
– Pro: Need to reserve fewer registers.
– Con: Introducing temporaries changes live ranges, so we must recompute liveness & recolor the graph.


Example Spill Code

• Suppose temporary t is marked for spilling to stack slot [rbp+offs]
• Rewrite the program like this:

  t = a op b;        t = a op b            // defn. of t
  …                  Mov [rbp+offs], t
  x = t op c;        Mov t37, [rbp+offs]   // use 1 of t
  …                  x = t37 op c
  y = d op t;        Mov t38, [rbp+offs]   // use 2 of t
                     y = d op t38

• Here, t37 and t38 are freshly generated temporaries that replace t for different uses of t.
• Rewriting the code in this way breaks t's live range up: t, t37, and t38 are each live across only one edge.


Precolored Nodes

• Some variables must be pre-assigned to registers.
– E.g. on X86 the multiplication instruction IMul must define %rax.
– The "Call" instruction should kill the caller-save registers %rax, %rcx, and %rdx.
– Any temporary variable live across a call interferes with the caller-save registers.
• To properly allocate temporaries, we treat registers as nodes in the interference graph with pre-assigned colors.
– Pre-colored nodes can't be removed during simplification.
– Trick: treat pre-colored nodes as having "infinite" degree in the interference graph; this guarantees they won't be simplified.
– When the graph is empty except for the pre-colored nodes, we have reached the point where we start coloring the rest of the nodes.


Picking Good Colors

• When choosing colors during the coloring phase, any choice is semantically correct, but some choices are better for performance.
• Example: movq t1, t2
– If t1 and t2 can be assigned the same register (color), then this move is redundant and can be eliminated.
• A simple color-choosing strategy that helps eliminate such moves:
– Add a new kind of "move related" edge between the nodes for t1 and t2 in the interference graph.
– When choosing a color for t1 (or t2), if possible pick the color of an already-colored node reachable by a move-related edge.


Example Color Choice

• Consider 3-coloring this graph, where a dashed "move related" edge indicates that there is a Mov from one temporary to another.
• After coloring the rest, we have a choice for the remaining node:
– Picking yellow is better than red because it will eliminate a move.


Coalescing Interference Graphs

• A more aggressive strategy is to coalesce nodes of the interference graph if they are connected by move-related edges.
– Coalescing the nodes forces the two temporaries to be assigned the same register.
• Idea: interleave simplification and coalescing to maximize the number of moves that can be eliminated.
• Problem: coalescing can sometimes increase the degree of a node.

(Figure: move-related nodes t and u, with neighbors among a, b, and c, are merged into a single node t,u whose neighbor set is the union of t's and u's.)


Conservative Coalescing

• Two strategies are guaranteed to preserve the k-colorability of the interference graph.
• Briggs's strategy: it's safe to coalesce x & y if the resulting node will have fewer than k neighbors of significant degree (i.e. with degree ≥ k).
• George's strategy: we can safely coalesce x & y if, for every neighbor t of x, either t already interferes with y or t has degree < k.
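Both tests are cheap to state over an adjacency-set representation. An illustrative Python sketch (for Briggs, neighbor degrees are computed as they would be after the merge, since x and y collapse into one node):

```python
def briggs_ok(adj, x, y, K):
    # Briggs: the merged node must have < K neighbors of significant degree.
    # A neighbor t of the merged node has post-merge degree
    # len(adj[t] - {x, y}) + 1, because x and y become a single neighbor.
    merged_neighbors = (adj[x] | adj[y]) - {x, y}
    significant = [t for t in merged_neighbors
                   if len(adj[t] - {x, y}) + 1 >= K]
    return len(significant) < K

def george_ok(adj, x, y, K):
    # George: every neighbor t of x interferes with y or has degree < K
    return all(t in adj[y] or len(adj[t]) < K for t in adj[x] - {y})

# a 4-cycle a-b-c-d; coalescing opposite corners a and c is safe for K = 2
adj = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
```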


Complete Register Allocation Algorithm

1. Build the interference graph (precolor nodes as necessary).
– Add move-related edges.
2. Reduce the graph (building a stack of nodes to color).
2.1. Simplify the graph as much as possible without removing nodes that are move-related (i.e. have a move-related neighbor). Remaining nodes are high-degree or move-related.
2.2. Coalesce move-related nodes using Briggs's or George's strategy.
2.3. Coalescing can reveal more nodes that can be simplified, so repeat 2.1 and 2.2 until no node can be simplified or coalesced.
2.4. If no nodes can be coalesced, freeze (remove) a move-related edge and keep trying to simplify/coalesce.
3. If there are non-precolored nodes left, mark one for spilling, remove it from the graph, and continue with step 2.
4. When only pre-colored nodes remain, start coloring (popping simplified nodes off the top of the stack).
4.1. If a node must actually be spilled, insert spill code as in the "Example Spill Code" slide and rerun the whole register-allocation algorithm starting at step 1.


Last details

• After register allocation, the compiler should do a peephole optimization pass to remove redundant moves.
• Some architectures specify calling conventions that use registers to pass function arguments.
– It's helpful to move such arguments into temporaries in the function prelude so that the compiler has as much freedom as possible during register allocation. (Not an issue on X86, though.)
