Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn


Page 1: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Data Flow Analysis

Compiler

Baojian Hua

bjhua@ustc.edu.cn

Page 2: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Front End

source code -> lexical analyzer -> tokens -> parser -> abstract syntax tree -> semantic analyzer -> IR

Page 3: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Middle End

AST -> translation -> IR1 -> translation -> IR2 -> other IR and translation -> asm

Page 4: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

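Optimizations

AST -> translation -> IR1 -> translation -> IR2 -> other IR and translation -> asm

An optimization pass (opt) is attached to each of the five stages: AST, IR1, IR2, other IR, and asm.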

Page 5: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

General Scheme for Optimization

Analysis
  control flow, data flow, dependency, …
  to obtain conservative static knowledge of the program being optimized
  an approximation of the dynamic behavior

Rewriting
  rewrite the program depending on the knowledge obtained above

IR -> analysis -> static information -> rewriting -> IR’

Page 6: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

“Conservative Static”

Cjump (x==5? L1: L2)

y = 1 y = 2

print (y)

Can we substitute y with the value 2?

This amounts to proving that x is always equal to 5!

Suppose x is an input from the user: it is impossible to know its value statically. So one must be conservative when using static knowledge.

Page 7: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Liveness Analysis

Page 8: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Motivation

Low-level IRs assume an infinite number of abstract “registers”
  good for code generation, but bad for execution on a real machine
  a real machine has a finite number of registers, so how do we bridge this gap?

The goal of register allocation (an optimization) is to put an unbounded number of variables into a few registers
  this needs liveness analysis

Page 9: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Example

Consider this TAC:

a = 1
b = a + 2
c = b + 3
return c

Three variables: a, b, and c.

And assume that the target machine has only one register: r.

Is it possible to put all three variables “a”, “b” and “c” in register “r”?

Page 10: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Example

Consider this TAC, annotated with the set of variables live after each statement:

a = 1          {a}
b = a + 2      {b}
c = b + 3      {c}
return c

Calculate which variable is “live” at a given program point.

The “liveness” information gives live ranges.

Live ranges don’t overlap, thus all three variables can be put into one register.

Page 11: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Example

Consider this TAC, annotated with the set of variables live after each statement:

a = 1          {a}
b = a + 2      {b}
c = b + 3      {c}
return c

Register allocation:

a => r
b => r
c => r

Code rewriting:

r = 1
r = r + 2
r = r + 3
return r

Page 12: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Data Flow Equations for Liveness

Inside basic blocks (backward):

in[n] = use[n] \/ (out[n] - def[n])

Here in[n] and out[n] are the sets of variables live immediately before and after statement n.

// Example:
a = 1
b = a + 2
c = b + 3
return c

// Example:
a = 1
b = a + 2
c = b + 3
return a + c
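The backward equation can be tried on the second example above (the one ending in return a + c). The following is a minimal Python sketch under stated assumptions, not the course's implementation: the encoding of a statement as a (defs, uses) pair and the name live_in_per_statement are hypothetical and chosen only for illustration.

```python
def live_in_per_statement(stmts, block_live_out=frozenset()):
    """stmts: list of (defs, uses) pairs in program order (hypothetical encoding).
    Returns the live-in set of every statement, computed in one backward pass."""
    out = set(block_live_out)
    result = []
    for defs, uses in reversed(stmts):
        live_in = set(uses) | (out - set(defs))   # in = use \/ (out - def)
        result.append(live_in)
        out = live_in          # the previous statement's out is this statement's in
    result.reverse()
    return result


# a = 1; b = a + 2; c = b + 3; return a + c
example = [
    ({"a"}, set()),
    ({"b"}, {"a"}),
    ({"c"}, {"b"}),
    (set(), {"a", "c"}),
]
live_in = live_in_per_statement(example)
# live_in == [set(), {"a"}, {"a", "b"}, {"a", "c"}]
```

In this variant a stays live across the definitions of b and c, so a and c cannot share a register, unlike in the first example.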

Page 13: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

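For general CFG

Equations:
  in[n] = use[n] \/ (out[n] - def[n])
  out[n] = \/ s∈succ[n] in[s]

Fixpoint algorithm:
  initialize all in and out sets with {}
  loop until no set changes

(Diagram: a CFG node n labeled with use[n] and def[n], with in[n] on its incoming edge and out[n] on its outgoing edge.)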

Page 14: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

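Example

CFG, one statement per node (edges: 1->2, 2->3, 3->4, 4->5, 5->2, 5->6):

1: a = 0
2: b = a + 1
3: c = c + b
4: a = b * 2
5: a<N
6: return c

node   def    use
1      {a}    {}
2      {b}    {a}
3      {c}    {b, c}
4      {a}    {b}
5      {}     {a, N}
6      {}     {c}

In the iterations below N is left out of the sets, i.e. it is treated as a constant.

in[n] = use[n] \/ (out[n]-def[n])

out[n] = \/ s∈succ[n] in[s]

Loop the nodes with order: 1, 2, 3, 4, 5, 6. Each cell is in / out:

node   init       1st            2nd            …
1      {} / {}    {} / {}        {} / {a}       …
2      {} / {}    {a} / {}       {a} / {b,c}    …
3      {} / {}    {b,c} / {}     {b,c} / {b}    …
4      {} / {}    {b} / {}       {b} / {a}      …
5      {} / {}    {a} / {a}      {a} / {a,c}    …
6      {} / {}    {c} / {}       {c} / {}       …

Final live_out for nodes 1 to 5: {a,c}, {b,c}, {b,c}, {a,c}, {a,c}. The diagram also marks {c}.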
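The fixpoint iteration of this example can be reproduced with a short Python sketch. This is an illustration under assumptions, not the lab's code: the function name liveness, the dictionary-based CFG encoding, and the edge list (in particular the back edge 5->2) are assumptions of this sketch, and N is treated as a constant so it does not appear in any set.

```python
def liveness(nodes, succ, use, defs):
    """Iterate the two liveness equations until no in/out set changes."""
    live_in = {n: set() for n in nodes}
    live_out = {n: set() for n in nodes}
    changed = True
    while changed:
        changed = False
        for n in nodes:
            new_in = use[n] | (live_out[n] - defs[n])
            if new_in != live_in[n]:
                live_in[n] = new_in
                changed = True
            new_out = set()
            for s in succ[n]:
                new_out |= live_in[s]
            if new_out != live_out[n]:
                live_out[n] = new_out
                changed = True
    return live_in, live_out


# 1: a = 0; 2: b = a + 1; 3: c = c + b; 4: a = b * 2; 5: a<N; 6: return c
nodes = [1, 2, 3, 4, 5, 6]
succ = {1: [2], 2: [3], 3: [4], 4: [5], 5: [2, 6], 6: []}
defs = {1: {"a"}, 2: {"b"}, 3: {"c"}, 4: {"a"}, 5: set(), 6: set()}
use = {1: set(), 2: {"a"}, 3: {"b", "c"}, 4: {"b"}, 5: {"a"}, 6: {"c"}}

live_in, live_out = liveness(nodes, succ, use, defs)
# live_out[1] == {"a", "c"}, live_out[2] == {"b", "c"}, live_out[6] == set()
```

The visiting order only affects how many rounds are needed, not the final sets, which is why a reverse topological order is a pure speed-up for a backward analysis.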

Page 15: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

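Interference Graph

1: a = 0
2: b = a + 1
3: c = c + b
4: a = b * 2
5: a<N
6: return c

Final live_out for nodes 1 to 5: {a,c}, {b,c}, {b,c}, {a,c}, {a,c}. The diagram also marks {c}.

For any two variables x and y, if they are live simultaneously, then draw an (undirected) edge between x and y.

Graph nodes: a, b, c. Edges: a - c and b - c (a and b are never live at the same time).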
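The edge rule on this slide can be written down directly. The sketch below is a literal reading of that rule in Python; the function name interference_edges and the representation of an undirected edge as a sorted pair are assumptions made for illustration. Production allocators usually refine the rule (for example around move instructions), which this sketch does not attempt.

```python
from itertools import combinations


def interference_edges(live_sets):
    """Add an undirected edge for every pair of variables that appear
    together in at least one live set."""
    edges = set()
    for live in live_sets:
        for x, y in combinations(sorted(live), 2):
            edges.add((x, y))      # sorted pair stands for the undirected edge
    return edges


# Final live_out sets of nodes 1..6 from the example
final_live_out = [{"a", "c"}, {"b", "c"}, {"b", "c"}, {"a", "c"}, {"a", "c"}, set()]
edges = interference_edges(final_live_out)
# edges == {("a", "c"), ("b", "c")}
```

Since a and b are not connected, they may share one register, while c needs a register of its own.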

Page 16: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Speeding-up the analysis

Ordering the nodes
  for liveness analysis: reverse top-sort order
  You do this in lab 5

Once a variable

Careful selection of set representation
  Careful data structure engineering
  Say: bit-vector

Basic block
  You do this in lab 5
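As an illustration of the bit-vector idea, the sketch below encodes each variable set as one Python integer, so that the liveness transfer function becomes three bitwise operations. The variable numbering VARS and the helper names to_bits, to_names and transfer are hypothetical; a real compiler would more likely use fixed-width machine words.

```python
VARS = ["a", "b", "c"]                       # hypothetical numbering: a=bit 0, b=bit 1, c=bit 2
BIT = {v: 1 << i for i, v in enumerate(VARS)}


def to_bits(names):
    """Encode a set of variable names as an integer bit-vector."""
    bits = 0
    for v in names:
        bits |= BIT[v]
    return bits


def to_names(bits):
    """Decode a bit-vector back into a set of variable names."""
    return {v for v in VARS if bits & BIT[v]}


def transfer(use, defs, out):
    """in = use \\/ (out - def), with union as | and set difference as & ~."""
    return use | (out & ~defs)


# Statement c = c + b with out = {b}
result = to_names(transfer(to_bits({"b", "c"}), to_bits({"c"}), to_bits({"b"})))
# result == {"b", "c"}
```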

Page 17: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Basic Blocks

Step 1: calculate def and use for each basic block b
  one pass backward calculation

Step 2: do liveness analysis on each block
  just as discussed above

Step 3: calculate liveness information for each statement in each block
  one pass backward calculation
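Step 1 can be sketched as a single backward pass over the statements of a block. The Python below is an assumption-laden illustration: the (defs, uses) statement encoding and the name block_def_use are hypothetical, and the sample block is the loop body used in these slides with N treated as a constant.

```python
def block_def_use(stmts):
    """stmts: (defs, uses) pairs in program order for one basic block.
    Returns (def, use) of the whole block, computed backward."""
    block_def, block_use = set(), set()
    for defs, uses in reversed(stmts):
        # A use only counts for the block if no earlier statement
        # of the same block defines the variable first.
        block_use = set(uses) | (block_use - set(defs))
        block_def |= set(defs)
    return block_def, block_use


# b = a + 1; c = c + b; a = b * 2; a<N
loop_body = [
    ({"b"}, {"a"}),
    ({"c"}, {"b", "c"}),
    ({"a"}, {"b"}),
    (set(), {"a"}),
]
loop_def, loop_use = block_def_use(loop_body)
# loop_def == {"a", "b", "c"}, loop_use == {"a", "c"}
```

The variable b is missing from the block's use set because every use of b inside the block is preceded by the block's own definition of b.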

Page 18: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

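Example

Block 1: a = 0
Block 2: b = a + 1
         c = c + b
         a = b * 2
         a<N
Block 3: return c

Block 2 either loops back to itself or continues to block 3.

block   def        use
1       {a}        {}
2       {a,b,c}    {a,c}
3       {}         {c}

The use set of block 2 does NOT contain variable “b”. Why?

Blocks are reverse topo-sort ordered (3, 2, 1). Each cell is out / in:

block   init       1st             2nd              3rd
3       {} / {}    {} / {c}        {} / {c}         {} / {c}
2       {} / {}    {c} / {a,c}     {a,c} / {a,c}    {a,c} / {a,c}
1       {} / {}    {a,c} / {c}     {a,c} / {c}      {a,c} / {c}

live_out for each block: block 1 {a,c}; block 2 {a,c}; block 3 {}.

Backward calculation of live_out for each statement (block 2):

b = a + 1    {b,c}
c = c + b    {b,c}
a = b * 2    {a,c}
a<N          {a,c}   (the live_out of the block)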

Page 19: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Reaching Definition

Page 20: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Reaching Definition

Block 1: a = 0
Block 2: b = a + 1
         c = c + b
         a = b * 2
         a<N
Block 3: return c

The problem: at any program point, we’d like to know where the value of a variable x is defined.

E.g., can we substitute the variable a with 0?

If so, we are doing the so-called constant propagation optimization.

Page 21: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Implementation

Number each definition:

Block 1: 5: a = 0
Block 2: 6: b = a + 1
         7: c = c + b
         8: a = b * 2
            a<N
Block 3:    return c

(Here we number the four definitions with 5, 6, 7, 8. The numbers have no special meaning, except that:

1. they are different from the block numbers, and

2. they are all unique.)

Page 22: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Equations

Block 1: 5: a = 0
Block 2: 6: b = a + 1
         7: c = c + b
         8: a = b * 2
            a<N
Block 3:    return c

Calculate def and kill for each block, based on the equations for a statement:

def[d: x=…] = {d}

kill[d: x=…] = defs(x)-{d}

def[1] = {5}        kill[1] = {8}
def[2] = {6,7,8}    kill[2] = {5}
def[3] = {}         kill[3] = {}

Page 23: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Data Flow Equation

Forward calculation:
  in[b] = \/ q∈pred(b) out[q]
  out[b] = def[b] \/ (in[b] - kill[b])

Page 24: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Fixpoint algorithm

Block 1: 5: a = 0
Block 2: 6: b = a + 1
         7: c = c + b
         8: a = b * 2
            a<N
Block 3:    return c

block   def        kill
1       {5}        {8}
2       {6,7,8}    {5}
3       {}         {}

in[b] = \/ q∈pred(b) out[q]

out[b] = def[b] \/ (in[b] - kill[b])

Each cell is in / out:

block   init       1st                  2nd
1       {} / {}    {} / {5}             {} / {5}
2       {} / {}    {5} / {6,7,8}        {5,6,7,8} / {6,7,8}
3       {} / {}    {6,7,8} / {6,7,8}    {6,7,8} / {6,7,8}

Reaching definitions before each statement:

5: a = 0        {}
6: b = a + 1    {5,6,7,8}
7: c = c + b    {5,6,7,8}
8: a = b * 2    {5,6,7,8}
   a<N          {6,7,8}
   return c     {6,7,8}
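The forward fixpoint for this example can be checked with a small Python sketch. It is only an illustration: the function name reaching_definitions and the predecessor map are assumptions (the map encodes that block 2 can be entered from block 1 or from itself), and the slide's def sets are called gen here because def is a reserved word in Python.

```python
def reaching_definitions(blocks, pred, gen, kill):
    """Forward fixpoint: in[b] is the union of out[q] over predecessors q,
    and out[b] = gen[b] \\/ (in[b] - kill[b])."""
    rd_in = {b: set() for b in blocks}
    rd_out = {b: set() for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            new_in = set()
            for q in pred[b]:
                new_in |= rd_out[q]
            new_out = gen[b] | (new_in - kill[b])
            if new_in != rd_in[b] or new_out != rd_out[b]:
                rd_in[b], rd_out[b] = new_in, new_out
                changed = True
    return rd_in, rd_out


blocks = [1, 2, 3]
pred = {1: [], 2: [1, 2], 3: [2]}          # assumed CFG edges
gen = {1: {5}, 2: {6, 7, 8}, 3: set()}     # the slide's def[b]
kill = {1: {8}, 2: {5}, 3: set()}

rd_in, rd_out = reaching_definitions(blocks, pred, gen, kill)
# rd_in[2] == {5, 6, 7, 8}, rd_out[2] == {6, 7, 8}
```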

Page 25: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Constant Propagation

Reaching definitions before each statement:

5: a = 0        {}
6: b = a + 1    {5,6,7,8}
7: c = c + b    {5,6,7,8}
8: a = b * 2    {5,6,7,8}
   a<N          {6,7,8}
   return c     {6,7,8}

Can we substitute the variable a in “b = a + 1” with the constant “0”?

No! Because there are two definitions for “a” which may reach this point: 5 and 8.
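The check behind this answer can be sketched in a few lines of Python. The table definitions and the helper constant_for are hypothetical names introduced for this illustration. The rule is deliberately conservative: it only substitutes when exactly one definition of the variable reaches the point and that definition assigns a literal.

```python
def constant_for(var, reaching, definitions):
    """Return the constant that may replace var at a program point,
    or None when the substitution is not justified."""
    candidates = [d for d in sorted(reaching) if definitions[d][0] == var]
    if len(candidates) != 1:
        return None                      # zero or several definitions reach
    rhs = definitions[candidates[0]][1]
    return rhs if isinstance(rhs, int) else None


# definition number -> (defined variable, right-hand side)
definitions = {
    5: ("a", 0),
    6: ("b", "a + 1"),
    7: ("c", "c + b"),
    8: ("a", "b * 2"),
}

at_use_of_a = {5, 6, 7, 8}               # reaching set before b = a + 1
answer = constant_for("a", at_use_of_a, definitions)
# answer is None: both 5 and 8 define a and may reach this point
```

If only definition 5 reached the use, for example in a program without the loop, the same call would return 0.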

Page 26: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Available Expressions

Page 27: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Available Expressions

Block 1: a = 0
Block 2: b = a + 1
         c = c + b
         a = a + 1
         a<N
Block 3: return c

E.g., has the right-side expression “a+1” in “a = a + 1” already been calculated, and is it thus available here?

If so, the second calculation can be avoided!

The problem: at a given program point, we’d like to know whether or not the value of an expression e has been calculated and is also available.

1. The expression e must be calculated on every path to the point, and

2. variables used in e must not have been redefined after the initial calculation.

Page 28: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Implementation

Block 1: a = 0
Block 2: b = a + 1
         c = c + b
         a = a + 1
         a<N
Block 3: return c

Calculate gen and kill for each block, based on the equations for a statement. (Tiger table 17.4)

gen[1] = {}    kill[1] = {a+1}
gen[2] = {}    kill[2] = ALL
gen[3] = {}    kill[3] = {}

All possible expressions:

ALL={a+1, c+b}

Page 29: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Implementation

Block 1: a = 0
Block 2: b = a + 1
         c = c + b
         a = a + 1
         a<N
Block 3: return c

Calculate in/out for each block, based on the fixpoint algorithm.

gen[1] = {}    kill[1] = {a+1}
gen[2] = {}    kill[2] = ALL
gen[3] = {}    kill[3] = {}

All possible expressions:

ALL={a+1, c+b}

Each cell is in / out:

block   init         1st
1       {} / ALL     {} / {}
2       ALL / ALL    {} / {}
3       ALL / ALL    {} / {}
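The slides do not spell out the equations for this analysis, so the Python sketch below assumes the standard textbook form: in[b] is the intersection of out[q] over all predecessors q, out[b] = gen[b] \/ (in[b] - kill[b]), the entry block starts with in = {}, and every other set starts at ALL. The function name available_expressions and the predecessor map are likewise assumptions of this sketch.

```python
def available_expressions(blocks, entry, pred, gen, kill, all_exprs):
    """Forward fixpoint with intersection as the meet operator."""
    av_in = {b: set(all_exprs) for b in blocks}
    av_out = {b: set(all_exprs) for b in blocks}
    av_in[entry] = set()                     # nothing is available on entry
    changed = True
    while changed:
        changed = False
        for b in blocks:
            if b == entry:
                new_in = set()
            else:
                new_in = set(all_exprs)
                for q in pred[b]:
                    new_in &= av_out[q]      # must hold on every incoming path
            new_out = gen[b] | (new_in - kill[b])
            if new_in != av_in[b] or new_out != av_out[b]:
                av_in[b], av_out[b] = new_in, new_out
                changed = True
    return av_in, av_out


ALL = {"a+1", "c+b"}
blocks = [1, 2, 3]
pred = {1: [], 2: [1, 2], 3: [2]}            # assumed CFG edges
gen = {1: set(), 2: set(), 3: set()}
kill = {1: {"a+1"}, 2: set(ALL), 3: set()}

av_in, av_out = available_expressions(blocks, 1, pred, gen, kill, ALL)
# every in and out set ends up empty, matching the table above
```

Starting from ALL rather than {} matters here: with intersection as the meet, the iteration has to shrink the sets toward the largest solution.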

Page 30: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Implementation

Calculate in/out for each statement, based on the in/out for each block.

All possible expressions:

ALL={a+1, c+b}

Each cell is in / out:

block   init         1st
1       {} / ALL     {} / {}
2       ALL / ALL    {} / {}
3       ALL / ALL    {} / {}

Available expressions before and after each statement:

Block 1:
  {}
  a = 0
  {}

Block 2:
  {}
  b = a + 1
  {a+1}
  c = c + b
  {a+1}
  a = a + 1
  {}
  a<N
  {}

Block 3:
  {}
  return c
  {}
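The per-statement sets of block 2 can be derived from the block's in set by one forward pass. The following Python sketch is illustrative only: the (target, expression) encoding, the operand table operands_of and the name statement_available are assumptions. A statement first makes its expression available and then kills every expression that mentions the assigned variable.

```python
def statement_available(stmts, block_in, operands_of):
    """stmts: (target, expr) pairs in program order; either part may be None.
    Returns the set of available expressions after each statement."""
    current = set(block_in)
    outs = []
    for target, expr in stmts:
        if expr is not None:
            current.add(expr)                 # expr is computed here
        if target is not None:
            # an assignment to target invalidates expressions that read it
            current = {e for e in current if target not in operands_of[e]}
        outs.append(set(current))
    return outs


operands_of = {"a+1": {"a"}, "c+b": {"c", "b"}}   # expression -> variables read

# b = a + 1; c = c + b; a = a + 1; a<N
block2 = [("b", "a+1"), ("c", "c+b"), ("a", "a+1"), (None, None)]
after = statement_available(block2, set(), operands_of)
# after == [{"a+1"}, {"a+1"}, set(), set()]
```

The set before a = a + 1 is the set after c = c + b, namely {a+1}, which is the fact that common sub-expression elimination relies on.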

Page 31: Data Flow Analysis Compiler Baojian Hua bjhua@ustc.edu.cn

Common Sub-expression Elimination (CSE)

Available expressions before and after each statement:

Block 1:
  {}
  a = 0
  {}

Block 2:
  {}
  b = a + 1
  {a+1}
  c = c + b
  {a+1}
  a = a + 1
  {}
  a<N
  {}

Block 3:
  {}
  return c
  {}

E.g., has the right-side expression “a+1” in “a = a + 1” already been calculated, and is it thus available here?

After the available expression analysis, we know “a+1” is available, so the second calculation can be omitted! (The rewritten diagram on the slide shows b as the replacement.)

But with which variable should the expression “a+1” be substituted? We need to do reaching expression analysis... (Read the text and do homework!)