Global Redundancy Elimination: Computing Available Expressions Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled

Global Redundancy Elimination:Computing Available Expressions

Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved.

Students enrolled in Comp 512 at Rice University have explicit permission to make copies of these materials for their personal use.

Faculty from other educational institutions may use these materials for nonprofit educational purposes, provided this copyright notice is preserved.

Comp 512Spring 2011

1COMP 512, Rice University

COMP 512, Rice University

2

Computing Available Expressions

For each block b

• Let AVAIL(b) be the set of expressions available on entry to b

• Let NKILL(b) be the set of expression not killed in b

• Let DEF(b) be the set of expressions defined in b and not subsequently killed in b

Now, AVAIL(b) can be defined as:

AVAIL(b) = xpred(b) (DEF(x) (AVAIL(x) NKILL(x) ))

preds(b) is the set of b’s predecessors in the control-flow graph

This system of simultaneous equations forms a data-flow problem Solve it with a data-flow algorithm


3

Using Available Expressions for GCSE

The Big Picture

1. block b, compute DEF(b) and NKILL(b)

2. block b, compute AVAIL(b)

3. block b, value number the block starting from AVAIL(b)

4. Replace expressions in AVAIL(b) with references

Expressions not killed in b

Expressions defined in b


4

First step is to compute DEF & NKILL


assume a block b with operations o1, o2, …, ok

KILLED DEF(b)

for i = k to 1assume oi is “x y + z”if (y KILLED(b)) and (z KILLED(b)) then

add “y + z” to DEF(b)add x to KILLED(b)

NOTKILLED(b) { all expressions }For each expression e

for each variable v eif v KILLED(b) then

NOTKILLED(b) NOTKILLED(b) - e

}O(k) steps

}O(N) steps

N is # of operations

*

Many data-flow problems have initial information that costs less to compute


5


The worklist iterative algorithm

Worklist { all blocks, bi }

while (Worklist )remove a block b from Worklist recompute AVAIL(b ) as


if AVAIL(b ) changed thenWorklist Worklist successors(b )

•Finds fixed point solution to equation for AVAIL

•That solution is unique

•Identical to “meet over all paths” solution}How do we know these things?

*


6

Data-flow Analysis

Definition

Data-flow analysis is a collection of techniques for compile-time reasoning about the run-time flow of values

• Almost always involves building a graph Problems are trivial on a basic block Global problems control-flow graph (or derivative) Whole program problems call graph (or derivative)

• Usually formulated as a set of simultaneous equations Sets attached to nodes and edges Lattice (or semilattice) to describe values

• Desired result is usually meet over all paths solution “What is true on every path from the entry?” “Can this happen on any path from the entry?” Related to the safety of optimization

Flow graph

Data-flow problem


7

Data-flow Analysis

Limitations

1. Precision – “up to symbolic execution” Assume all paths are taken

2. Solution – cannot afford to compute MOP solution Large class of problems where MOP = MFP= LFP

Not all problems of interest are in this class

3. Arrays – treated naively in classical analysis Represent whole array with a single fact

4. Pointers – difficult (and expensive) to analyze Imprecision rapidly adds up Need to ask the right questions

SummaryFor scalar values, we can quickly solve simple problems

Good news:

Simple problems can carry us pretty far

*


8

Data-flow Analysis

Semilattice

A semilattice is a set L and a meet operation such that,

a, b, & c L :

1. a a = a

2. a b = b a3. a (b c) = (a b) c

imposes an order on L, a, b, & c L :

1. a ≥ b a b = b

2. a > b a ≥ b and a ≠ b

A semilattice has a bottom element, denoted 1. a L, a =

2. a L, a ≥


9

Data-flow Analysis

How does this relate to data-flow analysis?

• Choose a semilattice to represent the facts

• Attach a meaning to each a L

Each a L is a distinct set of known facts

• With each node n, associate a function fn : L L

fn models behavior of code in block corresponding to n

• Let F be the set of all functions that the code might generate

Example — AVAIL

• Semilattice is (2E,), where E is the set of all expressions & is Set are bigger than |variables|, is

• For a node n, fn has the form fn(x) = Dn (x Nn)

Where Dn is DEF(n) and Nn is NKILL(n)


10

Concrete Example: Available Expressions

m a + bn a + b

A

p c + dr c + d

B

y a + bz c + d

G

q a + br c + d

C

e b + 18s a + bu e + f

D e a + 17t c + du e + f

E

v a + bw c + dx e + f

F

E = {a+b,c+d,e+f,a+17,b+18}

2E is the set of all subsets of E

2E = [ {a+b,c+d,e+f,a+17,b+18}, {a+b,c+d,e+f,a+17}, {a+b,c+d,e+f,b+18}, {a+b,c+d,a+17,b+18}, {a+b,e+f,a+17,b+18}, {c+d,e+f,a+17,b+18}, {a+b,c+d,e+f}, {a+b,c+d,b+18}, {a+b,c+d,a+17}, {a+b,e+f,a+17}, {a+b,e+f,b+18},{a+b,a+17,b+18}, {c+d,e+f,a+17}, {c+d,e+f,b+18}, {c+d,a+17,b+18},{e+f,a+17,b+18}, {a+b,c+d},{a+b,e+f},{a+b,a+17}, {a+b,b+18},{c+d,e+f},{c+d,a+17}, {c+d,b+18},{e+f,a+17},{e+f,b+18}, {a+17,b+18},{a+b}, {c+d}, {e+f}, {a+17}, {b+18}, {} ]


11

Concrete Example: Available Expressions

The Lattice

{ }

{a+b} {c+d} {e+f} {a+17} {b+18}

{a+b,c+d} {a+b,a+17} {c+d,e+f} {c+d,b+18} {e+f,b+18} {a+b,e+f} {a+b,b+18} {c+d,a+17} {e+f,a+17} {a+17,b+18}

{a+b,c+d,e+f} {a+b,c+d,b+18} {a+b,c+d,a+17} {a+b,e+f,a+17} {a+b,e+f,b+18} {a+b,a+17,b+18} {c+d,e+f,a+17} {c+d,e+f,b+18} {c+d,a+17,b+18}

{e+f,a+17,b+18},

{a+b,c+d,e+f,a+17} {a+b,c+d,e+f,b+18} {a+b,c+d,a+17,b+18} {a+b,e+f,a+17,b+18} {c+d,e+f,a+17,b+18}

{a+b,c+d,e+f,a+17,b+18},

*

Comparability(transitive)

meet


12

Data-flow Analysis

What does this have to do with the iterative algorithm?

We can use a lattice-theoretic formulation to prove

• Termination – it halts on an instance of AVAIL

• Correctness – it produces the desired result for AVAIL

• Complexity – it runs pretty quickly (d(CFG)+3 passes)

Worklist { all blocks, bi }

while (Worklist )remove a block bi from Worklist recompute AVAIL(bi ) as


if AVAIL(bi ) changed thenWorklist Worklist successors(bi )


13

Data-flow Analysis

Termination

• If every fn F is monotone, i.e., f(xy) ≤ f(x) f(y), and

• If the lattice is bounded, i.e., every descending chain is finite Chain is sequence x1, x2, …, xn where xi L, 1 ≤ i ≤ n

xi > xi+1, 1 ≤ i < n chain is descending

Then

• The iterative algorithm must halt on an instance of the problem

• Set at each block can only change a finite number of times

• Any finite semilattice is bounded

• Some infinite semilattices are bounded


14

Data-flow Analysis

Correctness

• Does the iterative algorithm compute the desired answer?

Admissible Function Spaces

1. f F, x,y L, f (xy) = f (x) f (y)

2. fi F such that x L, fi(x) = x

3. f,g F h F such that h(x ) = f (g(x))

4. x L, a finite subset H F such that x = f H f ()

If F meets these four conditions, then the problem (L,F,) has a unique fixed point solution LFP = MFP = MOP order of evaluation does not matter

Not distributive answer may not be unique

*


15

Data-flow Analysis

Complexity

• For a problem with an admissible function space & a bounded semilattice,

• If the functions all meet the rapid condition, i.e.,

f,g F, x L, f (g()) ≥ g() f (x) x

then, a round-robin, reverse-postorder iterative algorithm will halt in d(G)+3 passes over a graph G

d(G) is the loop-connectedness of the graph w.r.t a DFST Maximal number of back edges in an acyclic path Several studies suggest that, in practice, d(G) is small

(<3) For most CFGs, d(G) is independent of the specific DFST

Sets stabilize in two passes around a

loop

Each pass does O(E ) meets & O(N ) other operations

*


16

Data-flow analysis

What does this mean?

• Reverse postorder Number the nodes in a postorder traversal Reverse the order

• Round-robin iterative algorithm Visit all the nodes in a consistent order (RPO) Do it again until the sets stop changing

So, these conditions are easily met Admissible framework, rapid function space Round-robin, reverse-postorder, iterative algorithm

The analysis runs in (effectively) linear time


17

Data-flow Analysis

How do we use these results?

Prove that data-flow framework is admissible & rapid Its just algebra Most (but not all) global data-flow problems are rapid This is a property of F

Code up the iterative algorithm World’s simplest data-flow algorithm Other versions (worklist) have similar behavior

This lets us ignore most of the other data-flow algorithms in 512


18

Global CSE (replacement step)

Managing the name space

Need a unique name e AVAIL(b)

1. Can generate them as replacements are done (Fortran H)

2. Can compute a static mapping

3. Can encode value numbers into names (Briggs 94)

Strategy

1. This works; it is the classic method

2. Fast, but limits replacement to textually identical expressions

3. Requires more analysis (VN), but yields more CSEs

Assume, w.l.o.g., solution 2


19


The Big Picture

1. Build a control-flow graph

2. Gather the initial (local) data — DEF(b) & NKILL(b)

3. Propagate information around the graph, evaluating the equation

4. Post-process the information to make it useful (if needed)

Most data-flow problems are solved, essentially, this way

Documents

Global Redundancy Elimination: Computing Available Expressions Copyright 2011, Keith D. Cooper & Linda Torczon, all rights reserved. Students enrolled