Course Outline • Traditional Static Program Analysis – Theory • Compiler Optimizations; Control Flow Graphs • Data-flow Analysis – today’s class – Classic analyses and applications • Software Testing • Dynamic Program Analysis


Course Outline

• Traditional Static Program Analysis
  – Theory
    • Compiler Optimizations; Control Flow Graphs
    • Data-flow Analysis – today’s class
  – Classic analyses and applications
• Software Testing
• Dynamic Program Analysis


Outline

• The four classical data-flow problems
  – Reaching definitions
  – Live variables
  – Available expressions
  – Very busy expressions
• Data-flow frameworks
• Reading: Compilers: Principles, Techniques and Tools, by Aho, Lam, Sethi and Ullman, Chapter 9.2


Four Classical Data-flow Problems

• Reaching definitions (Reach)
• Live uses of variables (Live)
• Available expressions (Avail)
• Very busy expressions (VeryB)
• Def-use chains, built from Reach, and the dual use-def chains, built from Live, play a role in many optimizations
• Avail enables global common subexpression elimination
• VeryB is used for conservative code motion


Classical Data-flow Problems

• How to formulate the analysis using data-flow equations defined on the control flow graph?

• Forward and backward data-flow problems

• May and must data-flow problems

Forward: $out(i) = gen(i) \cup (in(i) - kill(i))$

Backward: $in(i) = gen(i) \cup (out(i) - kill(i))$


Problem 1: Reaching Definitions

• For each CFG node n, compute the set of definitions that reach n.

$inRD(i) = \bigcup \{\, outRD(j) \mid j \text{ is a predecessor of } i \,\}$

$outRD(i) = gen(i) \cup (inRD(i) - kill(i))$

For a node j: a=b+c

• kill(j): all definitions of a
• gen(j): this definition of a, (a,j)


Example

1. x:=read()

2. y:=1

3. if x<2 then

4. y:=x*y

5. x:=x-1

6. goto 3

7. …

Here $D_x$ and $D_y$ denote the sets of all definitions of x and of y.

$inRD(1) = \emptyset$
$inRD(2) = outRD(1)$
$inRD(3) = outRD(2) \cup outRD(6)$
$inRD(4) = outRD(3)$
$inRD(5) = outRD(4)$
$inRD(6) = outRD(5)$
$inRD(7) = outRD(3)$

$outRD(1) = (inRD(1) - D_x) \cup \{(x,1)\}$
$outRD(2) = (inRD(2) - D_y) \cup \{(y,2)\}$
$outRD(3) = inRD(3)$
$outRD(4) = (inRD(4) - D_y) \cup \{(y,4)\}$
$outRD(5) = (inRD(5) - D_x) \cup \{(x,5)\}$
$outRD(6) = inRD(6)$


Example

1. x:=read()

2. y:=1

3. if x<2 then

4. y:=x*y

5. x:=x-1

6. goto 3

7. …

$inRD(1) = \emptyset$
$inRD(2) = \{(x,1)\}$
$inRD(3) = \{(x,1),(x,5),(y,2),(y,4)\}$
$inRD(4) = \{(x,1),(x,5),(y,2),(y,4)\}$
$inRD(5) = \{(x,1),(x,5),(y,4)\}$
$inRD(6) = \{(x,5),(y,4)\}$
$inRD(7) = \{(x,1),(x,5),(y,2),(y,4)\}$

$outRD(1) = \{(x,1)\}$
$outRD(2) = \{(x,1),(y,2)\}$
$outRD(3) = \{(x,1),(x,5),(y,2),(y,4)\}$
$outRD(4) = \{(x,1),(x,5),(y,4)\}$
$outRD(5) = \{(x,5),(y,4)\}$
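The solution for the seven-statement example can be checked mechanically. The following is a minimal sketch, not the course's implementation: it applies the inRD/outRD equations round-robin until nothing changes. The names `PRED`, `DEFINES`, and `reaching_definitions` are made up for this illustration, and the predecessor map is read off the program text (statement 6 jumps back to 3, and 7 follows the test at 3).

```python
# Predecessors of each statement in the example program.
PRED = {1: [], 2: [1], 3: [2, 6], 4: [3], 5: [4], 6: [5], 7: [3]}
# Variable defined at each statement (None if the statement defines nothing).
DEFINES = {1: "x", 2: "y", 3: None, 4: "y", 5: "x", 6: None, 7: None}

def reaching_definitions(pred, defines):
    all_defs = {(v, n) for n, v in defines.items() if v is not None}
    in_rd = {n: set() for n in pred}
    out_rd = {n: set() for n in pred}
    changed = True
    while changed:
        changed = False
        for n in sorted(pred):
            # inRD(n) is the union of outRD over the predecessors of n.
            new_in = set()
            for p in pred[n]:
                new_in |= out_rd[p]
            v = defines[n]
            if v is None:
                new_out = set(new_in)
            else:
                kill = {d for d in all_defs if d[0] == v}  # D_v: all defs of v
                new_out = (new_in - kill) | {(v, n)}
            if new_in != in_rd[n] or new_out != out_rd[n]:
                in_rd[n], out_rd[n] = new_in, new_out
                changed = True
    return in_rd, out_rd

in_rd, out_rd = reaching_definitions(PRED, DEFINES)
for n in sorted(in_rd):
    print(n, sorted(in_rd[n]), sorted(out_rd[n]))
```

Starting every set at Ø and only ever adding facts yields the smallest solution of the equations, which is the one a may problem asks for.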


Reaching Definitions

[Figure: node j with predecessors m1, m2, m3; inRD(m1), inRD(m2), inRD(m3) flow into inRD(j)]

Forward, may dataflow problem


Equivalent Equations

$inRD(j) = \bigcup_{m \in pred(j)} \big( (inRD(m) \cap pres(m)) \cup gen(m) \big)$

where:

• pres(m) is the set of definitions preserved through node m
• gen(m) is the set of definitions generated at node m
• pred(j) is the set of immediate predecessors of node j


Problem 2: Live Uses of Variables

• For each node n, compute the set of variables live on exit from n.

$outLV(i) = \bigcup \{\, inLV(j) \mid j \text{ is a successor of } i \,\}$

$inLV(i) = gen(i) \cup (outLV(i) - kill(i))$

For a node i: x = y+z

Q: What is gen(i)?
Q: What is kill(i)?

1. x:=2; 2. y:=4; 3. x:=1; 4. (if (y>x) then 5. z:=y; else 6. z:=y*y); 7. x:=z;

What variables are live on exit from statement 1? Statement 3?


Example

1. x:=2

2. y:=4

3. x:=1

4. if (y>x)

5. z:=y 6. z:=y*y

7. x := z
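The liveness question for this example can be worked out with a short script. This is a sketch under two assumptions that are not stated on the slides: gen(i) is taken to be the variables used at statement i and kill(i) the variable it defines, and no variable is live after statement 7. The names `SUCC`, `USE`, `DEF`, and `live_variables` are illustrative.

```python
# Successors of each statement; 4 branches to 5 and 6, which both reach 7.
SUCC = {1: [2], 2: [3], 3: [4], 4: [5, 6], 5: [7], 6: [7], 7: []}
USE = {1: set(), 2: set(), 3: set(), 4: {"x", "y"},
       5: {"y"}, 6: {"y"}, 7: {"z"}}            # gen: variables read
DEF = {1: {"x"}, 2: {"y"}, 3: {"x"}, 4: set(),
       5: {"z"}, 6: {"z"}, 7: {"x"}}            # kill: variables written

def live_variables(succ, use, define):
    in_lv = {n: set() for n in succ}
    out_lv = {n: set() for n in succ}
    changed = True
    while changed:
        changed = False
        # Visiting nodes in reverse order suits a backward problem.
        for n in sorted(succ, reverse=True):
            new_out = set()
            for s in succ[n]:
                new_out |= in_lv[s]              # union over successors
            new_in = use[n] | (new_out - define[n])
            if new_in != in_lv[n] or new_out != out_lv[n]:
                in_lv[n], out_lv[n] = new_in, new_out
                changed = True
    return in_lv, out_lv

in_lv, out_lv = live_variables(SUCC, USE, DEF)
print("live on exit from 1:", sorted(out_lv[1]))
print("live on exit from 3:", sorted(out_lv[3]))
```

Under these assumptions nothing is live on exit from statement 1 (x is overwritten at statement 3 before any use), while x and y are both live on exit from statement 3.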


Live Uses of Variables

[Figure: node j with successors m1, m2, m3; outLV(m1), outLV(m2), outLV(m3) flow back into outLV(j)]

Backward, may dataflow problem


Equivalent equations

$outLV(j) = \bigcup_{m \in succ(j)} \big( (outLV(m) \cap pres(m)) \cup gen(m) \big)$

where:

• pres(m) is the set of uses preserved through node m (roughly, these correspond to variables whose defs are preserved)
• gen(m) is the set of uses generated at node m
• succ(j) is the set of immediate successors of node j


Problem 3: Available Expressions

• An expression X op Y is available at node n if every path from entry to n evaluates X op Y, and after every evaluation prior to reaching n, there are NO subsequent assignments to X or Y

[Figure: three paths from the entry node ρ to node n, each labeled with X op Y, X = …, Y = …]


Global Common Subexpressions

[Figure: control flow graph with four blocks]

z=a*b
r=2*z

q=a*b

u=a*b
z=u/2

w=a*b


Global Common Subexpressions

t1=a*b
z=t1
r=2*z

t1=a*b
q=t1

u=t1
z=u/2

w=a*b

Can we eliminate w=a*b?


Available Expressions

[Figure: node j: x=y+z with predecessors m1, m2, m3; inAE(m1), inAE(m2), inAE(m3) flow into inAE(j)]

Forward, must dataflow problem

inAE(j) = ?
outAE(j) = ?
gen(j) = ?
kill(j) = ?


Example

1. x = a + b

2. y = a * b

3. if y <= a + b then goto 7

4. a = a + 1

5. x = a + b

6. goto 3

7. …
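One way to work through this example is to solve it as a forward, must problem. The sketch below makes several assumptions that the slides leave as exercises: kill(n) is every expression with an operand assigned at n, gen(n) is every expression evaluated at n and not killed there, the meet is intersection over predecessors, and all sets except the entry start at the full universe. All names (`EVAL`, `ASSIGNS`, `OPERANDS`, `available_expressions`) are illustrative.

```python
PRED = {1: [], 2: [1], 3: [2, 6], 4: [3], 5: [4], 6: [5], 7: [3]}
EVAL = {1: {"a+b"}, 2: {"a*b"}, 3: {"a+b"}, 4: {"a+1"},
        5: {"a+b"}, 6: set(), 7: set()}          # expressions evaluated
ASSIGNS = {1: "x", 2: "y", 3: None, 4: "a", 5: "x", 6: None, 7: None}
OPERANDS = {"a+b": {"a", "b"}, "a*b": {"a", "b"}, "a+1": {"a"}}

def available_expressions(pred, evaluated, assigns, operands):
    universe = set(operands)
    kill = {n: {e for e in universe if assigns[n] in operands[e]}
            for n in pred}
    gen = {n: evaluated[n] - kill[n] for n in pred}
    # Must problem: start optimistic (everything available) except at entry.
    in_ae = {n: (set(universe) if pred[n] else set()) for n in pred}
    out_ae = {n: gen[n] | (in_ae[n] - kill[n]) for n in pred}
    changed = True
    while changed:
        changed = False
        for n in sorted(pred):
            new_in = set(universe) if pred[n] else set()
            for p in pred[n]:
                new_in &= out_ae[p]              # intersection over preds
            new_out = gen[n] | (new_in - kill[n])
            if new_in != in_ae[n] or new_out != out_ae[n]:
                in_ae[n], out_ae[n] = new_in, new_out
                changed = True
    return in_ae, out_ae

in_ae, out_ae = available_expressions(PRED, EVAL, ASSIGNS, OPERANDS)
for n in sorted(in_ae):
    print(n, sorted(in_ae[n]))
```

Under these assumptions a+b is available on entry to statement 3 along both incoming edges, while a*b is not, because statement 4 redefines a on the loop path.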


Problem 4: Very Busy Expressions

• An expression X op Y is very busy at node n, if along EVERY path from n to the end of the program, we come to a computation of X op Y BEFORE any redefinition of X or Y.

[Figure: three paths leaving node n, each labeled with X = …, Y = …, t1=X op Y]


Very Busy Expressions

[Figure: node j with successors m1, m2, m3; outVB(m1), outVB(m2), outVB(m3) flow back into outVB(j)]


Very Busy Expressions

$outVB(j) = \bigcap_{m \in succ(j)} \big( (outVB(m) \cap pres(m)) \cup gen(m) \big)$

where:

• pres(m) is the set of expressions preserved through node m
• gen(m) is the set of expressions generated at node m
• succ(j) is the set of immediate successors of node j
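A small sketch can make the backward, must character of very busy expressions concrete. It assumes that outVB(j) is the intersection, over the successors m of j, of (outVB(m) ∩ pres(m)) ∪ gen(m), and that nothing is very busy at the exit node. The four-node CFG is hypothetical and not taken from the slides: node 1 branches to node 2 (which computes a+b and a*b) and node 3 (which computes only a+b), and both fall through to the exit node 4.

```python
# Hypothetical CFG used only for this illustration.
SUCC = {1: [2, 3], 2: [4], 3: [4], 4: []}
GEN = {1: set(), 2: {"a+b", "a*b"}, 3: {"a+b"}, 4: set()}
# No node assigns a or b, so every node preserves both expressions.
PRES = {n: {"a+b", "a*b"} for n in SUCC}

def very_busy(succ, gen, pres):
    universe = set().union(*gen.values())
    # Must problem: optimistic start everywhere except at exit nodes.
    out_vb = {n: (set(universe) if succ[n] else set()) for n in succ}
    changed = True
    while changed:
        changed = False
        for n in sorted(succ, reverse=True):
            if not succ[n]:
                continue                          # exit keeps the empty set
            new_out = set(universe)
            for m in succ[n]:
                new_out &= (out_vb[m] & pres[m]) | gen[m]
            if new_out != out_vb[n]:
                out_vb[n] = new_out
                changed = True
    return out_vb

out_vb = very_busy(SUCC, GEN, PRES)
print(sorted(out_vb[1]))
```

In this hypothetical graph only a+b is very busy on exit from node 1, since a*b is computed on just one of the two branches. That is the situation in which hoisting a+b above the branch is safe.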


Dataflow Problems

                    May Problems             Must Problems
Forward Problems    Reaching Definitions     Available Expressions
Backward Problems   Live Uses of Variables   Very Busy Expressions


Similarities

• There is a finite set, U, of data-flow facts:
  – Reaching Definitions: the set of all definitions, e.g., {(x,1),(y,2),(x,4),(y,5)}
  – Available Expressions and Very Busy Expressions: the set of all arithmetic expressions, e.g., {a+b, a*b, a+1}
  – Live Uses: the set of all variables, e.g., {x, y, z}
• The solution at a node is a subset of U (e.g., every definition either reaches node i or does not).


Similarities

• Equations (i.e., transfer functions) always have the form:

$out(i) = F_i(in(i)) = (in(i) - kill(i)) \cup gen(i) = (in(i) \cap pres(i)) \cup gen(i)$

A note: what makes the 4 classical problems special is that the sets pres(i) and gen(i) are constants, i.e., they do not depend on in(i)

• Set union and set intersection can be implemented as logical OR and AND respectively
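As a quick illustration of the OR/AND remark, sets over a small universe can be packed into integers, one bit per fact. The four-fact universe and the particular values below are assumptions chosen only for this sketch.

```python
# Bit order (assumed for this sketch): d0 d1 d2 d3, leftmost bit is d0.
in_set = 0b1010      # facts d0 and d2 hold on entry
pres = 0b0111        # the node kills d0 and preserves the rest
gen = 0b0001         # the node generates d3

# Transfer function (in ∩ pres) ∪ gen as one AND and one OR.
out_set = (in_set & pres) | gen
print(format(out_set, "04b"))

# Meet over two incoming sets: OR for may problems, AND for must problems.
a, b = 0b1100, 0b0110
may_meet = a | b
must_meet = a & b
print(format(may_meet, "04b"), format(must_meet, "04b"))
```

Because pres and gen are constants per node, each transfer function costs two machine operations per word of the bit vector, regardless of how many facts the word holds.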


The worklist algorithm for data-flow Analysis: Reaching Definitions

change = true;
Initialize inRD(m) = Ø for m = 2…n
inRD(1) = UNDEF
while (change) do {
    change = false;
    while (∃ j s.t. inRD(j) ≠ ∪_{m ∈ pred(j)} ((inRD(m) ∩ pres(m)) ∪ gen(m))) {
        inRD(j) = ∪_{m ∈ pred(j)} ((inRD(m) ∩ pres(m)) ∪ gen(m));
        change = true;
    }
}


A Better Algorithm

/* initially all inRD sets are empty */
for m := 2 to n do inRD(m) := Ø;
inRD(1) = UNDEF
W := {1,2,…,n} /* put every node on the worklist */
while W ≠ Ø do {
    remove j from W;
    new = ∪_{m ∈ pred(j)} ((inRD(m) ∩ pres(m)) ∪ gen(m));
    if new ≠ inRD(j) then {
        inRD(j) = new;
        for k ∈ succ(j) do add k to W
    }
}
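The node-level worklist scheme can be sketched in Python as follows. This is an illustration, not the course's code. It simplifies the boundary condition by giving the entry node the empty set instead of UNDEF and never re-queueing it, and it assumes the entry node has no predecessors. The three-node loop used as a demo is hypothetical: node 1 defines d1, node 3 redefines the same variable as d2, and node 2 is the loop header.

```python
from collections import deque

def reaching_definitions(pred, succ, pres, gen, entry):
    in_rd = {n: frozenset() for n in pred}
    # Every node except the entry starts on the worklist.
    work = deque(n for n in pred if n != entry)
    queued = set(work)
    while work:
        j = work.popleft()
        queued.discard(j)
        new = frozenset()
        for m in pred[j]:
            new |= (in_rd[m] & pres[m]) | gen[m]
        if new != in_rd[j]:
            in_rd[j] = new
            # Only successors of a changed node need to be revisited.
            for k in succ[j]:
                if k != entry and k not in queued:
                    work.append(k)
                    queued.add(k)
    return in_rd

# Hypothetical demo CFG: 1 -> 2 -> 3 -> 2.
pred = {1: [], 2: [1, 3], 3: [2]}
succ = {1: [2], 2: [3], 3: [2]}
gen = {1: {"d1"}, 2: set(), 3: {"d2"}}
pres = {1: set(), 2: {"d1", "d2"}, 3: set()}

result = reaching_definitions(pred, succ, pres, gen, entry=1)
print({n: sorted(s) for n, s in result.items()})
```

The `queued` set is a design choice of this sketch: it keeps a node from sitting on the worklist twice, which the set-based W of the pseudocode gets for free.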


An Implementation

• Use bitstring representation for sets: 1 bit position per variable definition

For each control flow graph node j:

pres(j)
– has 0 in bit positions corresponding to definitions of variables defined at node j
– has 1 in bit positions corresponding to definitions of variables not defined at node j

gen(j)
– has 1 in bit positions corresponding to definitions at node j
– has 0 in bit positions for all other definitions (i.e., definitions not at node j)
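The two rules above translate directly into code. The sketch below builds the pres and gen bit strings for a node from a list of (variable, node) definitions. The definition list is the one used in the bit-vector example of these slides, (i,1),(k,1),(k,4),(k,5),(i,6); the function name `pres_gen` is made up for this illustration.

```python
# One bit position per definition, written as (variable, defining node).
DEFS = [("i", 1), ("k", 1), ("k", 4), ("k", 5), ("i", 6)]

def pres_gen(defs, node):
    # Variables assigned at this node; all their definitions are killed.
    defined_here = {var for var, n in defs if n == node}
    pres = "".join("0" if var in defined_here else "1" for var, _ in defs)
    gen = "".join("1" if n == node else "0" for _, n in defs)
    return pres, gen

for node in range(1, 7):
    print(node, *pres_gen(DEFS, node))
```

For node 4, which assigns k, the sketch yields pres 10001 and gen 00100: all three definitions of k are killed and only (k,4) is generated.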


Detailed Algorithm

W = empty                              // initialize the worklist
for (i = 1; i < n+1; i++)              // i varies over nodes
    for (j = 1; j < m+1; j++) {        // j varies over definitions
        if (∃ k ∈ pred(i) with j ∈ gen(k)) then
            { set j bit to 1 in inRD(i); add (j,i) to W }
        else { set j bit to 0 in inRD(i); }
    }
while (W not empty) do {
    remove (j,i) from W
    if (j ∈ pres(i)) then {
        for (k ∈ succ(i))
            if (j bit in inRD(k) == 0) then
                { set j bit to 1 in inRD(k); add (j,k) to W }
    }
}

First loop (for) passes gen sets to successors.

Second loop (while) performs worklist propagation.


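Example, Bitvector Calculation

[Figure: control flow graph with basic blocks B1 to B6 and an exit node]

B1: i=0; k=0          definitions (i,1), (k,1)
B2: i<0
B3: mod(i,3) = 0?
B4: k:=k-1            definition (k,4)
B5: k:=k+1            definition (k,5)
B6: i:=i+1            definition (i,6)
exit

Edges implied by the propagation steps: B1→B2, B2→B3, B3→B4, B3→B5, B4→B6, B5→B6, B6→B2

Definitions and basic blocks are given unique identifiers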


Initialization

Bits: i1,k1,k4,k5,i6

        B1     B2     B3     B4     B5     B6
pres:   00000  11111  11111  10001  10001  01110
gen:    11000  00000  00000  00100  00010  00001


After Initialization Loop

Bits: i1,k1,k4,k5,i6

        B1     B2     B3     B4     B5     B6
inRD:   00000  11001  00000  00000  00000  00110
pres:   00000  11111  11111  10001  10001  01110
gen:    11000  00000  00000  00100  00010  00001


Propagation Loop

Worklist W = {(i1,2),(k1,2),(i6,2),(k4,6),(k5,6)}

Choose (i1,2); pres(2) = 11111, so Reach(3) = 10000 and we add (i1,3) to W.
Then choose (k1,2) off W and set Reach(3) = 11000 and we add (k1,3) to W.
Then choose (i6,2) off W and set Reach(3) = 11001 and add (i6,3) to W.

Now W = {(k4,6),(k5,6),(i1,3),(k1,3),(i6,3)}

Iteration continues until worklist is empty.


After Steps in Previous Slide

        B1     B2     B3     B4     B5     B6
inRD:   00000  11001  11001  00000  00000  00110


After Steps in Previous Slide

        B1     B2     B3     B4     B5     B6
inRD:   00000  11111  11001  00000  00000  00110


After Steps in Previous Slide

        B1     B2     B3     B4     B5     B6
inRD:   00000  11111  11001  11001  11001  00110


Solution (skipping some steps)

        B1     B2     B3     B4     B5     B6
inRD:   00000  11111  11111  11111  11111  10111
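The whole bit-vector example can be replayed with a short script that follows the two loops of the detailed algorithm: an initialization pass that hands gen bits to successors, then propagation of (definition, node) pairs. This is a sketch under stated assumptions. The edge list is inferred from the initialization and propagation steps (B1→B2, B2→B3, B3→B4, B3→B5, B4→B6, B5→B6, B6→B2), the exit node is left out, and the worklist is processed first-in, first-out, which the slides do not prescribe. The helper names `bits`, `show`, and `mask` are illustrative.

```python
from collections import deque

DEFS = ["i1", "k1", "k4", "k5", "i6"]     # bit order used on the slides
NBITS = len(DEFS)

def bits(s):
    return int(s, 2)                      # "11000" -> integer bit vector

def show(v):
    return format(v, "0{}b".format(NBITS))

def mask(j):
    return 1 << (NBITS - 1 - j)           # j = 0 is the leftmost bit

SUCC = {1: [2], 2: [3], 3: [4, 5], 4: [6], 5: [6], 6: [2]}
PRED = {n: [] for n in SUCC}
for n, targets in SUCC.items():
    for t in targets:
        PRED[t].append(n)

PRES = {1: bits("00000"), 2: bits("11111"), 3: bits("11111"),
        4: bits("10001"), 5: bits("10001"), 6: bits("01110")}
GEN = {1: bits("11000"), 2: bits("00000"), 3: bits("00000"),
       4: bits("00100"), 5: bits("00010"), 6: bits("00001")}

def reaching_definitions():
    in_rd = {n: 0 for n in SUCC}
    work = deque()
    # First loop: pass gen sets to successors.
    for i in SUCC:
        for j in range(NBITS):
            if any(GEN[k] & mask(j) for k in PRED[i]):
                in_rd[i] |= mask(j)
                work.append((j, i))
    # Second loop: propagate each (definition, node) pair.
    while work:
        j, i = work.popleft()
        if PRES[i] & mask(j):
            for k in SUCC[i]:
                if not (in_rd[k] & mask(j)):
                    in_rd[k] |= mask(j)
                    work.append((j, k))
    return in_rd

solution = {n: show(v) for n, v in reaching_definitions().items()}
print(solution)
```

Each (definition, node) pair enters the worklist at most once, because a pair is added only when its bit flips from 0 to 1. The total work is therefore bounded by the number of definitions times the number of edges.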