Pointer analysis

Pointer analysis. Flow insensitive loss of precision S1: l := new Cons p := l S2: t := new Cons *p := t p := t l t S1 p S2 l t S1 p S2 l t S1 p S2 l t


Flow insensitive loss of precision

S1: l := new Cons
    p := l
S2: t := new Cons
    *p := t
    p := t

[Figure: points-to graphs over l, t, p, S1, S2 after each statement (flow-sensitive solution), compared with the single graph of the flow-insensitive solution (Andersen)]

Flow insensitive loss of precision

• Flow insensitive analysis leads to loss of precision!

main() {
  x := &y;
  ...
  x := &z;
}

Flow insensitive analysis tells us that x may point to z here!

• However:
  – uses less memory (memory can be a big bottleneck to running on large programs)
  – runs faster

Worst case complexity of Andersen

[Figure: points-to graph for *x = y, where x points to a, b, c and y points to d, e, f, shown before and after the statement is processed]

Worst case: N^2 per statement, so at least N^3 for the whole program. Andersen is in fact O(N^3).
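A minimal sketch of a subset-based (Andersen-style) analysis, written as a naive fixed-point loop rather than an optimized constraint solver. The tuple encoding of statements and the function name are assumptions for illustration. The store case shows where the cost comes from: one *x = y statement can add an edge from every target of x to every target of y.

```python
from collections import defaultdict


def andersen(stmts):
    """Naive fixed-point iteration over subset constraints.

    Hypothetical statement encoding:
      ("addr",  x, y)  for  x := &y  (or x := new ... at allocation site y)
      ("copy",  x, y)  for  x := y
      ("load",  x, y)  for  x := *y
      ("store", x, y)  for  *x := y
    """
    pts = defaultdict(set)
    changed = True
    while changed:
        changed = False
        for kind, x, y in stmts:
            if kind == "addr":
                updates = [(x, {y})]
            elif kind == "copy":
                updates = [(x, pts[y])]
            elif kind == "load":
                updates = [(x, pts[t]) for t in list(pts[y])]
            else:  # store: every target of x receives every target of y
                updates = [(t, pts[y]) for t in list(pts[x])]
            for dst, src in updates:
                new = set(src) - pts[dst]
                if new:
                    pts[dst] |= new
                    changed = True
    return {v: s for v, s in pts.items() if s}


# The running example: S1 and S2 name the two allocation sites.
favorite = [
    ("addr", "l", "S1"),
    ("copy", "p", "l"),
    ("addr", "t", "S2"),
    ("store", "p", "t"),
    ("copy", "p", "t"),
]
result = andersen(favorite)
```

On the running example, p ends up pointing to both S1 and S2, so the store through p also adds an S2 to S2 edge that no actual execution produces.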


New idea: one successor per node

• Make each node have only one successor.

• This is an invariant that we want to maintain.

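[Figure: for *x = y, x points to the single merged node a,b,c and y points to the single merged node d,e,f, shown before and after the statement is processed]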

More general case for *x = y

[Figure: points-to graph for x and y before and after *x = y]

Handling: x = *y

[Figure: points-to graph for x and y before and after x = *y]

Handling: x = y (what about y = x?)

[Figure: points-to graph for x and y before and after x = y]

get the same for y = x

Handling: x = &y

[Figure: points-to graph before and after x = &y; afterwards x points to a node labeled y,…]
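The four statement forms handled above can be sketched with a union-find structure that keeps at most one successor per equivalence class. This is a simplified illustration, not a full implementation of Steensgaard's algorithm: the class name, the method names, and the "$tmp" placeholder nodes are all assumptions.

```python
class Steensgaard:
    """Unification-based points-to sketch: each class has at most one successor."""

    def __init__(self):
        self.parent = {}  # union-find forest over node names
        self.succ = {}    # class representative -> its single successor node
        self.fresh = 0

    def find(self, n):
        self.parent.setdefault(n, n)
        while self.parent[n] != n:
            self.parent[n] = self.parent[self.parent[n]]  # path halving
            n = self.parent[n]
        return n

    def target(self, n):
        """Successor class of n; a placeholder node is created if none exists yet."""
        r = self.find(n)
        if r not in self.succ:
            self.fresh += 1
            self.succ[r] = f"$tmp{self.fresh}"
        return self.find(self.succ[r])

    def unify(self, a, b):
        """Merge two classes, then merge their successors to keep the invariant."""
        a, b = self.find(a), self.find(b)
        if a == b:
            return
        sa, sb = self.succ.pop(a, None), self.succ.pop(b, None)
        self.parent[b] = a
        if sa is not None or sb is not None:
            self.succ[a] = sa if sa is not None else sb
        if sa is not None and sb is not None:
            self.unify(sa, sb)

    def address_of(self, x, y):  # x = &y
        self.unify(self.target(x), y)

    def copy(self, x, y):        # x = y (symmetric, so y = x gives the same graph)
        self.unify(self.target(x), self.target(y))

    def load(self, x, y):        # x = *y
        self.unify(self.target(x), self.target(self.target(y)))

    def store(self, x, y):       # *x = y
        self.unify(self.target(self.target(x)), self.target(y))

    def points_to(self, x):
        r = self.find(x)
        if r not in self.succ:
            return set()
        t = self.find(self.succ[r])
        return {n for n in list(self.parent)
                if not n.startswith("$") and self.find(n) == t}


g = Steensgaard()
g.address_of("l", "S1")   # S1: l := new Cons
g.copy("p", "l")          #     p := l
g.address_of("t", "S2")   # S2: t := new Cons
g.store("p", "t")         #     *p := t
g.copy("p", "t")          #     p := t
```

Because unify merges whole classes, the final copy collapses S1 and S2 into one node, and l, p, and t all point to it.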

Our favorite example, once more!

S1: l := new Cons
    p := l
S2: t := new Cons
    *p := t
    p := t

[Figure: the one-successor-per-node points-to graph after each of the five statements (labeled 1 to 5). The graph starts with l pointing to S1 and ends with l, t, and p all pointing to a single merged node S1,S2.]

Flow insensitive loss of precision

S1: l := new Cons
    p := l
S2: t := new Cons
    *p := t
    p := t

[Figure: points-to graphs over l, t, p, S1, S2 for three analyses. Flow-sensitive subset-based: one graph after each statement. Flow-insensitive subset-based: a single graph. Flow-insensitive unification-based: a single graph in which l, t, and p point to the merged node S1,S2.]

Another example

bar() {
  i := &a;
  j := &b;
  foo(&i);
  foo(&j);
  // i pnts to what?
  *i := ...;
}

void foo(int* p) {
  printf("%d", *p);
}

[Figure: points-to graph after each of the four labeled steps (1 to 4). In the final graph, p points to the merged node i,j, which points to the merged node a,b.]


Steensgaard & beyond

• A well engineered implementation of Steensgaard ran on Word97 (2.1 MLOC) in 1 minute.

• One Level Flow (Das PLDI 00) is an extension to Steensgaard that gets more precision and runs in 2 minutes on Word97.


Correctness

Compilers have many bugs

Searched for “incorrect” and “wrong” in the gcc-bugs mailing list. Some of the results:

• [Bug middle-end/19650] New: miscompilation of correct code
• [Bug c++/19731] arguments incorrectly named in static member specialization
• [Bug rtl-optimization/13300] Variable incorrectly identified as a biv
• [Bug rtl-optimization/16052] strength reduction produces wrong code
• [Bug tree-optimization/19633] local address incorrectly thought to escape
• [Bug target/19683] New: MIPS wrong-code for 64-bit multiply
• [Bug c++/19605] Wrong member offset in inherited classes
• [Bug java/19295] [4.0 regression] Incorrect bytecode produced for bitwise AND
• …

Total of 545 matches… And this is only for one month! On a mature compiler!

Compiler bugs cause problems

[Figure: source program (if (…) { x := …; } else { y := …; } …;) → Compiler → Exec]

• They lead to buggy executables
• They rule out having strong guarantees about executables

The focus: compiler optimizations

• A key part of any optimizing compiler

[Figure: Original program → Optimization → Optimized program]

• Hard to get optimizations right
  – Lots of infrastructure-dependent details
  – There are many corner cases in each optimization
  – There are many optimizations and they interact in unexpected ways
  – It is hard to test all these corner cases and all these interactions

Goals

• Make it easier to write compiler optimizations
  – student in an undergrad compiler course should be able to write optimizations

• Provide strong guarantees about the correctness of optimizations
  – automatically (no user intervention at all)
  – statically (before the opts are even run once)

• Expressive enough for realistic optimizations


The Rhodium work

• A domain-specific language for writing optimizations: Rhodium

• A correctness checker for Rhodium optimizations

• An execution engine for Rhodium optimizations

• Implemented and checked the correctness of a variety of realistic optimizations

Broader implications

• Many other kinds of program manipulators: code refactoring tools, static checkers
  – Rhodium work is about program analyses and transformations, the core of any program manipulator

• Enables safe extensible program manipulators
  – Allow end programmers to easily and safely extend program manipulators
  – Improve programmer productivity


Outline

• Introduction

• Overview of the Rhodium system

• Writing Rhodium optimizations

• Checking Rhodium optimizations

• Discussion

Rhodium system overview

[Figure: Rhodium optimizations (Rdm Opt), written by the programmer, each go through the Checker and are then run by the Rhodium execution engine inside the compiler, which turns the source program (if (…) { x := …; } else { y := …; } …;) into an executable (Exec). The Checker and the Rhodium execution engine are written by the Rhodium team.]

The technical problem

• Tension between:
  – Expressiveness
  – Automated correctness checking

• Challenge: develop techniques
  – that will go a long way in terms of expressiveness
  – that allow correctness to be checked

Solution: three techniques

[Figure: the Checker takes a Rhodium optimization (Rdm Opt) and hands a verification task to an automatic theorem prover]

Verification task: show that for any original program:

  behavior of original program = behavior of optimized program

1. Rhodium is declarative
  – declare intent using rules
  – execution engine takes care of the rest

2. Factor out heuristics
  – legal transformations
  – vs. profitable transformations
  [Figure labels: heuristics not affecting correctness; part that must be reasoned about]

3. Split verification task
  – opt-dependent
  – vs. opt-independent

Result:
• Expressive language
• Automated correctness checking


Outline

• Introduction

• Overview of the Rhodium system

• Writing Rhodium optimizations

• Checking Rhodium optimizations

• Discussion

MustPointTo analysis

a = &b
c = a
d = *c

[Figure: after a = &b, a points to b; after c = a, both a and c point to b; the statement d = *c is rewritten to d = b]

MustPointTo info in Rhodium

a = &b
c = a
d = *c

[Figure: the edge after a = &b carries mustPointTo (a, b); the edge after c = a carries mustPointTo (a, b) and mustPointTo (c, b)]

define fact mustPointTo(X:Var,Y:Var)
with meaning ⟦ X == &Y ⟧

Fact correct on edge if:
whenever program execution reaches edge, meaning of fact evaluates to true in the program state
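The correctness condition can be read as a predicate over concrete program states. The sketch below makes that reading explicit; the dictionary representation of a state, the ("addr", var) encoding of &var, and all function names are assumptions. Checking a finite list of sample states is only a test, whereas the definition quantifies over every execution that reaches the edge.

```python
def addr(var):
    """Hypothetical representation of the value &var."""
    return ("addr", var)


def must_point_to_holds(state, x, y):
    """Meaning of mustPointTo(x, y): x == &y in this concrete state."""
    return state.get(x) == addr(y)


def fact_correct_on_edge(fact, states_reaching_edge):
    """The fact's meaning must hold in every state that reaches the edge."""
    x, y = fact
    return all(must_point_to_holds(s, x, y) for s in states_reaching_edge)


# Sample states on the edge after `a = &b; c = a` in two hypothetical runs.
states = [
    {"a": addr("b"), "b": 1, "c": addr("b")},
    {"a": addr("b"), "b": 7, "c": addr("b")},
]
c_points_to_b = fact_correct_on_edge(("c", "b"), states)
c_points_to_a = fact_correct_on_edge(("c", "a"), states)
```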

Propagating facts

a = &b
c = a
d = *c

[Figure: the edge after a = &b carries mustPointTo (a, b); the edge after c = a carries mustPointTo (a, b) and mustPointTo (c, b)]

define fact mustPointTo(X:Var,Y:Var)
with meaning ⟦ X == &Y ⟧

if currStmt == [X = &Y]
then mustPointTo(X,Y)@out

if mustPointTo(X,Y)@in ∧ currStmt == [Z = X]
then mustPointTo(Z,Y)@out

Transformations

a = &b
c = a
d = *c

[Figure: the edge into d = *c carries mustPointTo (a, b) and mustPointTo (c, b), so d = *c is transformed to d = b]

define fact mustPointTo(X:Var,Y:Var)
with meaning ⟦ X == &Y ⟧

if currStmt == [X = &Y]
then mustPointTo(X,Y)@out

if mustPointTo(X,Y)@in ∧ currStmt == [Z = X]
then mustPointTo(Z,Y)@out

if mustPointTo(X,Y)@in ∧ currStmt == [Z = *X]
then transform to [Z = Y]
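To see the three rules at work on the straight-line example, here is a small interpreter for them. It is not the Rhodium execution engine: the tuple encoding of statements and the function names are assumptions, and the rule that preserves facts about unassigned variables is added here for illustration (it does not appear on the slides and relies on the example having no stores through pointers).

```python
def propagate(stmt, facts_in):
    """Apply the two propagation rules to one statement."""
    kind, lhs, rhs = stmt
    # Assumed preservation rule: facts about variables other than lhs survive.
    out = {(x, y) for (x, y) in facts_in if x != lhs}
    if kind == "addr":                       # currStmt == [X = &Y]
        out.add((lhs, rhs))
    elif kind == "copy":                     # currStmt == [Z = X]
        out |= {(lhs, y) for (x, y) in facts_in if x == rhs}
    return out


def transform(stmt, facts_in):
    """Rewrite [Z = *X] to [Z = Y] when mustPointTo(X, Y) holds on the way in."""
    kind, lhs, rhs = stmt
    if kind == "load":
        for (x, y) in facts_in:
            if x == rhs:
                return ("copy", lhs, y)
    return stmt


def optimize(program):
    """Single forward pass; enough for straight-line code without loops."""
    facts, result = set(), []
    for stmt in program:
        result.append(transform(stmt, facts))
        facts = propagate(stmt, facts)
    return result


program = [("addr", "a", "b"), ("copy", "c", "a"), ("load", "d", "c")]
optimized = optimize(program)   # the last statement becomes d = b
```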

Profitability heuristics

[Figure: Legal transformations (identified by the Rhodium rules) → Profitability heuristics → Subset of legal transformations (actually performed)]

Profitability heuristic example 1

• Inlining

• Many heuristics to determine when to inline a function
  – compute function sizes, estimate code-size increase, estimate performance benefit
  – maybe even use AI techniques to make the decision

• However, these heuristics do not affect the correctness of inlining

• They are just used to choose which of the correct set of transformations to perform

Profitability heuristic example 2

• Partial redundancy elimination (PRE)

a := ...;
b := ...;
if (...) {
  a := ...;
  x := a + b;
} else {
  ...
}
x := a + b;

• PRE as code duplication followed by CSE
  – Code duplication: a copy of x := a + b is added
  – CSE: a redundant x := a + b becomes x := x
  – self-assignment removal: x := x is removed

[Figure: the legal placements of x := a + b in the program, with the profitable placement marked]


Semantics of a Rhodium opt

• Run propagation rules in a loop until there are no more changes (optimistic iterative analysis)

• Then run transformation rules to identify the set of legal transformations

• Then run profitability heuristics to determine set of transformations to perform
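The three phases can be sketched as a small driver for the mustPointTo rules on a control-flow graph with a loop. Everything below is an illustrative assumption rather than the Rhodium engine: the node numbering, the preds map, the profitable callback, and the preservation of facts about unassigned variables. The analysis is optimistic in the sense that every edge starts with all facts and facts are removed until nothing changes.

```python
from itertools import product

VARS = ["a", "b", "c", "d"]
ALL_FACTS = frozenset(product(VARS, VARS))   # every possible mustPointTo(X, Y)


def flow(stmt, facts_in):
    kind, lhs, rhs = stmt
    out = {(x, y) for (x, y) in facts_in if x != lhs}   # assumed preservation
    if kind == "addr":
        out.add((lhs, rhs))
    elif kind == "copy":
        out |= {(lhs, y) for (x, y) in facts_in if x == rhs}
    return frozenset(out)


def run_opt(stmts, preds, profitable):
    # Phase 1: propagation to a fixed point, starting from the optimistic guess.
    out = {n: ALL_FACTS for n in stmts}
    fact_in = {}
    changed = True
    while changed:
        changed = False
        for n in stmts:
            incoming = [out[p] for p in preds[n]]
            fact_in[n] = (frozenset.intersection(*incoming)
                          if incoming else frozenset())
            new_out = flow(stmts[n], fact_in[n])
            if new_out != out[n]:
                out[n], changed = new_out, True
    # Phase 2: transformation rules identify the legal rewrites.
    legal = {}
    for n, (kind, lhs, rhs) in stmts.items():
        if kind == "load":
            for (x, y) in fact_in[n]:
                if x == rhs:
                    legal[n] = ("copy", lhs, y)
    # Phase 3: profitability heuristics choose which legal rewrites to perform.
    chosen = {n: s for n, s in legal.items() if profitable(n, s)}
    return {n: chosen.get(n, s) for n, s in stmts.items()}


# a = &b; loop { c = a; d = *c }  (node 2 has a back edge to node 1)
stmts = {0: ("addr", "a", "b"), 1: ("copy", "c", "a"), 2: ("load", "d", "c")}
preds = {0: [], 1: [0, 2], 2: [1]}
always = run_opt(stmts, preds, lambda n, s: True)
never = run_opt(stmts, preds, lambda n, s: False)
```

The heuristic only filters the set computed in phase 2, so it can change which rewrites happen but cannot introduce an illegal one.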

More facts

define fact mustNotPointTo(X:Var,Y:Var)
with meaning ⟦ X ≠ &Y ⟧

define fact hasConstantValue(X:Var,C:Const)
with meaning ⟦ X == C ⟧

define fact doesNotPointIntoHeap(X:Var)
with meaning ⟦ X == null ∨ ∃ Y:Var . X == &Y ⟧

More rules

if currStmt == [X = *A] ∧
   mustNotPointToHeap(A)@in ∧
   ∀ B:Var . mayPointTo(A,B)@in ⇒ mustNotPointTo(B,Y)
then mustNotPointTo(X,Y)@out

if currStmt == [Y = I + BE ] ∧
   varEqualArray(X,A,J)@in ∧
   equalsPlus(J,I,BE)@in ∧
   ¬ mayDef(X) ∧ ¬ mayDefArray(A) ∧
   unchanged(BE)
then varEqualArray(X,A,Y)@out

More in Rhodium

• More powerful pointer analyses
  – Heap summaries

• Analyses across procedures
  – Interprocedural analyses

• Analyses that don’t care about the order of statements
  – Flow-insensitive analyses


Outline

• Introduction

• Overview of the Rhodium system

• Writing Rhodium optimizations

• Checking Rhodium optimizations

• Discussion

Rhodium correctness checker

[Figure: each Rhodium optimization (Rdm Opt) is fed to the Checker before the Rhodium execution engine runs it in the compiler. The Checker relies on an automatic theorem prover.]

A Rhodium optimization consists of:
• fact definitions (define fact …)
• propagation rules (if … then …)
• transformation rules (if … then transform …)
• profitability heuristics (not examined by the checker)

Inside the checker:
• Opt-dependent: VCGen generates local VCs from the fact definitions and rules, and the automatic theorem prover discharges them
• Opt-independent: a lemma with its proof

Lemma. For any Rhodium opt: if local VCs are true, then opt is correct.

Local verification conditions

define fact mustPointTo(X,Y)
with meaning ⟦ X == &Y ⟧

if mustPointTo(X,Y)@in ∧ currStmt == [Z = X]
then mustPointTo(Z,Y)@out

  Assume: All incoming facts are correct
  Show:   Propagated fact is correct

if mustPointTo(X,Y)@in ∧ currStmt == [Z = *X]
then transform to [Z = Y]

  Assume: All incoming facts are correct
  Show:   Original stmt and transformed stmt have same behavior

Local VCs (generated and proven automatically)

Local correctness of prop. rules

define fact mustPointTo(X,Y)
with meaning ⟦ X == &Y ⟧

if mustPointTo(X,Y)@in ∧ currStmt == [Z = X]
then mustPointTo(Z,Y)@out

  Assume: All incoming facts are correct
  Show:   Propagated fact is correct

Local VC (generated and proven automatically)

  Assume: ⟦ X == &Y ⟧ (in) ∧ out = step (in , [Z = X] )
  Show:   ⟦ Z == &Y ⟧ (out)

[Figure: statement Z := X with mustPointTo (X, Y) on the incoming edge and mustPointTo (Z, Y) on the outgoing edge. In state in, X points to Y; does Z point to Y in state out?]

Local correctness of trans. rules

define fact mustPointTo(X,Y)
with meaning ⟦ X == &Y ⟧

if mustPointTo(X,Y)@in ∧ currStmt == [Z = *X]
then transform to [Z = Y]

  Assume: All incoming facts are correct
  Show:   Original stmt and transformed stmt have same behavior

Local VC (generated and proven automatically)

  Assume: ⟦ X == &Y ⟧ (in)
  Show:   step (in , [Z = *X] ) = step (in , [Z = Y] )

[Figure: Z := *X and Z := Y are each run from the same state in, where X points to Y and mustPointTo (X, Y) holds on the incoming edge; are the two out states the same?]
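The two local VCs can be checked by brute force over a tiny state space, which gives a feel for what the theorem prover establishes in general. This is only an illustration under stated assumptions: three variables, a five-element value domain, the step and meaning functions, and the rewrite parameter are all invented here, and exhaustive enumeration of a small model is not a proof.

```python
from itertools import product

VARS = ("p", "q", "r")
VALUES = [0, 1] + [("&", v) for v in VARS]   # small hypothetical value domain


def all_states():
    for vals in product(VALUES, repeat=len(VARS)):
        yield dict(zip(VARS, vals))


def step(state, stmt):
    """Concrete semantics of one statement; None if a load has no valid pointer."""
    kind, lhs, rhs = stmt
    if kind == "copy":                       # lhs = rhs
        value = state[rhs]
    else:                                    # "load": lhs = *rhs
        ptr = state[rhs]
        if not isinstance(ptr, tuple):
            return None
        value = state[ptr[1]]
    return {**state, lhs: value}


def meaning(state, x, y):
    """Meaning of mustPointTo(x, y): x == &y in this state."""
    return state[x] == ("&", y)


def prop_vc_holds():
    """Assume x == &y in `in` and out = step(in, [z = x]); show z == &y in out."""
    return all(
        meaning(step(s, ("copy", z, x)), z, y)
        for x, y, z in product(VARS, repeat=3)
        for s in all_states() if meaning(s, x, y)
    )


def trans_vc_holds(rewrite=lambda x, y, z: ("copy", z, y)):
    """Assume x == &y in `in`; show step(in, [z = *x]) equals step(in, rewrite)."""
    return all(
        step(s, ("load", z, x)) == step(s, rewrite(x, y, z))
        for x, y, z in product(VARS, repeat=3)
        for s in all_states() if meaning(s, x, y)
    )


prop_ok = prop_vc_holds()
trans_ok = trans_vc_holds()
# A deliberately wrong rewrite, [Z = X] instead of [Z = Y], fails the check.
wrong_ok = trans_vc_holds(lambda x, y, z: ("copy", z, x))
```

The pattern variables X, Y, and Z range over all instantiations, including ones where two of them name the same program variable, since a rule has to be correct for those cases too.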


Outline

• Introduction

• Overview of the Rhodium system

• Writing Rhodium optimizations

• Checking Rhodium optimizations

• Discussion


Topics of Discussion

• Correctness guarantees

• Usefulness of the checker

• Expressiveness

Correctness guarantees

• Once checked, optimizations are guaranteed to be correct

• Caveat: trusted computing base
  – execution engine
  – checker implementation
  – proofs done by hand once

• Adding a new optimization does not increase the size of the trusted computing base

Usefulness of the checker

• Found subtle bugs in my initial implementation of various optimizations

define fact equals(X:Var, E:Expr)
with meaning ⟦ X == E ⟧

First attempt:

if currStmt == [X = E]
then equals(X,E)@out

Bug: for x := x + 1 the rule produces equals (x , x + 1), which is false after the assignment.

Second attempt:

if currStmt == [X = E] ∧ “X does not appear in E”
then equals(X,E)@out

Bug: for x = *y + 1 the rule produces equals (x , *y + 1), which is false when y points to x.

Final rule:

if currStmt == [X = E] ∧ “E does not use X”
then equals(X,E)@out
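Both bugs can be demonstrated by running the statements on concrete states and then evaluating the meaning of the derived fact. The state representation (a dictionary in which a pointer holds the name of its target) and the helper names are assumptions made for this sketch.

```python
def run_assign(state, lhs, expr):
    """Execute lhs := expr, where expr is a function of the state before it."""
    return {**state, lhs: expr(state)}


def equals_holds(state, var, expr):
    """Meaning of equals(var, expr): var == expr, evaluated in the given state."""
    return state[var] == expr(state)


# x := x + 1. The unrestricted rule would derive equals(x, x + 1).
inc = lambda s: s["x"] + 1
after_inc = run_assign({"x": 0}, "x", inc)

# x := *y + 1. The name x does not appear in the expression, but when y
# points to x the expression still uses x through the pointer.
deref_inc = lambda s: s[s["y"]] + 1
after_alias = run_assign({"x": 0, "y": "x"}, "x", deref_inc)

# Same statement when y points to a different variable z: the fact is fine.
after_no_alias = run_assign({"x": 0, "y": "z", "z": 5}, "x", deref_inc)

bug_1 = not equals_holds(after_inc, "x", inc)
bug_2 = not equals_holds(after_alias, "x", deref_inc)
ok = equals_holds(after_no_alias, "x", deref_inc)
```

The syntactic side condition rules out the first case only; the semantic condition “E does not use X” is needed to rule out the aliased one.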


Rhodium expressiveness

• Traditional optimizations:
  – const prop and folding, branch folding, dead assignment elim, common sub-expression elim, partial redundancy elim, partial dead assignment elim, arithmetic invariant detection, and integer range analysis.

• Pointer analyses
  – must-point-to analysis, Andersen's may-point-to analysis with heap summaries

• Loop opts
  – loop-induction-variable strength reduction, code hoisting, code sinking

• Array opts
  – constant propagation through array elements, redundant array load elimination

Expressiveness limitations

• May not be able to express your optimization in Rhodium
  – opts that build complicated data structures
  – opts that perform complicated many-to-many transformations (e.g.: loop fusion, loop unrolling)

• A correct Rhodium optimization may be rejected by the correctness checker
  – limitations of the theorem prover
  – limitations of first-order logic


Lessons learned (discussion)

Lessons learned (my answers)

• Capture structure of problem
  – Rhodium: flow functions, rewrite rules, prof. heuristics
  – Restricts the programmer, but can lead to better reasoning abilities
  – Split correctness-critical code from rest

• Split verification task
  – meta-level vs. per-verification
  – between analysis tool and theorem prover
  – between human and theorem prover

Lessons learned (my answers)

• DSL design is an iterative process
  – Hard to see best design without trying something first

• Previous version of Rhodium was called Cobalt
  – Cobalt was based on temporal logic
  – Stepping stone towards Rhodium

Lessons learned (my answers)

• One of the gotchas is efficient execution
  – easier to reason about automatically does not always mean easier to execute efficiently
  – can possibly recover efficiency with hints from users
  – how can you trust a complex execution engine?

• Rely on annotations?
  – meanings in Rhodium
  – May be ok, especially if annotations simply state what the programmer is already thinking

Conclusion

• Rhodium system
  – makes it easier to write optimizations
  – provides correctness guarantees
  – is expressive enough for realistic optimizations

• Rhodium is an example of using a DSL to allow more precise reasoning