Pointer Analysis Lecture 2 G. Ramalingam Microsoft Research, India

Pointer AnalysisLecture 2

G. RamalingamMicrosoft Research, India

Andersen’s Analysis

• A flow-insensitive analysis – computes a single points-to solution valid at

all program points– ignores control-flow – treats program as a set

of statements– equivalent to merging all vertices into one

(and applying algorithm A)– equivalent to adding an edge between every

pair of vertices (and applying algo. A)

– a solution R such thatR IdealMayPT(u) for every vertex u

Example(Flow-Sensitive Analysis)

x = &a;

y = x;

x = &b;

z = x;

1

2

3

x = &a

y = x

4

5

x = &b

z = x

Example:Andersen’s Analysis

x = &a;

y = x;

x = &b;

z = x;

1

2

3

x = &a

y = x

4

5

x = &b

z = x

Andersen’s Analysis

• Strong updates?

• Initial state?

Why Flow-Insensitive Analysis?

• Reduced space requirements– a single points-to solution

• Reduced time complexity– no copying

• individual updates more efficient

– no need for joins– number of iterations?– a cubic-time algorithm

• Scales to millions of lines of code– most popular points-to analysis

Andersen’s AnalysisA Set-Constraints Formulation

• Compute PTx for every variable x

Statement Constraint

x = null

x = &y

x = y

x = *y

*x = y

Steensgaard’s Analysis

• Unification-based analysis• Inspired by type inference

– an assignment “lhs := rhs” is interpreted as a constraint that lhs and rhs have the same type

– the type of a pointer variable is the set of variables it can point-to

• “Assignment-direction-insensitive”– treats “lhs := rhs” as if it were both “lhs := rhs”

and “rhs := lhs”

• An almost-linear time algorithm– single-pass algorithm; no iteration required

Example:Andersen’s Analysis

x = &a;

y = x;

y = &b;

b = &c;

1

2

3

x = &a

y = x

4

5

y = &b

b = &c

Example:Steensgaard’s Analysis

x = &a;

y = x;

y = &b;

b = &c;

1

2

3

x = &a

y = x

4

5

y = &b

b = &c

Steensgaard’s Analysis

• Can be implemented using Union-Find data-structure

• Leads to an almost-linear time algorithm

Exercise

x = &a;

y = x;

y = &b;

b = &c;

*x = &d;

May-Point-To Analyses

Ideal-May-Point-To

Algorithm A

Andersen’s

Steensgaard’s

more efficient / less precise

???


Ideal Points-To Analysis:Definition Recap

• A sequence of states s1s2 … sn is said to be an execution (of the program) iff – s1 is the Initial-State

– si | si+1 for 1 <= I < n

• A state s is said to be a reachable state iff there exists some execution s1s2 … sn is such that sn = s.

• RS(u) = { s | (u,s) is reachable }• IdealMayPT (u) = { (p,x) | $ s Î RS(u). s(p) == x }• IdealMustPT (u) = { (p,x) | " s Î RS(u). s(p) == x }

Does Algorithm A Compute The Most Precise Solution?

Ideal <-> Algorithm A

• Abstract away correlations between variables– relational analysis vs.– independent attribute

x: &b y: &x

x: &y y: &z

x: {&y,&b} y: {&x,&z}

a

g

x: &y y: &x

x: &b y: &z

x: &y y: &z

x: &b y: &x

Does Algorithm A Compute The Most Precise Solution?

Is The Precise Solution Computable?

• Claim: The set RS(u) of reachable concrete states (for our language) is computable.

• Note: This is true for any collecting semantics with a finite state space.

Precise Points-To Analysis:Decidability

• Corollary: Precise may-point-to analysis is computable.

• Corollary: Precise (demand) may-alias analysis is computable.– Given ptr-exp1, ptr-exp2, and a program point u, identify

if there exists some reachable state at u where ptr-exp1 and ptr-exp2 are aliases.

• Ditto for must-point-to and must-alias

• … for our restricted language!

Precise Points-To Analysis:Computational Complexity

• What’s the complexity of the least-fixed point computation using the collecting semantics?

• The worst-case complexity of computing reachable states is exponential in the number of variables.– Can we do better?

• Theorem: Computing precise may-point-to is PSPACE-hard even if we have only two-level pointers.


Ideal-May-Point-To

Algorithm A

Andersen’s

Steensgaard’s




Precise Points-To Analysis: Caveats

• Theorem: Precise may-alias analysis is undecidable in the presence of dynamic memory allocation.– Add “x = new/malloc ()” to language– State-space becomes infinite

• Digression: Integer variables + conditional-branching also makes any precise analysis undecidable.


Ideal (no Int, no Malloc)

Algorithm A

Andersen’s

Steensgaard’s

Ideal (with Int, with Malloc)

Ideal (with Int)

Ideal (with Malloc)

Dynamic Memory Allocation

• s: x = new () / malloc ()• Assume, for now, that allocated object stores one

pointer– s: x = malloc ( sizeof(void*) )

• Introduce a pseudo-variable Vs to represent objects allocated at statement s, and use previous algorithm– treat s as if it were “x = &Vs”

– also track possible values of Vs

– allocation-site based approach

• Key aspect: Vs represents a set of objects (locations), not a single object– referred to as a summary object (node)

Dynamic Memory Allocation:Example

x = new;

y = x;

*y = &b;

*y = &a;

1

2

3

x = new

y = x

4

5

*y = &b

*y = &a

Dynamic Memory Allocation:Object Fields

• Field-sensitive analysisclass Foo { A* f; B* g;}s: x = new Foo()

x->f = &b;

x->g = &a;

Dynamic Memory Allocation:Object Fields

• Field-insensitive analysisclass Foo { A* f; B* g;}s: x = new Foo()

x->f = &b;

x->g = &a;

Other Aspects

• Context-sensitivity• Indirect (virtual) function calls and

call-graph construction• Pointer arithmetic• Object-sensitivity

Andersen’s Analysis:Further Optimizations and Extensions

• Fahndrich et al., Partial online cycle elimination in inclusion constraint graphs, PLDI 1998.

• Rountev and Chandra, Offline variable substitution for scaling points-to analysis, 2000.

• Heintze and Tardieu, Ultra-fast aliasing analysis using CLA: a million lines of C code in a second, PLDI 2001.

• M. Hind, Pointer analysis: Haven’t we solved this problem yet?, PASTE 2001.

• Hardekopf and Lin, The ant and the grasshopper: fast and accurate pointer analysis for millions of lines of code, PLDI 2007.

• Hardekopf and Lin, Exploiting pointer and location equivalence to optimize pointer analysis, SAS 2007.

• Hardekopf and Lin, Semi-sparse flow-sensitive pointer analysis, POPL 2009.

Context-Sensitivity Etc.

• Liang & Harrold, Efficient computation of parameterized pointer information for interprocedural analyses. SAS 2001.

• Lattner et al., Making context-sensitive points-to analysis with heap cloning practical for the real world, PLDI 2007.

• Zhu & Calman, Symbolic pointer analysis revisited. PLDI 2004.

• Whaley & Lam, Cloning-based context-sensitive pointer alias analysis using BDD, PLDI 2004.

• Rountev et al. Points-to analysis for Java using annotated constraints. OOPSLA 2001.

• Milanova et al. Parameterized object sensitivity for points-to and side-effect analyses for Java. ISSTA 2002.

Applications

• Compiler optimizations

• Verification & Bug Finding– use in preliminary phases– use in verification itself

Dynamic Memory Allocation:Summary Object Update

4

5

*y = &a

Abstract Transformers:Weak/Strong Update

AS[stmt] : AbsDataState -> AbsDataState

AS[ *x = y ] s =

s[z s(y)] if s(x) = {z}

s[z1 s(z1) s(y)] if s(x) = {z1, …, zk}

[z2 s(z2) s(y)] (where k > 1)

… [zk s(zk) s(y)]

Correctness & Precision

• How can we formally reason about the correctness & precision of abstract transformers?

• Can we systematically derive a correct abstract transformer?

Enter: The French Recipe(Abstract Interpretation)

Abstract Domain

• A semi-lattice (A, )• Transfer Functions• For every statement

st,AS[st] : A -> A

Concrete (Collecting) Domain

• A semi-lattice (2C, )• Transfer Functions• For every statement st,

CS*[st] : 2C -> 2C

2Data-State 2Var x Var’

Concrete Domain• Concrete states: C• Semantics: For every statement st,

CS[st] : C -> C

g

a

Points-To Analysis(Abstract Interpretation)

a(Y) = { (p,x) | exists s in Y. s(p) == x }

RS(u)

2Data-State 2Var x Var’

IdealMayPT(u)

MayPT(u)

Íaa

IdealMayPT (u) = a ( RS(u) )

Approximating Transformers:Correctness Criterion

C A

correctly approximated by

c1

c2

f

a1

a2

f#

correctly approximated by

c is said to be correctly approximated by a

iffa(c) Í a

Approximating Transformers:Correctness Criterion

C A

c1

c2

f

a1

a2

f#

concretizationg

abstractiona

requirement:f#(a1) ≥ a (f( g(a1))

Concrete Transformers

• CS[stmt] : Data-State -> Data-State

• CS[ x = y ] s = s[x s(y)]• CS[ x = *y ] s = s[x s(s(y))]• CS[ *x = y ] s = s[s(x) s(y)]• CS[ x = null ] s = s[x null]

• CS*[stmt] : 2Data-State -> 2Data-State

• CS*[st] X = { CS[st]s | s Î X }

Abstract Transformers

• AS[stmt] : AbsDataState -> AbsDataState

• AS[ x = y ] s = s[x s(y)]• AS[ x = null ] s = s[x {null}]• AS[ x = *y ] s = s[x s*(s(y))] where s*({v1,…,vn}) = s(v1) È … È s(vn)

• AS[ *x = y ] s = ???

Algorithm A: TranformersWeak/Strong Update

x: {&y} y: {&x,&z} z: {&a}

x: &b y: &x z: &a

x: &y y: &z z: &bx: {&y,&b} y: {&x,&z} z: {&a,&b}

x: &y y: &x z: &a

x: &y y: &z z: &a

*y = &b;f#*y = &b;f

a

g

Algorithm A: TranformersWeak/Strong Update

x: {&y} y: {&x,&z} z: {&a}

x: &y y: &b z: &a

x: &y y: &b z: &ax: {&y} y: {&b} z: {&a}

x: &y y: &x z: &a

x: &y y: &z z: &a

*x = &b;f#*x = &b;f

a

g

Dynamic Memory Allocation:Summary Object Update

4

5

*y = &a

Documents

Pointer Analysis Lecture 2 G. Ramalingam Microsoft Research, India