Pointer Analysis. Rupesh Nasre. Advisor: Prof R Govindarajan. Apr 05, 2008.



Outline.

Motivation and Introduction.
Related Work.
Preliminary Results.
Research Directions.


What is Pointer Analysis?

Pointer analysis is the mechanism of statically finding out the possible run-time values of a pointer and the relation of various pointers with each other.

Relation between pointers.

    p = arr + ii;
    q = arr + jj;
    if (p == q) {
        fun();
    }

    q = p;
    ...
    if (p == q) {
        fun();
    }


Variants of Pointer Analysis.

Alias analysis.

do p and q point to the same memory location?

Points-to analysis.

does p point to memory location x?
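The two queries are related: once points-to sets are known, an alias query can be answered from them. The following is a minimal sketch, assuming the points-to sets have already been computed and are stored as bitmasks over at most 32 abstract locations. The names pts_set, points_to, may_alias and must_alias are illustrative and do not come from the talk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical representation: bit i is set if the pointer may point to
   abstract memory location i (at most 32 locations in this sketch). */
typedef uint32_t pts_set;

enum { MAX_LOCS = 32 };

/* Points-to query: may a pointer with set s point to location loc? */
bool points_to(pts_set s, unsigned loc) {
    assert(loc < MAX_LOCS);
    return (s >> loc) & 1u;
}

/* May-alias query: the two sets share at least one location. */
bool may_alias(pts_set p, pts_set q) {
    return (p & q) != 0;
}

/* Must-alias query, assuming every abstract location stands for exactly
   one concrete cell and both pointers are initialized: both sets are the
   same singleton. */
bool must_alias(pts_set p, pts_set q) {
    bool singleton = p != 0 && (p & (p - 1)) == 0;
    return singleton && p == q;
}
```

With p = {0, 1}, q = {1} and r = {2}, may_alias(p, q) holds, may_alias(p, r) does not, and must_alias is only reported for identical singleton sets.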


Why Pointer Analysis?

for parallelization:

    fun(p);
    fun(q);

for common subexpression elimination:

    x = p + 2;
    y = q + 2;

for dead code elimination:

    if (p == q) {
        fun();
    }

for other optimizations.


Introduction.

Flow sensitivity.
Context sensitivity.
Field sensitivity.
Unification based.
Inclusion based.


Flow sensitivity.

    p = &x;
    p = &y;
    label:
    ...

flow-sensitive (at label): {(p, &y)}.
flow-insensitive: {(p, &x), (p, &y)}.
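The difference between the two results can be sketched for straight-line code with a single pointer p. This is only an illustration under stated assumptions: points-to sets are bitmasks, every statement has the form p = &loc, and the names addr_stmt, flow_insensitive and flow_sensitive_at_end are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t pts_set;          /* bit i set: p may point to location i */
enum { LOC_X = 0, LOC_Y = 1 };     /* hypothetical ids for x and y */

/* One statement of the form p = &loc; only one pointer p is modelled. */
struct addr_stmt { unsigned loc; };

/* Flow-insensitive: statement order is ignored, so every assignment
   contributes to a single summary set for the whole program. */
pts_set flow_insensitive(const struct addr_stmt *s, size_t n) {
    pts_set out = 0;
    for (size_t i = 0; i < n; i++) {
        assert(s[i].loc < 32);
        out |= (pts_set)1 << s[i].loc;
    }
    return out;
}

/* Flow-sensitive, straight-line code only: each assignment overwrites the
   set that held before it (a strong update), so the result is the set
   that holds after the last statement. */
pts_set flow_sensitive_at_end(const struct addr_stmt *s, size_t n) {
    pts_set cur = 0;
    for (size_t i = 0; i < n; i++) {
        assert(s[i].loc < 32);
        cur = (pts_set)1 << s[i].loc;
    }
    return cur;
}
```

For the two statements p = &x; p = &y; the first function returns {x, y} and the second returns {y}, matching the two sets on the slide. Branches and loops would need a merge of sets at join points, which this sketch leaves out.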


Context sensitivity.

    caller1() {
        fun(p);
    }

    caller2() {
        fun(q);
    }

    fun(int *ptr) {
        r = ptr;
    }

context-insensitive: {(r, p), (r, q)}.
context-sensitive: {(r, p)} along call-path caller1, {(r, q)} along call-path caller2.
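A rough sketch of the same distinction, assuming one level of calling context and modelling each actual argument by the set of locations it may point to (the slide writes alias pairs instead). The names call_site, r_context_insensitive and r_context_sensitive are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t pts_set;             /* bit i set: may point to location i */
enum { CALLER1 = 1, CALLER2 = 2 };    /* hypothetical context ids */

/* One call fun(arg): which caller made it and what arg may point to. */
struct call_site { int caller; pts_set arg; };

/* Context-insensitive: all calls to fun are merged, so r receives the
   union of every actual argument. */
pts_set r_context_insensitive(const struct call_site *calls, size_t n) {
    assert(calls != NULL || n == 0);
    pts_set r = 0;
    for (size_t i = 0; i < n; i++)
        r |= calls[i].arg;
    return r;
}

/* Context-sensitive: only the calls made from the given caller are
   considered, so each call path gets its own result for r. */
pts_set r_context_sensitive(const struct call_site *calls, size_t n,
                            int caller) {
    assert(calls != NULL || n == 0);
    pts_set r = 0;
    for (size_t i = 0; i < n; i++)
        if (calls[i].caller == caller)
            r |= calls[i].arg;
    return r;
}
```

With one call from caller1 passing p and one from caller2 passing q, the insensitive result for r covers both arguments, while the sensitive result keeps them apart per caller.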


Field sensitivity.

    x.f = p;

or

    p = x.f;

field-sensitive: {(x.f, p)}.
field-insensitive: {(x, p)}.
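One way to picture the difference is an abstract object that keeps one points-to set per field. The sketch below assumes a struct with two fields, f and g, and bitmask sets; abs_object, store_field and the two load functions are illustrative names only.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t pts_set;                  /* bit i set: may point to location i */
enum { FIELD_F, FIELD_G, NUM_FIELDS };     /* hypothetical fields of x */

/* Abstract object: one points-to set per field. */
struct abs_object { pts_set field[NUM_FIELDS]; };

/* x.f = p: record what p may point to under the named field. */
void store_field(struct abs_object *obj, unsigned f, pts_set value) {
    assert(f < (unsigned)NUM_FIELDS);
    obj->field[f] |= value;
}

/* Field-sensitive load: only the requested field is read. */
pts_set load_field_sensitive(const struct abs_object *obj, unsigned f) {
    assert(f < (unsigned)NUM_FIELDS);
    return obj->field[f];
}

/* Field-insensitive load: the object is treated as one blob, so a load
   of any field sees everything stored into any field. */
pts_set load_field_insensitive(const struct abs_object *obj) {
    pts_set all = 0;
    for (unsigned f = 0; f < (unsigned)NUM_FIELDS; f++)
        all |= obj->field[f];
    return all;
}
```

After a store to x.f only, a field-sensitive load of x.g is empty, while the field-insensitive load of x still returns the stored set.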


Unification based.

    one(&s1);
    one(&s2);
    two(&s3);

    one(struct s *p) {
        p->a = 3;
        two(p);
    }

    two(struct s *q) {
        q->b = 4;
    }

unification-based: {(p, &s1), (p, &s2), (p, &s3), (q, &s1), (q, &s2), (q, &s3)}.


Inclusion based.

    one(&s1);
    one(&s2);
    two(&s3);

    one(struct s *p) {
        p->a = 3;
        two(p);
    }

    two(struct s *q) {
        q->b = 4;
    }

inclusion-based: {(p, &s1), (p, &s2), (q, &s1), (q, &s2), (q, &s3)}.
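The one()/two() example can be written as constraints: p gets &s1 and &s2, q gets &s3, and the call two(p) copies p into q. The sketch below solves these constraints in both styles. It is a simplified illustration, not the algorithms as published: sets are bitmasks, only base and copy constraints are handled, and the unification variant merges the two variables themselves instead of maintaining a full pointee graph. All names (base_c, copy_c, solve_inclusion, solve_unification, find_rep) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t pts_set;                     /* bit i set: may point to s(i+1) */
enum { VAR_P, VAR_Q, NUM_VARS };              /* the two formal parameters */
enum { S1 = 1u << 0, S2 = 1u << 1, S3 = 1u << 2 };

struct base_c { int dst; pts_set obj; };      /* dst = &obj */
struct copy_c { int dst; int src; };          /* dst = src  */

/* Inclusion based: a copy dst = src only requires pts(dst) to include
   pts(src); iterate over the copies until nothing changes. */
void solve_inclusion(const struct base_c *b, size_t nb,
                     const struct copy_c *c, size_t nc,
                     pts_set pts[NUM_VARS]) {
    for (int v = 0; v < NUM_VARS; v++) pts[v] = 0;
    for (size_t i = 0; i < nb; i++) {
        assert(b[i].dst >= 0 && b[i].dst < NUM_VARS);
        pts[b[i].dst] |= b[i].obj;
    }
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < nc; i++) {
            pts_set merged = pts[c[i].dst] | pts[c[i].src];
            if (merged != pts[c[i].dst]) {
                pts[c[i].dst] = merged;
                changed = true;
            }
        }
    }
}

/* Follow parent links to the representative of v's equivalence class. */
static int find_rep(const int parent[NUM_VARS], int v) {
    while (parent[v] != v) v = parent[v];
    return v;
}

/* Unification based (simplified): a copy dst = src merges both variables
   into one equivalence class that shares a single points-to set. */
void solve_unification(const struct base_c *b, size_t nb,
                       const struct copy_c *c, size_t nc,
                       pts_set pts[NUM_VARS]) {
    int parent[NUM_VARS];
    pts_set cls[NUM_VARS];
    for (int v = 0; v < NUM_VARS; v++) { parent[v] = v; cls[v] = 0; }
    for (size_t i = 0; i < nb; i++) {
        assert(b[i].dst >= 0 && b[i].dst < NUM_VARS);
        cls[find_rep(parent, b[i].dst)] |= b[i].obj;
    }
    for (size_t i = 0; i < nc; i++) {
        int d = find_rep(parent, c[i].dst);
        int s = find_rep(parent, c[i].src);
        if (d != s) { parent[s] = d; cls[d] |= cls[s]; }
    }
    for (int v = 0; v < NUM_VARS; v++) pts[v] = cls[find_rep(parent, v)];
}
```

On the slide's constraints, the inclusion solver leaves p with {s1, s2} and gives q {s1, s2, s3}, while the unification solver gives both p and q {s1, s2, s3}. Merging is what buys the near-linear running time at the cost of the extra pair (p, &s3).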


Like all other important problems in Computer Science...

Alias analysis without memory allocation, intra-procedural, flow-sensitive, supporting arbitrary levels of indirection, is NP-hard.
For two levels of indirection, it is still NP-hard.
Even flow-insensitive analysis is NP-hard (for arbitrary levels of indirection).
With dynamic memory allocation, allowing structs, it becomes undecidable.
Even for scalars (no structs), it remains undecidable.

G Ramalingam, The undecidability of aliasing, TOPLAS 1994.

Venkatesan Chakaravarthy, New results on the computability and complexity of points-to analysis, POPL 2003.


But the good news is...

For a single pointer dereference, even a flow-sensitive analysis with only scalars and well-defined types is in P, if dynamic memory allocation is not allowed.
For an arbitrary number of dereferences, if the analysis is flow-insensitive (under the same restrictions), it is in P.

G Ramalingam, The undecidability of aliasing, TOPLAS 1994.

Venkatesan Chakaravarthy, New results on the computability and complexity of points-to analysis, POPL 2003.


Open Problems.

When dynamic memory allocation is not allowed, but an arbitrary number of levels of dereferencing is allowed, the problem is NP-hard. Is it in NP?
Is the above problem for a bounded number of dereferences in P?
When dynamic memory is allowed, is the problem decidable?


Related Work.

Choi et al, POPL 1993.
flow sensitive.
solution set for each program point.
alias sets for each CFG node.
uses worklists for efficiency.
precise but inefficient.

J D Choi, M Burke, P Carini, Efficient flow-sensitive interprocedural computation of pointer induced aliases and side effects, POPL 1993.


Related Work.

Andersen, PhD Thesis, 1994.
flow insensitive.
context insensitive.
inclusion based.
each variable represented using separate node.
precision used as upper bound.

Lars Ole Andersen, Program Analysis and Specialization for the C Programming Language, PhD thesis, 1994.


Related Work.

Burke et al, LCPC 1995.
flow insensitive.
alias solution for each procedure.
worklist used for efficiency.
can filter alias information based on scoping.
nearly as precise as Andersen's.

M Burke, P Carini, J D Choi, M Hind, Flow-insensitive interprocedural alias analysis in the presence of function pointers, LCPC 1995.


Related Work.

Reps et al, POPL 1995.
problem formulated using graph reachability.
poly-time algorithm for interprocedural finite distributive subset-based problems.
graph reachability used for aliasing.

Thomas Reps, Susan Horwitz, Mooly Sagiv, Precise Interprocedural Dataflow Analysis via Graph Reachability, POPL 1995.


Related Work.

Steensgaard, POPL 1996.
flow insensitive.
context insensitive.
field insensitive.
unification based.
linear space and almost linear time algorithm.
imprecise but sets lower bound on time complexity.

Bjarne Steensgaard, Points-to Analysis in Almost Linear Time, POPL 1996.


Related Work.

Ghiya et al, PLDI 1996.
flow sensitive.
context sensitive.
field insensitive.
makes use of direction, interference and shape.
classifies as tree, DAG or cyclic graph.

Rakesh Ghiya, Laurie Hendren, Is it a Tree, a DAG, or a Cyclic Graph? A Shape Analysis For Heap Directed Pointers in C, PLDI 1996.


Related Work.

Cheng et al, PLDI 2000.
uses access paths.
flow insensitive.
field sensitive.
cost effective context sensitivity.
works well for large number of indirect function calls.

Ben-Chung Cheng, Wen-Mei Hwu, Modular Interprocedural Pointer Analysis using Access Paths: Design, Implementation, and Evaluation, PLDI 2000.


Related Work.

Whaley et al, PLDI 2004.
context sensitive.
field sensitive.
partially flow sensitive.
inclusion based.
scalable (10 min, 400 MB, 8000 methods).
ordered BDDs.

John Whaley, Monica Lam, Cloning-based Context-sensitive Pointer Alias Analysis Using Binary Decision Diagrams, PLDI 2004.


Related Work.

Lattner et al, PLDI 2007.
context sensitive.
flow insensitive.
field sensitive.
unification based.
scalable.
efficient (3 sec for 200K lines).
low storage requirement (30 MB).

Chris Lattner, Andrew Lenharth, Vikram Adve, Making Context Sensitive Points-to Analysis with Heap Cloning Practical For The Real World, PLDI 2007.


Our Experiments.

framework = LLVM.
algorithm = Andersen.
benchmark = SPEC 2000.


Our Experiments.

benchmark   no     may    must
perlbmk     24.5   73.9   0.2
gap         25.9   72.7   0.1
mcf         26.9   71.6   0.1
parser      31.4   67.2   0.2
twolf       34.7   63.8   0.5
gcc         35.5   62.6   0.6
vpr         38.5   59.9   0.3
mesa        42.2   55.6   0.8
ammp        51.4   46.1   1.1
vortex      57.9   40.3   0.4
art         74     25     0
crafty      83.2   15.5   0.2
bzip2       87     12     0
equake      88     11     0
gzip        88.6   9.7    0.5
average     52.7   45.8   0.3


Our Experiments.

benchmark   alias similarity (%)   dynamic dereference size
equake      100                    1.3
art         99                     1.4
crafty      95.1                   1.6
mesa        77                     1.3
vortex      44.5                   3
gcc         7.2                    1.2
bzip2       3.9                    2.3
vpr         2                      2.4
gzip        1.6                    2.7
twolf       0.1                    4.1
ammp        0.1                    41.1
gap         0                      1
mcf         0                      3
average     33.1                   5.1


Research Directions.

Pointer arithmetic.

    void f(struct list *p, struct list *q) {
        struct list *tmp;
        tmp = p->next;
        p->next = q->next;
        q->next = q->next->next;
        p->next->next = tmp;
    }


Research Directions.

Profiling.
at specific program points like function entry, exit.
for hot functions.
for fat pointers.


Research Directions.

Complex data structures.
a recursive data structure is merged into a single node.
some programs have a single global data structure to operate on, like symbol table, dictionary.
how to characterize complexity of a data structure?


188.ammp Description.

Benchmark Program General Category: Computational Chemistry. Modeling large systems of molecules usually associated with Biology.

Benchmark Description: The benchmark runs molecular dynamics (i.e. solves the ODE defined by Newton's equations for the motions of the atoms in the system) on a protein-inhibitor complex which is embedded in water (see Harrison 1993 for descriptions of the algorithm and stability analysis on it). The energy is approximated by a classical potential or "force field". The protein is HIV protease complexed with the inhibitor indinavir. There are 9582 atoms in the water and protein, making this representative of a typical large simulation. This benchmark is derived from published work on understanding drug resistance in HIV (Weber and Harrison 1999).

Input Description: The problem tracks how the atoms move from initial coordinates and initial velocities.


Conferences.

POPL: Principles of Programming Languages.

PLDI: Programming Language Design and Implementation.

MSP: Memory Systems Performance.

LCPC: Languages and Compilers for Parallel Computing.


Related Work.

Raman et al, MSP 2005.
uses executable instructions.
run time (dynamic).
collects RDS profile.
no type information.
interesting properties of data structures are found out.