121
Program Analysis and Verification 0368-4479 http://www.cs.tau.ac.il/~maon/teaching/2013-2014/paav/pa av1314b.html Noam Rinetzky Lecture 13: Numerical, Pointer & Shape Domains Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav, Ganesan Ramalingam

Noam Rinetzky Lecture 13: Numerical , Pointer & Shape Domains

  • Upload
    hedia

  • View
    47

  • Download
    0

Embed Size (px)

DESCRIPTION

Program Analysis and Verification 0368- 4479 http://www.cs.tau.ac.il/~maon/teaching/2013-2014/paav/paav1314b.html. Noam Rinetzky Lecture 13: Numerical , Pointer & Shape Domains. Slides credit: Roman Manevich , Mooly Sagiv , Eran Yahav , Ganesan Ramalingam. - PowerPoint PPT Presentation

Citation preview

Page 1: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

Program Analysis and Verification

0368-4479http://www.cs.tau.ac.il/~maon/teaching/2013-2014/paav/paav1314b.html

Noam Rinetzky

Lecture 13: Numerical, Pointer & Shape Domains

Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav, Ganesan Ramalingam

Page 2: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

2

Abstract Interpretation [Cousot’77]

• Mathematical foundation of static analysis– Abstract domains• Abstract states • Join ()

– Transformer functions• Abstract steps

– Chaotic iteration • Abstract computation • Structured Programs

Lattices(D, , , , , )

Monotonic functions

Fixpoints

Page 3: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

3

Abstract (conservative) interpretation

set of states set of statesoperational semantics(concrete semantics)

statement Sset of states

abstract representation abstract semantics

statement S abstract representation

concretization concretization

Page 4: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

4

The collecting lattice

• Lattice for a given control-flow node v: Lv=(2State, , , , , State)

• Lattice for entire control-flow graph with nodes V:

LCFG = Map(V, Lv)• We will use this lattice as a baseline for static

analysis and define abstractions of its elements

Page 5: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

5

Equational definition of the semantics• R[2] = R[entry] x:=x-1 R[3]• R[3] = R[2] {s | s(x) > 0}• R[exit] = R[2] {s | s(x) 0}• A system of recursive equations

• Solution: concrete meaning of the program if x > 0

x := x - 1

entry

exit

R[entry]

R[2]

R[3]R[exit]

Page 6: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

6

An abstract semantics

• R[2] = R[entry] x:=x-1# R[3]• R[3] = R[2] {s | s(x) > 0}#

• R[exit] = R[2] {s | s(x) 0}#

• A system of recursive equations

• Solution: abstract meaning of the program– Over-approximation of the concrete

semantics

if x > 0

x := x - 1

entry

exit

R[entry]

R[2]

R[3]R[exit]

Abstract transformer for x:=x-1

Abstract representationof {s | s(x) < 0}

Page 7: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

7

Galois Connection: c ((c))

1

c2(c)

3((c))

The most precise (least) element in A representing c

C A

Page 8: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

8

Monotone functions

• Let L1=(D1, ) and L2=(D2, ) be two posets

• A function f : D1 D2 is monotone if for every pair x, y D1

x y implies f(x) f(y)• A special case: L1=L2=(D, )

f : D D

Page 9: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

9

Fixed-points• L = (D, , , , , )• f : D D monotone• Fix(f) = { d | f(d) = d }• Red(f) = { d | f(d) d }• Ext(f) = { d | d f(d) }• Theorem [Tarski 1955]– lfp(f) = Fix(f) = Red(f) Fix(f)– gfp(f) = Fix(f) = Ext(f) Fix(f)

Red(f)

Ext(f)

Fix(f)

lfp

gfp

fn()

fn()

1. A solution always exist 2. It unique 3. Not always computable

Page 10: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

10

Continuity and ACC condition

• Let L = (D, , , ) be a complete partial order– Every ascending chain has an upper bound

• A function f is continuous if for every increasing chain Y D*,

f(Y) = { f(y) | yY }• L satisfies the ascending chain condition (ACC)

if every ascending chain eventually stabilizes:d0 d1 … dn = dn+1 = …

Page 11: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

11

Fixed-point theorem [Kleene]

• Let L = (D, , , ) be a complete partial order and a continuous function f: D D then

lfp(f) = nN fn()

• Lemma: Monotone functions on posets satisfying ACC are continuous

Page 12: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

12

Resulting algorithm • Kleene’s fixed point theorem

gives a constructive method for computing the lfp

lfpfn()

f()f2()

…d := while f(d) d do d := d f(d)return d

Algorithm

lfp(f) = nN fn()Mathematical definition

Page 13: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

13

Sound abstract transformer

• Given two latticesC = (DC, C, C, C, C, C)A = (DA, A, A, A, A, A)and GCC,A=(C, , , A) with

• A concrete transformer f : DC DC

an abstract transformer f# : DA DA

• We say that f# is a sound transformer (w.r.t. f) if• c: (f(c)) f#((c))• a: (f((a))) A f#(a)

Page 14: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

14

Soundness

1. Given two complete latticesC = (DC, C, C, C, C, C)A = (DA, A, A, A, A, A)and GCC,A=(C, , , A) with

2. Monotone concrete transformer f : DC DC

3. Monotone abstract transformer f# : DA DA s4. Either a DA : f( (a)) (f#(a))

or c DC : (f(c)) f#( (c))

Then lfp(f) (lfp(f#)) and (lfp(f)) lfp(f#)

Page 15: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

15

Best (induced) transformer [CC’77]

C A

23f

f#(a)= (f((a)))

1 f#

4

Problem: incomputable directly

Page 16: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

16

Fixed-point theorem [Kleene]

• Let L = (D, , , ) be a complete partial order and a continuous function f: D D then

lfp(f) = nN fn()

• Lemma: Monotone functions on posets satisfying ACC are continuous

• What if ACC does not hold?

Page 17: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

17

Intervals lattice for variable x

[0,0][-1,-1][-2,-2] [1,1] [2,2] ......

[-,+]

[0,1] [1,2] [2,3][-1,0][-2,-1]

[-10,10]

[1,+][-,0]

... ...

[2,+][0,+][-,-1][-,-1]... ...

[-20,10]

Page 18: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

18

Intervals lattice for variable x

• Dint[x] = { (L,H) | L-,Z and HZ,+ and LH}• • =[-,+]• = ?– [1,2] [3,4] ?– [1,4] [1,3] ?– [1,3] [1,4] ?– [1,3] [-,+] ?

• What is the lattice height?

Page 19: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

19

Joining/meeting intervals

• [a,b] [c,d] = [min(a,c), max(b,d)]– [1,1] [2,2] = [1,2]– [1,1] [2,+] = [1,+]

• [a,b] [c,d] = [max(a,c), min(b,d)] if a proper interval and otherwise – [1,2] [3,4] = – [1,4] [3,4] = [3,4]– [1,1] [1,+] = [1,1]

• Check that indeed xy if and only if xy=y

Page 20: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

20

Interval domain for programs

• Dint[x] = { (L,H) | L-,Z and HZ,+ and LH}• For a program with variables Var={x1,…,xk}

• Dint[Var] = Dint[x1] … Dint[xk]• How can we represent it in terms of formulas?– Two types of factoids xc and xc– Example: S = {x9, y5, y10}– Helper operations

• c + + = +• remove(S, x) = S without any x-constraints• lb(S, x) = k if k ≤ x ≤ m• ub(S,x) = m if k ≤ x ≤ m

Page 21: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

21

Assignment transformers• x := c# S = remove(S,x) {xc, xc}• x := y# S = remove(S,x) {xlb(S,y), xub(S,y)}• x := y+c# S = remove(S,x) {xlb(S,y)+c, xub(S,y)+c}• x := y+z# S = remove(S,x) {xlb(S,y)+lb(S,z),

xub(S,y)+ub(S,z)}• x := y*c# S = remove(S,x) if c>0 {xlb(S,y)*c, xub(S,y)*c}

else {xub(S,y)*-c, xlb(S,y)*-c}

• x := y*z# S = remove(S,x) ?

Page 22: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

22

assume transformers• assume x=c# S = S {xc, xc}• assume x<c# S = S {xc-1}• assume x=y# S = S {xlb(S,y), xub(S,y)}• assume xc# S = (S {xc-1}) (S {xc+1})

Page 23: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

23

Too many iterations to converge

Page 24: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

24

Analysis with widening

A

f#2 f#3

f#2 f#3

f#

Red(f)

Fix(f) lpf(f#)

Page 25: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

25

Analysis with narrowing

A

f#2 f#3

f#2 f#3

f#

Red(f)

Fix(f) lpf(f#)

Page 26: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

26

Overview• Goal: infer numeric properties of program variables

(integers, floating point)• Applications

– Detect division by zero, overflow, out-of-bound array access– Help non-numerical domains

• Classification– Non-relational– (Weakly-)relational– Equalities / Inequalities– Linear / non-linear– Exotic

Page 27: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

27

Implementation

Page 28: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

28

Non-relational abstractions

• Abstract each variable individually– Constant propagation [Kildall’73]– Sign– Parity (congruences)– Intervals (Box)

Page 29: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

29

Sign abstraction for variable x

• Concrete lattice: C = (2State, , , , , State) • Sign = {, neg, 0, pos, }• GCC,Sign=(C, , , Sign)• () = ?• (neg) = ?• (0) = ?• (pos) = ?• () = ?• How can we represent 0?

neg pos

0

Page 30: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

30

Transformer x:=y+z

pos 0 neg neg neg neg

pos 0 neg 0

pos pos pos

Page 31: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

31

Parity abstraction for variable x• Concrete lattice: C = (2State, , , , , State) • Parity = {, E, O, }• GCC,Parity=(C, , , Parity)• () = ?• (E) = ?• (O) = ?• () = ?

E O

Page 32: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

32

Transformer x:=y+z

O E O E E

E O O

Page 33: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

33

Boxes (intervals)

0 2 312345

4

6

x

y

1

y [3,6]

x [1,4]

Page 34: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

34

Non-relational abstractions

• Cannot prove properties that hold simultaneous for several variables– x = 2*y– x ≤ y

Page 35: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

35

Zone abstraction [Mine]• Maintain bounded differences between a pair of

program variables (useful for tracking array accesses)• Abstract state is a conjunction of linear inequalities of

the form x-yc

0 2 312345

4

6

x

y

1

x ≤ 4−x ≤ −1y ≤ 3−y ≤ −1x − y ≤ 1

Page 36: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

36

Difference bound matrices• Add a special V0 variable for the number 0• Represent non-existent relations between variables by +

entries• Convenient for defining the partial order between two abstract

elements… =?

x ≤ 4−x ≤ −1y ≤ 3−y ≤ −1x − y ≤ 1

y x V0

3 4 + V0

+ + -1 x

+ 1 -1 y

Page 37: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

37

Difference bound matrices• Add a special V0 variable for the number 0• Represent non-existent relations between variables by +

entries• Convenient for defining the partial order between two abstract

elements… =?

x ≤ 4−x ≤ −1y ≤ 3−y ≤ −1x − y ≤ 1

y x V0

3 4 + V0

+ + -1 x

+ 1 -1 y

Page 38: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

38

Ordering DBMs• How should we order M1 M2?

x ≤ 5−x ≤ −1y ≤ 3x − y ≤ 1

y x V0

3 5 + V0

+ + -1 x

+ 1 + y

x ≤ 4−x ≤ −1y ≤ 3−y ≤ −1x − y ≤ 1

y x V0

3 4 + V0

+ + -1 x

+ 1 -1 y

M1 =

M2 =

Page 39: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

39

Widening DBMs• How should we join M1 M2?

x ≤ 2−x ≤ −1y ≤ 0x − y ≤ 1

y x V0

0 2 + V0

+ + -1 x

+ 1 + y

x ≤ 4−x ≤ −1y ≤ 3−y ≤ −1x − y ≤ 1

y x V0

3 4 + V0

+ + -1 x

+ 1 -1 y

M1 =

M2 =

Page 40: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

40

Widening DBMs• How should we widen M1 M2?

x ≤ 5−x ≤ −1y ≤ 3x − y ≤ 1

y x V0

3 5 + V0

+ + -1 x

+ 1 + y

x ≤ 4−x ≤ −1y ≤ 3−y ≤ −1x − y ≤ 1

y x V0

3 4 + V0

+ + -1 x

+ 1 -1 y

M1 =

M2 =

Page 41: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

41

Potential graph• A vertex per variable• A directed edge with the weight of the inequality• Enables computing semantic reduction by shortest-path

algorithms

x ≤ 4−x ≤ −1y ≤ 3−y ≤ −1x − y ≤ 1

V0

x y

-1-1

1

3

3

Can we tell whether a system of constraints is satisfiable?

Page 42: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

42

Semantic reduction for zones

• Apply the following rule repeatedlyx - y ≤ c y - z ≤ d

x - z ≤ c+d• When should we stop?• Theorem 3.3.4. Best abstraction of potential sets

and zones m = (∗ Pot ◦ Pot)(m)

• A word of caution: do not apply widening on top of semantic reduction (see 3.7.2)

Page 43: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

43

Octagon abstraction [Mine-01]

• Abstract state is an intersection of linear inequalities of the form x y c

captures relationships common in programs (array access)

Page 44: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

44

Some inequality-basedrelational domains

policy iteration

Page 45: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

45

Polyhedral Abstraction

• abstract state is an intersection of linear inequalities of the form a1x2+a2x2+…anxn c

• represent a set of points by their convex hull

(image from :// . . . /~http www cs sunysb edu / / - .algorith files convex hull shtml)

Page 46: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

46

Operations on Polyhedra

Page 47: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

47

Equality-based domains

• Simple congruences [Granger’89]: y=a mod k• Linear relations: y=a*x+b– Join operator a little tricky

• Linear equalities [Karr’76]: a1*x1+…+ak*xk = c• Polynomial equalities:

a1*x1d1*…*xk

dk + b1*y1z1*…*yk

zk + … = c– Some good results are obtainable when

d1+…+dk <n for some small n

Page 48: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

48

Pointer Analysis

Page 49: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

49

Constant propagation example

x = 3;

y = 4;

z = x + 5;

Page 50: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

50

Constant propagation example with pointers

x = 3;

*p = 4;

z = x + 5;

Is x always 3 here?

Page 51: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

51

p = &y; x = 3; *p = 4; z = x + 5;

Constant propagation example with pointers

x is always 3

p = &x; x = 3; *p = 4; z = x + 5;

if (?) p = &x; else p = &y; x = 3; *p = 4; z = x + 5; x is always

4

x may be 3 or 4(i.e., x is unknown in our lattice)

pointers affectmost program analyses

Page 52: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

52

p = &y; x = 3; *p = 4; z = x + 5;

Constant propagation example with pointers

p = &x; x = 3; *p = 4; z = x + 5;

if (?) p = &x; else p = &y; x = 3; *p = 4; z = x + 5;p always

points-to y

p alwayspoints-to

x

p may point-to x or y

Page 53: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

53

Points-to Analysis

• Determine the set of targets a pointer variable could point-to (at different points in the program)– “p points-to x”• “p stores the value &x”• “*p denotes the location x”

– targets could be variables or locations in the heap (dynamic memory allocation)• p = &x;• p = new Foo(); or p = malloc (…);

– must-point-to vs. may-point-to

Page 54: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

54

Constant propagation example with pointers

*q = 3;

*p = 4;

z = *q + 5;

what values canthis take?

Can *p denote thesame location as *q?

Page 55: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

55

More terminology

• *p and *q are said to be aliases (in a given concrete state) if they represent the same location

• Alias analysis– Determine if a given pair of references could be

aliases at a given program point– *p may-alias *q– *p must-alias *q

Page 56: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

56

Pointer Analysis

• Points-To Analysis– may-point-to– must-point-to

• Alias Analysis– may-alias– must-alias

Page 57: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

57

Applications

• Compiler optimizations– Method de-virtualization– Call graph construction– Allocating objects on stack via escape analysis

• Verification & Bug Finding– Datarace detection– Use in preliminary phases– Use in verification itself

Page 58: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

58

Points-to analysis: a simple examplep = &x;q = &y;if (?) { q = p;}x = &a;y = &b;z = *q;

{p=&x}{p=&x q=&y}

{p=&x q=&x}{p=&x (q=&y q=&x)}{p=&x (q=&y q=&x) x=&a}{p=&x (q=&y q=&x) x=&a y=&b}{p=&x (q=&yq=&x) x=&a y=&b (z=xz=y)}

How would you construct an abstract domain to represent these abstract states?

We will usually drop variable-equality information

Page 59: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

59

Points-to lattice

• Points-to– PT-factoids[x] = { x=&y | y Var} false

PT[x] = (2PT-factoids, , , , false, PT-factoids[x])(interpreted disjunctively)

• How should combine them to get the abstract states in the example?{p=&x (q=&yq=&x) x=&a y=&b}

Page 60: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

60

Points-to lattice

• Points-to– PT-factoids[x] = { x=&y | y Var} false

PT[x] = (2PT-factoids, , , , false, PT-factoids[x])(interpreted disjunctively)

• How should combine them to get the abstract states in the example?{p=&x (q=&yq=&x) x=&a y=&b}

• D[x] = Disj(VE[x]) Disj(PT[x])• For all program variables: D = D[x1] … D[xk]

Page 61: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

61

Points-to analysisa = &yx = &a;y = &b;if (?) { p = &x;} else { p = &y;}

*x = &c;*p = &c;

How should we handle this statement? Strong

update

Weak updateWeak update

{x=&a y=&b (p=&xp=&y) a=&y}

{x=&a y=&b (p=&xp=&y) a=&c}{(x=&ax=&c) (y=&by=&c) (p=&xp=&y)}

Page 62: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

62

Questions

• When is it correct to use a strong update?A weak update?

• Is this points-to analysis precise?

• What does it mean to say– p must-point-to x at program point u– p may-point-to x at program point u– p must-not-point-to x at program u– p may-not-point-to x at program u

Page 63: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

63

Points-to analysis, formally

• We must formally define what we want to compute before we can answer many such questions

Page 64: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

64

PWhile syntax

• A primitive statement is of the form• x := null• x := y• x := *y• x := &y;• *x := y• skip(where x and y are variables in Var)

Omitted (for now)• Dynamic memory allocation• Pointer arithmetic• Structures and fields• Procedures

Page 65: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

65

PWhile operational semantics

• State : (VarZ) (VarVar{null})• x = y s = • x = *y s = • *x = y s = • x = null s = • x = &y s =

Page 66: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

66

PWhile operational semantics

• State : (VarZ) (VarVar{null})• x = y s = s[xs(y)]• x = *y s = s[xs(s(y))]• *x = y s = s[s(x)s(y)]• x = null s = s[xnull]• x = &y s = s[xy]

must say what happens if null is

dereferenced

Page 67: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

67

PWhile collecting semantics

• CS[u] = set of concrete states that can reach program point u (CFG node)

Page 68: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

68

Ideal PT Analysis: formal definition

• Let u denote a node in the CFG

• Define IdealMustPT(u) to be { (p,x) | forall s in CS[u]. s(p) = x }

• Define IdealMayPT(u) to be{ (p,x) | exists s in CS[u]. s(p) = x }

Page 69: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

69

May-point-to analysis:formal Requirement specification

For every vertex u in the CFG,compute a set R(u) such that

R(u) { (p,x) | $sCS[u]. s(p) = x }

Compute R: V -> 2Vars’ such thatR(u) IdealMayPT(u)

(where Var’ = Var U {null})

May Point-To Analysis

Page 70: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

70

May-point-to analysis:formal Requirement specification

• An algorithm is said to be correct if the solution R it computes satisfies"uV. R(u) IdealMayPT(u)

• An algorithm is said to be precise if the solution R it computes satisfies"uV. R(u) = IdealMayPT(u)

• An algorithm that computes a solution R1 is said to be more precise than one that computes a solution R2 if

"uV. R1(u) R2(u)

Compute R: V -> 2Vars’ such thatR(u) IdealMayPT(u)

Page 71: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

71

(May-point-to analysis)Algorithm A

• Is this algorithm correct?• Is this algorithm precise?

• Let’s first completely and formally define the algorithm

Page 72: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

72

Points-to graphsx = &a;y = &b;if (?) { p = &x;} else { p = &y;}

*x = &c;*p = &c;

{x=&a y=&b (p=&xp=&y)}

{x=&a y=&b (p=&xp=&y) a=&c}{(x=&ax=&c) (y=&by=&c) (p=&xp=&y)}

axc

by

pThe points-to set of x

Page 73: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

73

Algorithm A: A formal definitionthe “Data Flow Analysis” Recipe

• Define join-semilattice of abstract-values– PTGraph ::= (Var, VarVar’)– g1 g2 = ?– = ?– = ?

• Define transformers for primitive statements– stmt# : PTGraph PTGraph

Page 74: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

74

Algorithm A: A formal definitionthe “Data Flow Analysis” Recipe

• Define join-semilattice of abstract-values– PTGraph ::= (Var, VarVar’)– g1 g2 = (Var, E1 E2)– = (Var, {}) – = (Var, VarVar’)

• Define transformers for primitive statements– stmt# : PTGraph PTGraph

Page 75: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

75

Algorithm A: transformers

• Abstract transformers for primitive statements– stmt # : PTGraph PTGraph

• x := y # (Var, E) = ?• x := null # (Var, E) = ?• x := &y # (Var, E) = ?• x := *y # (Var, E) = ?• *x := &y # (Var, E) = ?

Page 76: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

76

Algorithm A: transformers

• Abstract transformers for primitive statements– stmt # : PTGraph PTGraph

• x := y # (Var, E) = (Var, E[succ(x)=succ(y)]• x := null # (Var, E) = (Var, E[succ(x)={null}]• x := &y # (Var, E) = (Var, E[succ(x)={y}]• x := *y # (Var, E) = (Var,

E[succ(x)=succ(succ(y))] • *x := &y # (Var, E) = ???

Page 77: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

77

Correctness & precision

• We have a complete & formal definition of the problem

• We have a complete & formal definition of a proposed solution

• How do we reason about the correctness & precision of the proposed solution?

Page 78: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

78

Points-to analysis(abstract interpretation)

(Y) = { (p,x) | exists s in Y. s(p) = x }

CS(u)

2State PTGraph

IdealMayPT(u)

MayPT(u)

IdealMayPT (u) = ( CS(u) )

Page 79: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

79

Concrete transformers

• CS[stmt] : State State• x = y s = s[xs(y)]• x = *y s = s[xs(s(y))]• *x = y s = s[s(x)s(y)]• x = null s = s[xnull]• x = &y s = s[xy]

• CS*[stmt] : 2State 2State

• CS*[st] X = { CS[st]s | s X }

Page 80: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

80

Shape Analysis

Page 81: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

81

Shape Analysis

Automatically verify properties of programs manipulating dynamically allocated storage

Identify all possible shapes (layout) of the heap

Page 82: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

82

Sequential Stackvoid push (int v) { Node x = malloc(sizeof(Node)); x->d = v; x->n = Top; Top = x;}

int pop() { if (Top == NULL) return EMPTY;

Node *s = Top->n;

int r = Top->d;

Top = s;

return r;

{

Want to VerifyNo Null DereferenceUnderlying list remains acyclic after each operation

Page 83: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

83

Shape Analysis via 3-valued Logic

1) Abstraction– 3-valued logical structure– canonical abstraction

2) Transformers– via logical formulae– soundness by construction • embedding theorem, [SRW02]

Page 84: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

84

Concrete State• represent a concrete state as a two-valued logical

structure– Individuals = heap allocated objects– Unary predicates = object properties– Binary predicates = relations

• parametric vocabulary

Topn n

u1 u2 u3

(storeless, no heap addresses)

Page 85: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

85

Concrete State• S = <U, > over a vocabulary P• U – universe• - interpretation, mapping each predicate from p to its

truth value in S

Topn n

u1 u2 u3

U = { u1, u2, u3} P = { Top, n }

(n)(u1,u2) = 1, (n)(u1,u3)=0, (n)(u2,u1)=0 …,(Top)(u1)=1, (Top)(u2)=0, (Top)(u3)=0

Page 86: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

86

v1,v2: n(v1, v2) n*(v2, v1)

Formulae for Observing Properties

void push (int v) { Node x =

malloc(sizeof(Node));

xd = v;

xn = Top;

Top = x;

}

Topn nu1 u2 u3

1

No Cyclesv1,v2: n(v1, v2) n*(v2, v1) 1

w:Top(w)Top != null

w: x(w)

w: x(w) No node precedes Topv1,v2: n(v1, v2) Top(v2) 1

v1,v2: n(v1, v2) Top(v2)

Page 87: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

87

Concrete Interpretation Rules

Statement Update formula

x =NULL x’(v)= 0

x= malloc() x’(v) = IsNew(v)

x=y x’(v)= y(v)

x=y next x’(v)= $w: y(w) n(w, v)

x next=y n’(v, w) = (x(v) n(v, w)) (x(v) y(w))

Page 88: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

88

Example: s = Topn

Top

s’(v) = $v1: Top(v1) n(v1,v)

u1 u2 u3 Top

s

u1 u2 u3

Top

u1 1

u2 0

u3 0

n u1 u2 U3

u1 0 1 0

u2 0 0 1

u3 0 0 0

s

u1 0

u2 0

u3 0

Top

u1 1

u2 0

u3 0

n u1 u2 U3

u1 0 1 0

u2 0 0 1

u3 0 0 0

s

u1 0

u2 1

u3 0

Page 89: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

89

Collecting Semantics

{ st(w)(S) | S CSS[w] } (w,v ) E(G),

w Assignments(G)

< {,} >

CSS [v] =

if v = entry

{ S | S CSS[w] } (w,v ) E(G),

w Skip(G)

{ S | S CSS[w] and S cond(w)} (w,v ) True-Branches(G)

{ S | S CSS[w] and S cond(w)}(w,v ) False-Branches(G)

othrewise

Page 90: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

90

Collecting Semantics

• At every program point – a potentially infinite set of two-valued logical structures

• Representing (at least) all possible heaps that can arise at the program point

• Next step: find a bounded abstract representation

Page 91: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

91

3-Valued Logic

• 1 = true• 0 = false• 1/2 = unknown

A join semi-lattice, 0 1 = 1/2

1/2

0 1

info

rmati

on o

rder

logical order

Page 92: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

92

3-Valued Logical Structures

• A set of individuals (nodes) U• Relation meaning– Interpretation of relation symbols in P

p0() {0,1, 1/2}p1(v) {0,1, 1/2}p2(u,v) {0,1, 1/2}

• A join semi-lattice: 0 1 = 1/2

Page 93: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

93

Boolean Connectives [Kleene] 0 1/2 10 0 0 01/2 0 1/2 1/21 0 1/2 1

0 1/2 10 0 1/2 11/2 1/2 1/2 11 1 1 1

Page 94: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

94

Property Space

• 3-struct[P] = the set of 3-valued logical structures over a vocabulary (set of predicates) P

• Abstract domain– (3-Struct[P])– is • We will see alternatives later (maybe)

Page 95: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

95

Embedding Order

• Given two structures S = <U, >, S’ = <U’, ’> and an onto function f : U U’ mapping individuals in U to individuals in U’

• We say that f embeds S in S’ (denoted by S S’) if– for every predicate symbol p P of arity k: u1, …, uk U, (p)

(u1, ..., uk) ’(p)(f(u1), ..., f (uk)) – and for all u’ U’

( | { u | f (u) = u’ } | > 1) ’(sm)(u’)

• We say that S can be embedded in S’ (denoted by S S’) if there exists a function f such that S f S’

Page 96: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

96

Tight Embedding

• S’ = <U’, ’> is a tight embedding of S=< U, > with respect to a function f if:– S’ does not lose unnecessary information

’(u’1,…, u’k) = {(u1 ..., uk) | f(u1)=u’1,..., f(uk)=u’k}

• One way to get tight embedding is canonical abstraction

Page 97: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

97

Canonical Abstraction

Topn nu1 u2 u3u1 u2 u3

[Sagiv, Reps, Wilhelm, TOPLAS02]

Page 98: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

98

Canonical Abstraction

Topn nu1 u2 u3u1 u2 u3

Top

[Sagiv, Reps, Wilhelm, TOPLAS02]

Page 99: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

99

Canonical Abstraction

Topn nu1 u2 u3u1 u2 u3

Top

Page 100: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

100

Canonical Abstraction

Topn nu1 u2 u3u1 u2 u3

Top

Page 101: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

101

Canonical Abstraction

Topn nu1 u2 u3u1 u2 u3

Top

Page 102: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

102

Canonical Abstraction

Topn nu1 u2 u3u1 u2 u3

Top

Page 103: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

103

Canonical Abstraction ()

• Merge all nodes with the same unary predicate values into a single summary node

• Join predicate values

’(u’1 ,..., u’k) = { (u1 ,..., uk) | f(u1)=u’1 ,..., f(uk)=u’k }

• Converts a state of arbitrary size into a 3-valued abstract state of bounded size

• (C) = { (c) | c C }

Page 104: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

104

Information Loss

Topn n

Topn n

n

Top

Top

Top

Canonical abstraction

Topn n

Page 105: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

105

Instrumentation Predicates• Record additional derived information via

predicatesrx(v) = $v1: x(v1) n*(v1,v)

c(v) = v1: n(v1, v) n*(v, v1)

Topn n

cTop cn n

n

Top

cTop

Canonical abstraction

Page 106: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

106

Embedding Theorem: Conservatively Observing Properties

No Cyclesv1,v2: n(v1, v2) n*(v2, v1)1/2

Top rTop rTop

No cycles (derived)v:c(v) 1

Page 107: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

107

Operational Semanticsvoid push (int v) { Node x = malloc(sizeof(Node)); x->d = v; x->n = Top; Top = x;}

int pop() { if (Top == NULL) return EMPTY;

Node *s = Top->n;

int r = Top->d;

Top = s;

return r;

{

s’(v) = $v1: Top(v1) n(v1,v)

s = Top->n

Topn nu1 u2 u3

Top n nu1 u2 u3

s

Page 108: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

108

Abstract Semantics

Top

s = Topn ?rTop rTop Top rTop rTop

s

s’(v) = $v1: Top(v1) n(v1,v)

s = Top->n

Page 109: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

109

Top

Best Transformer (s = Topn)Concrete

Semantics

Canonical

Abstraction

AbstractSemantics

?rTop rTop

...

Top rTop rTop

Top rTop rTop rTop

...

Top

s

rTop rTop

Top

s

rTop rTop rTop

Top

s

rTop rTop

Top

s

rTop rTop rTop

s’(v) = $v1: Top(v1) n(v1,v)

Page 110: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

110

Semantic Reduction

• Improve the precision of the analysis by recovering properties of the program semantics

• A Galois connection (C, , , A)• An operation op:AA is a semantic reduction

when– "lL2 op(l)l and – (op(l)) = (l) l

C A

op

Page 111: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

111

The Focus Operation

• Focus: Formula((3-Struct) (3-Struct))• Generalizes materialization• For every formula – Focus()(X) yields structure in which evaluates to a

definite values in all assignments– Only maximal in terms of embedding– Focus() is a semantic reduction– But Focus()(X) may be undefined for some X

Page 112: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

112

Top rTop rTop

Top rTop rTop rTop

Partial Concretization Based on Transformer (s=Topn)

AbstractSemantics

Partial Concretization

Canonical Abstraction

AbstractSemantics

Top

s

rTop rTop

Top

s

rTop rTop rTop

Top rTop rTop rTop

s

Top

s

rTop rTop

Top rTop rTop

s’(v) = $v1: Top(v1) n(v1,v)

$u: top(u) n(u, v)

Focus (Topn)

Page 113: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

113

Partial Concretization

• Locally refine the abstract domain per statement• Soundness is immediate• Employed in other shape analysis algorithms

[Distefano et.al., TACAS’06, Evan et.al., SAS’07, POPL’08]

Page 114: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

114

The Coercion Principle

• Another Semantic Reduction• Can be applied after Focus or after Update or

both• Increase precision by exploiting some

structural properties possessed by all stores (Global invariants)

• Structural properties captured by constraints

• Apply a constraint solver

Page 115: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

115

Apply Constraint Solver

Top rTop rTop

x

Top rTop rTop

x

x rx rx,ryn

rx,ry

n n

ny

n

Top rTop rTop

x rx rx,ryn

rx,ry

n

y

n

Page 116: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

116

Sources of Constraints

• Properties of the operational semantics• Domain specific knowledge– Instrumentation predicates

• User supplied

Page 117: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

117

Example Constraints

x(v1) x(v2)eq(v1, v2)

n(v, v1) n(v,v2)eq(v1, v2)

n(v1, v) n(v2,v)eq(v1, v2)is(v)

n*(v3, v4)t[n](v1, v2)

Page 118: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

118

Abstract Transformers: Summary

• Kleene evaluation yields sound solution• Focus is a statement-specific partial

concretization• Coerce applies global constraints

Page 119: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

119

Abstract Semantics

{ t_embed(coerce(st(w)3(focusF(w)(SS[w] )))) (w,v ) E(G),

w Assignments(G)

< {,} >

SS [v] =

if v = entry

{ S | S SS[w] } (w,v ) E(G),

w Skip(G)

{ t_embed(S) | S coerce(st(w)3(focusF(w)(SS[w] ))) and S 3 cond(w)} (w,v ) True-Branches(G)

{ t_embed(S) | S coerce(st(w)3(focusF(w)(SS[w] ))) and S 3 cond(w)} (w,v ) False-Branches(G)

othrewise

Page 120: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

120

Recap

• Abstraction– canonical abstraction – recording derived information

• Transformers–partial concretization (focus)– constraint solver (coerce)– sound information extraction

Page 121: Noam Rinetzky Lecture 13: Numerical , Pointer &  Shape Domains

121

Stack Push

void push (int v) { Node x =

alloc(sizeof(Node));

xd = v;

xn = Top;

Top = x;

}

Topx

Top

Topx

Top

Topx

emp

x

x

xTop

Topv:c(v)

v: x(v)

v: x(v)

v1,v2: n(v1, v2) Top(v2)