Noam Rinetzky Lecture 13: Numerical , Pointer & Shape Domains

Preview:

DESCRIPTION

Program Analysis and Verification 0368- 4479 http://www.cs.tau.ac.il/~maon/teaching/2013-2014/paav/paav1314b.html. Noam Rinetzky Lecture 13: Numerical , Pointer & Shape Domains. Slides credit: Roman Manevich , Mooly Sagiv , Eran Yahav , Ganesan Ramalingam. - PowerPoint PPT Presentation

Citation preview

Program Analysis and Verification

0368-4479http://www.cs.tau.ac.il/~maon/teaching/2013-2014/paav/paav1314b.html

Noam Rinetzky

Lecture 13: Numerical, Pointer & Shape Domains

Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav, Ganesan Ramalingam

2

Abstract Interpretation [Cousot’77]

• Mathematical foundation of static analysis– Abstract domains• Abstract states • Join ()

– Transformer functions• Abstract steps

– Chaotic iteration • Abstract computation • Structured Programs

Lattices(D, , , , , )

Monotonic functions

Fixpoints

3

Abstract (conservative) interpretation

set of states set of statesoperational semantics(concrete semantics)

statement Sset of states

abstract representation abstract semantics

statement S abstract representation

concretization concretization

4

The collecting lattice

• Lattice for a given control-flow node v: Lv=(2State, , , , , State)

• Lattice for entire control-flow graph with nodes V:

LCFG = Map(V, Lv)• We will use this lattice as a baseline for static

analysis and define abstractions of its elements

5

Equational definition of the semantics• R[2] = R[entry] x:=x-1 R[3]• R[3] = R[2] {s | s(x) > 0}• R[exit] = R[2] {s | s(x) 0}• A system of recursive equations

• Solution: concrete meaning of the program if x > 0

x := x - 1

entry

exit

R[entry]

R[2]

R[3]R[exit]

6

An abstract semantics

• R[2] = R[entry] x:=x-1# R[3]• R[3] = R[2] {s | s(x) > 0}#

• R[exit] = R[2] {s | s(x) 0}#

• A system of recursive equations

• Solution: abstract meaning of the program– Over-approximation of the concrete

semantics

if x > 0

x := x - 1

entry

exit

R[entry]

R[2]

R[3]R[exit]

Abstract transformer for x:=x-1

Abstract representationof {s | s(x) < 0}

7

Galois Connection: c ((c))

1

c2(c)

3((c))

The most precise (least) element in A representing c

C A

8

Monotone functions

• Let L1=(D1, ) and L2=(D2, ) be two posets

• A function f : D1 D2 is monotone if for every pair x, y D1

x y implies f(x) f(y)• A special case: L1=L2=(D, )

f : D D

9

Fixed-points• L = (D, , , , , )• f : D D monotone• Fix(f) = { d | f(d) = d }• Red(f) = { d | f(d) d }• Ext(f) = { d | d f(d) }• Theorem [Tarski 1955]– lfp(f) = Fix(f) = Red(f) Fix(f)– gfp(f) = Fix(f) = Ext(f) Fix(f)

Red(f)

Ext(f)

Fix(f)

lfp

gfp

fn()

fn()

1. A solution always exist 2. It unique 3. Not always computable

10

Continuity and ACC condition

• Let L = (D, , , ) be a complete partial order– Every ascending chain has an upper bound

• A function f is continuous if for every increasing chain Y D*,

f(Y) = { f(y) | yY }• L satisfies the ascending chain condition (ACC)

if every ascending chain eventually stabilizes:d0 d1 … dn = dn+1 = …

11

Fixed-point theorem [Kleene]

• Let L = (D, , , ) be a complete partial order and a continuous function f: D D then

lfp(f) = nN fn()

• Lemma: Monotone functions on posets satisfying ACC are continuous

12

Resulting algorithm • Kleene’s fixed point theorem

gives a constructive method for computing the lfp

lfpfn()

f()f2()

…d := while f(d) d do d := d f(d)return d

Algorithm

lfp(f) = nN fn()Mathematical definition

13

Sound abstract transformer

• Given two latticesC = (DC, C, C, C, C, C)A = (DA, A, A, A, A, A)and GCC,A=(C, , , A) with

• A concrete transformer f : DC DC

an abstract transformer f# : DA DA

• We say that f# is a sound transformer (w.r.t. f) if• c: (f(c)) f#((c))• a: (f((a))) A f#(a)

14

Soundness

1. Given two complete latticesC = (DC, C, C, C, C, C)A = (DA, A, A, A, A, A)and GCC,A=(C, , , A) with

2. Monotone concrete transformer f : DC DC

3. Monotone abstract transformer f# : DA DA s4. Either a DA : f( (a)) (f#(a))

or c DC : (f(c)) f#( (c))

Then lfp(f) (lfp(f#)) and (lfp(f)) lfp(f#)

15

Best (induced) transformer [CC’77]

C A

23f

f#(a)= (f((a)))

1 f#

4

Problem: incomputable directly

16

Fixed-point theorem [Kleene]

• Let L = (D, , , ) be a complete partial order and a continuous function f: D D then

lfp(f) = nN fn()

• Lemma: Monotone functions on posets satisfying ACC are continuous

• What if ACC does not hold?

17

Intervals lattice for variable x

[0,0][-1,-1][-2,-2] [1,1] [2,2] ......

[-,+]

[0,1] [1,2] [2,3][-1,0][-2,-1]

[-10,10]

[1,+][-,0]

... ...

[2,+][0,+][-,-1][-,-1]... ...

[-20,10]

18

Intervals lattice for variable x

• Dint[x] = { (L,H) | L-,Z and HZ,+ and LH}• • =[-,+]• = ?– [1,2] [3,4] ?– [1,4] [1,3] ?– [1,3] [1,4] ?– [1,3] [-,+] ?

• What is the lattice height?

19

Joining/meeting intervals

• [a,b] [c,d] = [min(a,c), max(b,d)]– [1,1] [2,2] = [1,2]– [1,1] [2,+] = [1,+]

• [a,b] [c,d] = [max(a,c), min(b,d)] if a proper interval and otherwise – [1,2] [3,4] = – [1,4] [3,4] = [3,4]– [1,1] [1,+] = [1,1]

• Check that indeed xy if and only if xy=y

20

Interval domain for programs

• Dint[x] = { (L,H) | L-,Z and HZ,+ and LH}• For a program with variables Var={x1,…,xk}

• Dint[Var] = Dint[x1] … Dint[xk]• How can we represent it in terms of formulas?– Two types of factoids xc and xc– Example: S = {x9, y5, y10}– Helper operations

• c + + = +• remove(S, x) = S without any x-constraints• lb(S, x) = k if k ≤ x ≤ m• ub(S,x) = m if k ≤ x ≤ m

21

Assignment transformers• x := c# S = remove(S,x) {xc, xc}• x := y# S = remove(S,x) {xlb(S,y), xub(S,y)}• x := y+c# S = remove(S,x) {xlb(S,y)+c, xub(S,y)+c}• x := y+z# S = remove(S,x) {xlb(S,y)+lb(S,z),

xub(S,y)+ub(S,z)}• x := y*c# S = remove(S,x) if c>0 {xlb(S,y)*c, xub(S,y)*c}

else {xub(S,y)*-c, xlb(S,y)*-c}

• x := y*z# S = remove(S,x) ?

22

assume transformers• assume x=c# S = S {xc, xc}• assume x<c# S = S {xc-1}• assume x=y# S = S {xlb(S,y), xub(S,y)}• assume xc# S = (S {xc-1}) (S {xc+1})

23

Too many iterations to converge

24

Analysis with widening

A

f#2 f#3

f#2 f#3

f#

Red(f)

Fix(f) lpf(f#)

25

Analysis with narrowing

A

f#2 f#3

f#2 f#3

f#

Red(f)

Fix(f) lpf(f#)

26

Overview• Goal: infer numeric properties of program variables

(integers, floating point)• Applications

– Detect division by zero, overflow, out-of-bound array access– Help non-numerical domains

• Classification– Non-relational– (Weakly-)relational– Equalities / Inequalities– Linear / non-linear– Exotic

27

Implementation

28

Non-relational abstractions

• Abstract each variable individually– Constant propagation [Kildall’73]– Sign– Parity (congruences)– Intervals (Box)

29

Sign abstraction for variable x

• Concrete lattice: C = (2State, , , , , State) • Sign = {, neg, 0, pos, }• GCC,Sign=(C, , , Sign)• () = ?• (neg) = ?• (0) = ?• (pos) = ?• () = ?• How can we represent 0?

neg pos

0

30

Transformer x:=y+z

pos 0 neg neg neg neg

pos 0 neg 0

pos pos pos

31

Parity abstraction for variable x• Concrete lattice: C = (2State, , , , , State) • Parity = {, E, O, }• GCC,Parity=(C, , , Parity)• () = ?• (E) = ?• (O) = ?• () = ?

E O

32

Transformer x:=y+z

O E O E E

E O O

33

Boxes (intervals)

0 2 312345

4

6

x

y

1

y [3,6]

x [1,4]

34

Non-relational abstractions

• Cannot prove properties that hold simultaneous for several variables– x = 2*y– x ≤ y

35

Zone abstraction [Mine]• Maintain bounded differences between a pair of

program variables (useful for tracking array accesses)• Abstract state is a conjunction of linear inequalities of

the form x-yc

0 2 312345

4

6

x

y

1

x ≤ 4−x ≤ −1y ≤ 3−y ≤ −1x − y ≤ 1

36

Difference bound matrices• Add a special V0 variable for the number 0• Represent non-existent relations between variables by +

entries• Convenient for defining the partial order between two abstract

elements… =?

x ≤ 4−x ≤ −1y ≤ 3−y ≤ −1x − y ≤ 1

y x V0

3 4 + V0

+ + -1 x

+ 1 -1 y

37

Difference bound matrices• Add a special V0 variable for the number 0• Represent non-existent relations between variables by +

entries• Convenient for defining the partial order between two abstract

elements… =?

x ≤ 4−x ≤ −1y ≤ 3−y ≤ −1x − y ≤ 1

y x V0

3 4 + V0

+ + -1 x

+ 1 -1 y

38

Ordering DBMs• How should we order M1 M2?

x ≤ 5−x ≤ −1y ≤ 3x − y ≤ 1

y x V0

3 5 + V0

+ + -1 x

+ 1 + y

x ≤ 4−x ≤ −1y ≤ 3−y ≤ −1x − y ≤ 1

y x V0

3 4 + V0

+ + -1 x

+ 1 -1 y

M1 =

M2 =

39

Widening DBMs• How should we join M1 M2?

x ≤ 2−x ≤ −1y ≤ 0x − y ≤ 1

y x V0

0 2 + V0

+ + -1 x

+ 1 + y

x ≤ 4−x ≤ −1y ≤ 3−y ≤ −1x − y ≤ 1

y x V0

3 4 + V0

+ + -1 x

+ 1 -1 y

M1 =

M2 =

40

Widening DBMs• How should we widen M1 M2?

x ≤ 5−x ≤ −1y ≤ 3x − y ≤ 1

y x V0

3 5 + V0

+ + -1 x

+ 1 + y

x ≤ 4−x ≤ −1y ≤ 3−y ≤ −1x − y ≤ 1

y x V0

3 4 + V0

+ + -1 x

+ 1 -1 y

M1 =

M2 =

41

Potential graph• A vertex per variable• A directed edge with the weight of the inequality• Enables computing semantic reduction by shortest-path

algorithms

x ≤ 4−x ≤ −1y ≤ 3−y ≤ −1x − y ≤ 1

V0

x y

-1-1

1

3

3

Can we tell whether a system of constraints is satisfiable?

42

Semantic reduction for zones

• Apply the following rule repeatedlyx - y ≤ c y - z ≤ d

x - z ≤ c+d• When should we stop?• Theorem 3.3.4. Best abstraction of potential sets

and zones m = (∗ Pot ◦ Pot)(m)

• A word of caution: do not apply widening on top of semantic reduction (see 3.7.2)

43

Octagon abstraction [Mine-01]

• Abstract state is an intersection of linear inequalities of the form x y c

captures relationships common in programs (array access)

44

Some inequality-basedrelational domains

policy iteration

45

Polyhedral Abstraction

• abstract state is an intersection of linear inequalities of the form a1x2+a2x2+…anxn c

• represent a set of points by their convex hull

(image from :// . . . /~http www cs sunysb edu / / - .algorith files convex hull shtml)

46

Operations on Polyhedra

47

Equality-based domains

• Simple congruences [Granger’89]: y=a mod k• Linear relations: y=a*x+b– Join operator a little tricky

• Linear equalities [Karr’76]: a1*x1+…+ak*xk = c• Polynomial equalities:

a1*x1d1*…*xk

dk + b1*y1z1*…*yk

zk + … = c– Some good results are obtainable when

d1+…+dk <n for some small n

48

Pointer Analysis

49

Constant propagation example

x = 3;

y = 4;

z = x + 5;

50

Constant propagation example with pointers

x = 3;

*p = 4;

z = x + 5;

Is x always 3 here?

51

p = &y; x = 3; *p = 4; z = x + 5;

Constant propagation example with pointers

x is always 3

p = &x; x = 3; *p = 4; z = x + 5;

if (?) p = &x; else p = &y; x = 3; *p = 4; z = x + 5; x is always

4

x may be 3 or 4(i.e., x is unknown in our lattice)

pointers affectmost program analyses

52

p = &y; x = 3; *p = 4; z = x + 5;

Constant propagation example with pointers

p = &x; x = 3; *p = 4; z = x + 5;

if (?) p = &x; else p = &y; x = 3; *p = 4; z = x + 5;p always

points-to y

p alwayspoints-to

x

p may point-to x or y

53

Points-to Analysis

• Determine the set of targets a pointer variable could point-to (at different points in the program)– “p points-to x”• “p stores the value &x”• “*p denotes the location x”

– targets could be variables or locations in the heap (dynamic memory allocation)• p = &x;• p = new Foo(); or p = malloc (…);

– must-point-to vs. may-point-to

54

Constant propagation example with pointers

*q = 3;

*p = 4;

z = *q + 5;

what values canthis take?

Can *p denote thesame location as *q?

55

More terminology

• *p and *q are said to be aliases (in a given concrete state) if they represent the same location

• Alias analysis– Determine if a given pair of references could be

aliases at a given program point– *p may-alias *q– *p must-alias *q

56

Pointer Analysis

• Points-To Analysis– may-point-to– must-point-to

• Alias Analysis– may-alias– must-alias

57

Applications

• Compiler optimizations– Method de-virtualization– Call graph construction– Allocating objects on stack via escape analysis

• Verification & Bug Finding– Datarace detection– Use in preliminary phases– Use in verification itself

58

Points-to analysis: a simple examplep = &x;q = &y;if (?) { q = p;}x = &a;y = &b;z = *q;

{p=&x}{p=&x q=&y}

{p=&x q=&x}{p=&x (q=&y q=&x)}{p=&x (q=&y q=&x) x=&a}{p=&x (q=&y q=&x) x=&a y=&b}{p=&x (q=&yq=&x) x=&a y=&b (z=xz=y)}

How would you construct an abstract domain to represent these abstract states?

We will usually drop variable-equality information

59

Points-to lattice

• Points-to– PT-factoids[x] = { x=&y | y Var} false

PT[x] = (2PT-factoids, , , , false, PT-factoids[x])(interpreted disjunctively)

• How should combine them to get the abstract states in the example?{p=&x (q=&yq=&x) x=&a y=&b}

60

Points-to lattice

• Points-to– PT-factoids[x] = { x=&y | y Var} false

PT[x] = (2PT-factoids, , , , false, PT-factoids[x])(interpreted disjunctively)

• How should combine them to get the abstract states in the example?{p=&x (q=&yq=&x) x=&a y=&b}

• D[x] = Disj(VE[x]) Disj(PT[x])• For all program variables: D = D[x1] … D[xk]

61

Points-to analysisa = &yx = &a;y = &b;if (?) { p = &x;} else { p = &y;}

*x = &c;*p = &c;

How should we handle this statement? Strong

update

Weak updateWeak update

{x=&a y=&b (p=&xp=&y) a=&y}

{x=&a y=&b (p=&xp=&y) a=&c}{(x=&ax=&c) (y=&by=&c) (p=&xp=&y)}

62

Questions

• When is it correct to use a strong update?A weak update?

• Is this points-to analysis precise?

• What does it mean to say– p must-point-to x at program point u– p may-point-to x at program point u– p must-not-point-to x at program u– p may-not-point-to x at program u

63

Points-to analysis, formally

• We must formally define what we want to compute before we can answer many such questions

64

PWhile syntax

• A primitive statement is of the form• x := null• x := y• x := *y• x := &y;• *x := y• skip(where x and y are variables in Var)

Omitted (for now)• Dynamic memory allocation• Pointer arithmetic• Structures and fields• Procedures

65

PWhile operational semantics

• State : (VarZ) (VarVar{null})• x = y s = • x = *y s = • *x = y s = • x = null s = • x = &y s =

66

PWhile operational semantics

• State : (VarZ) (VarVar{null})• x = y s = s[xs(y)]• x = *y s = s[xs(s(y))]• *x = y s = s[s(x)s(y)]• x = null s = s[xnull]• x = &y s = s[xy]

must say what happens if null is

dereferenced

67

PWhile collecting semantics

• CS[u] = set of concrete states that can reach program point u (CFG node)

68

Ideal PT Analysis: formal definition

• Let u denote a node in the CFG

• Define IdealMustPT(u) to be { (p,x) | forall s in CS[u]. s(p) = x }

• Define IdealMayPT(u) to be{ (p,x) | exists s in CS[u]. s(p) = x }

69

May-point-to analysis:formal Requirement specification

For every vertex u in the CFG,compute a set R(u) such that

R(u) { (p,x) | $sCS[u]. s(p) = x }

Compute R: V -> 2Vars’ such thatR(u) IdealMayPT(u)

(where Var’ = Var U {null})

May Point-To Analysis

70

May-point-to analysis:formal Requirement specification

• An algorithm is said to be correct if the solution R it computes satisfies"uV. R(u) IdealMayPT(u)

• An algorithm is said to be precise if the solution R it computes satisfies"uV. R(u) = IdealMayPT(u)

• An algorithm that computes a solution R1 is said to be more precise than one that computes a solution R2 if

"uV. R1(u) R2(u)

Compute R: V -> 2Vars’ such thatR(u) IdealMayPT(u)

71

(May-point-to analysis)Algorithm A

• Is this algorithm correct?• Is this algorithm precise?

• Let’s first completely and formally define the algorithm

72

Points-to graphsx = &a;y = &b;if (?) { p = &x;} else { p = &y;}

*x = &c;*p = &c;

{x=&a y=&b (p=&xp=&y)}

{x=&a y=&b (p=&xp=&y) a=&c}{(x=&ax=&c) (y=&by=&c) (p=&xp=&y)}

axc

by

pThe points-to set of x

73

Algorithm A: A formal definitionthe “Data Flow Analysis” Recipe

• Define join-semilattice of abstract-values– PTGraph ::= (Var, VarVar’)– g1 g2 = ?– = ?– = ?

• Define transformers for primitive statements– stmt# : PTGraph PTGraph

74

Algorithm A: A formal definitionthe “Data Flow Analysis” Recipe

• Define join-semilattice of abstract-values– PTGraph ::= (Var, VarVar’)– g1 g2 = (Var, E1 E2)– = (Var, {}) – = (Var, VarVar’)

• Define transformers for primitive statements– stmt# : PTGraph PTGraph

75

Algorithm A: transformers

• Abstract transformers for primitive statements– stmt # : PTGraph PTGraph

• x := y # (Var, E) = ?• x := null # (Var, E) = ?• x := &y # (Var, E) = ?• x := *y # (Var, E) = ?• *x := &y # (Var, E) = ?

76

Algorithm A: transformers

• Abstract transformers for primitive statements– stmt # : PTGraph PTGraph

• x := y # (Var, E) = (Var, E[succ(x)=succ(y)]• x := null # (Var, E) = (Var, E[succ(x)={null}]• x := &y # (Var, E) = (Var, E[succ(x)={y}]• x := *y # (Var, E) = (Var,

E[succ(x)=succ(succ(y))] • *x := &y # (Var, E) = ???

77

Correctness & precision

• We have a complete & formal definition of the problem

• We have a complete & formal definition of a proposed solution

• How do we reason about the correctness & precision of the proposed solution?

78

Points-to analysis(abstract interpretation)

(Y) = { (p,x) | exists s in Y. s(p) = x }

CS(u)

2State PTGraph

IdealMayPT(u)

MayPT(u)

IdealMayPT (u) = ( CS(u) )

79

Concrete transformers

• CS[stmt] : State State• x = y s = s[xs(y)]• x = *y s = s[xs(s(y))]• *x = y s = s[s(x)s(y)]• x = null s = s[xnull]• x = &y s = s[xy]

• CS*[stmt] : 2State 2State

• CS*[st] X = { CS[st]s | s X }

80

Shape Analysis

81

Shape Analysis

Automatically verify properties of programs manipulating dynamically allocated storage

Identify all possible shapes (layout) of the heap

82

Sequential Stackvoid push (int v) { Node x = malloc(sizeof(Node)); x->d = v; x->n = Top; Top = x;}

int pop() { if (Top == NULL) return EMPTY;

Node *s = Top->n;

int r = Top->d;

Top = s;

return r;

{

Want to VerifyNo Null DereferenceUnderlying list remains acyclic after each operation

83

Shape Analysis via 3-valued Logic

1) Abstraction– 3-valued logical structure– canonical abstraction

2) Transformers– via logical formulae– soundness by construction • embedding theorem, [SRW02]

84

Concrete State• represent a concrete state as a two-valued logical

structure– Individuals = heap allocated objects– Unary predicates = object properties– Binary predicates = relations

• parametric vocabulary

Topn n

u1 u2 u3

(storeless, no heap addresses)

85

Concrete State• S = <U, > over a vocabulary P• U – universe• - interpretation, mapping each predicate from p to its

truth value in S

Topn n

u1 u2 u3

U = { u1, u2, u3} P = { Top, n }

(n)(u1,u2) = 1, (n)(u1,u3)=0, (n)(u2,u1)=0 …,(Top)(u1)=1, (Top)(u2)=0, (Top)(u3)=0

86

v1,v2: n(v1, v2) n*(v2, v1)

Formulae for Observing Properties

void push (int v) { Node x =

malloc(sizeof(Node));

xd = v;

xn = Top;

Top = x;

}

Topn nu1 u2 u3

1

No Cyclesv1,v2: n(v1, v2) n*(v2, v1) 1

w:Top(w)Top != null

w: x(w)

w: x(w) No node precedes Topv1,v2: n(v1, v2) Top(v2) 1

v1,v2: n(v1, v2) Top(v2)

87

Concrete Interpretation Rules

Statement Update formula

x =NULL x’(v)= 0

x= malloc() x’(v) = IsNew(v)

x=y x’(v)= y(v)

x=y next x’(v)= $w: y(w) n(w, v)

x next=y n’(v, w) = (x(v) n(v, w)) (x(v) y(w))

88

Example: s = Topn

Top

s’(v) = $v1: Top(v1) n(v1,v)

u1 u2 u3 Top

s

u1 u2 u3

Top

u1 1

u2 0

u3 0

n u1 u2 U3

u1 0 1 0

u2 0 0 1

u3 0 0 0

s

u1 0

u2 0

u3 0

Top

u1 1

u2 0

u3 0

n u1 u2 U3

u1 0 1 0

u2 0 0 1

u3 0 0 0

s

u1 0

u2 1

u3 0

89

Collecting Semantics

{ st(w)(S) | S CSS[w] } (w,v ) E(G),

w Assignments(G)

< {,} >

CSS [v] =

if v = entry

{ S | S CSS[w] } (w,v ) E(G),

w Skip(G)

{ S | S CSS[w] and S cond(w)} (w,v ) True-Branches(G)

{ S | S CSS[w] and S cond(w)}(w,v ) False-Branches(G)

othrewise

90

Collecting Semantics

• At every program point – a potentially infinite set of two-valued logical structures

• Representing (at least) all possible heaps that can arise at the program point

• Next step: find a bounded abstract representation

91

3-Valued Logic

• 1 = true• 0 = false• 1/2 = unknown

A join semi-lattice, 0 1 = 1/2

1/2

0 1

info

rmati

on o

rder

logical order

92

3-Valued Logical Structures

• A set of individuals (nodes) U• Relation meaning– Interpretation of relation symbols in P

p0() {0,1, 1/2}p1(v) {0,1, 1/2}p2(u,v) {0,1, 1/2}

• A join semi-lattice: 0 1 = 1/2

93

Boolean Connectives [Kleene] 0 1/2 10 0 0 01/2 0 1/2 1/21 0 1/2 1

0 1/2 10 0 1/2 11/2 1/2 1/2 11 1 1 1

94

Property Space

• 3-struct[P] = the set of 3-valued logical structures over a vocabulary (set of predicates) P

• Abstract domain– (3-Struct[P])– is • We will see alternatives later (maybe)

95

Embedding Order

• Given two structures S = <U, >, S’ = <U’, ’> and an onto function f : U U’ mapping individuals in U to individuals in U’

• We say that f embeds S in S’ (denoted by S S’) if– for every predicate symbol p P of arity k: u1, …, uk U, (p)

(u1, ..., uk) ’(p)(f(u1), ..., f (uk)) – and for all u’ U’

( | { u | f (u) = u’ } | > 1) ’(sm)(u’)

• We say that S can be embedded in S’ (denoted by S S’) if there exists a function f such that S f S’

96

Tight Embedding

• S’ = <U’, ’> is a tight embedding of S=< U, > with respect to a function f if:– S’ does not lose unnecessary information

’(u’1,…, u’k) = {(u1 ..., uk) | f(u1)=u’1,..., f(uk)=u’k}

• One way to get tight embedding is canonical abstraction

97

Canonical Abstraction

Topn nu1 u2 u3u1 u2 u3

[Sagiv, Reps, Wilhelm, TOPLAS02]

98

Canonical Abstraction

Topn nu1 u2 u3u1 u2 u3

Top

[Sagiv, Reps, Wilhelm, TOPLAS02]

99

Canonical Abstraction

Topn nu1 u2 u3u1 u2 u3

Top

100

Canonical Abstraction

Topn nu1 u2 u3u1 u2 u3

Top

101

Canonical Abstraction

Topn nu1 u2 u3u1 u2 u3

Top

102

Canonical Abstraction

Topn nu1 u2 u3u1 u2 u3

Top

103

Canonical Abstraction ()

• Merge all nodes with the same unary predicate values into a single summary node

• Join predicate values

’(u’1 ,..., u’k) = { (u1 ,..., uk) | f(u1)=u’1 ,..., f(uk)=u’k }

• Converts a state of arbitrary size into a 3-valued abstract state of bounded size

• (C) = { (c) | c C }

104

Information Loss

Topn n

Topn n

n

Top

Top

Top

Canonical abstraction

Topn n

105

Instrumentation Predicates• Record additional derived information via

predicatesrx(v) = $v1: x(v1) n*(v1,v)

c(v) = v1: n(v1, v) n*(v, v1)

Topn n

cTop cn n

n

Top

cTop

Canonical abstraction

106

Embedding Theorem: Conservatively Observing Properties

No Cyclesv1,v2: n(v1, v2) n*(v2, v1)1/2

Top rTop rTop

No cycles (derived)v:c(v) 1

107

Operational Semanticsvoid push (int v) { Node x = malloc(sizeof(Node)); x->d = v; x->n = Top; Top = x;}

int pop() { if (Top == NULL) return EMPTY;

Node *s = Top->n;

int r = Top->d;

Top = s;

return r;

{

s’(v) = $v1: Top(v1) n(v1,v)

s = Top->n

Topn nu1 u2 u3

Top n nu1 u2 u3

s

108

Abstract Semantics

Top

s = Topn ?rTop rTop Top rTop rTop

s

s’(v) = $v1: Top(v1) n(v1,v)

s = Top->n

109

Top

Best Transformer (s = Topn)Concrete

Semantics

Canonical

Abstraction

AbstractSemantics

?rTop rTop

...

Top rTop rTop

Top rTop rTop rTop

...

Top

s

rTop rTop

Top

s

rTop rTop rTop

Top

s

rTop rTop

Top

s

rTop rTop rTop

s’(v) = $v1: Top(v1) n(v1,v)

110

Semantic Reduction

• Improve the precision of the analysis by recovering properties of the program semantics

• A Galois connection (C, , , A)• An operation op:AA is a semantic reduction

when– "lL2 op(l)l and – (op(l)) = (l) l

C A

op

111

The Focus Operation

• Focus: Formula((3-Struct) (3-Struct))• Generalizes materialization• For every formula – Focus()(X) yields structure in which evaluates to a

definite values in all assignments– Only maximal in terms of embedding– Focus() is a semantic reduction– But Focus()(X) may be undefined for some X

112

Top rTop rTop

Top rTop rTop rTop

Partial Concretization Based on Transformer (s=Topn)

AbstractSemantics

Partial Concretization

Canonical Abstraction

AbstractSemantics

Top

s

rTop rTop

Top

s

rTop rTop rTop

Top rTop rTop rTop

s

Top

s

rTop rTop

Top rTop rTop

s’(v) = $v1: Top(v1) n(v1,v)

$u: top(u) n(u, v)

Focus (Topn)

113

Partial Concretization

• Locally refine the abstract domain per statement• Soundness is immediate• Employed in other shape analysis algorithms

[Distefano et.al., TACAS’06, Evan et.al., SAS’07, POPL’08]

114

The Coercion Principle

• Another Semantic Reduction• Can be applied after Focus or after Update or

both• Increase precision by exploiting some

structural properties possessed by all stores (Global invariants)

• Structural properties captured by constraints

• Apply a constraint solver

115

Apply Constraint Solver

Top rTop rTop

x

Top rTop rTop

x

x rx rx,ryn

rx,ry

n n

ny

n

Top rTop rTop

x rx rx,ryn

rx,ry

n

y

n

116

Sources of Constraints

• Properties of the operational semantics• Domain specific knowledge– Instrumentation predicates

• User supplied

117

Example Constraints

x(v1) x(v2)eq(v1, v2)

n(v, v1) n(v,v2)eq(v1, v2)

n(v1, v) n(v2,v)eq(v1, v2)is(v)

n*(v3, v4)t[n](v1, v2)

118

Abstract Transformers: Summary

• Kleene evaluation yields sound solution• Focus is a statement-specific partial

concretization• Coerce applies global constraints

119

Abstract Semantics

{ t_embed(coerce(st(w)3(focusF(w)(SS[w] )))) (w,v ) E(G),

w Assignments(G)

< {,} >

SS [v] =

if v = entry

{ S | S SS[w] } (w,v ) E(G),

w Skip(G)

{ t_embed(S) | S coerce(st(w)3(focusF(w)(SS[w] ))) and S 3 cond(w)} (w,v ) True-Branches(G)

{ t_embed(S) | S coerce(st(w)3(focusF(w)(SS[w] ))) and S 3 cond(w)} (w,v ) False-Branches(G)

othrewise

120

Recap

• Abstraction– canonical abstraction – recording derived information

• Transformers–partial concretization (focus)– constraint solver (coerce)– sound information extraction

121

Stack Push

void push (int v) { Node x =

alloc(sizeof(Node));

xd = v;

xn = Top;

Top = x;

}

Topx

Top

Topx

Top

Topx

emp

x

x

xTop

Topv:c(v)

v: x(v)

v: x(v)

v1,v2: n(v1, v2) Top(v2)

Recommended