Program Analysis and Verification
0368-4479 http://www.cs.tau.ac.il/~maon/teaching/2013-2014/paav/paav1314b.html
Noam Rinetzky
Lecture 13: Numerical, Pointer & Shape Domains
Slides credit: Roman Manevich, Mooly Sagiv, Eran Yahav, Ganesan Ramalingam
2
Abstract Interpretation [Cousot’77]
• Mathematical foundation of static analysis
– Abstract domains
  • Abstract states
  • Join (⊔)
– Transformer functions
  • Abstract steps
– Chaotic iteration
  • Abstract computation
  • Structured programs
Lattices (D, ⊑, ⊔, ⊓, ⊥, ⊤)
Monotonic functions
Fixpoints
3
Abstract (conservative) interpretation
[Diagram: under the operational (concrete) semantics, a statement S maps a set of states to a set of states; under the abstract semantics, statement S maps an abstract representation to an abstract representation; concretization relates each abstract representation to a set of states]
4
The collecting lattice
• Lattice for a given control-flow node v: Lv = (2^State, ⊆, ∪, ∩, ∅, State)
• Lattice for the entire control-flow graph with nodes V: LCFG = Map(V, Lv)
• We will use this lattice as a baseline for static analysis and define abstractions of its elements
5
Equational definition of the semantics
• R[2] = R[entry] ∪ ⟦x:=x-1⟧ R[3]
• R[3] = R[2] ∩ {s | s(x) > 0}
• R[exit] = R[2] ∩ {s | s(x) ≤ 0}
• A system of recursive equations
• Solution: concrete meaning of the program
[Control-flow graph of the loop: entry, the test if x > 0, the assignment x := x - 1, and exit, annotated with R[entry], R[2], R[3], R[exit]]
6
An abstract semantics
• R[2] = R[entry] ⊔ ⟦x:=x-1⟧# R[3]
• R[3] = R[2] ⊓ {s | s(x) > 0}#
• R[exit] = R[2] ⊓ {s | s(x) ≤ 0}#
• A system of recursive equations
• Solution: abstract meaning of the program
– Over-approximation of the concrete semantics
[Control-flow graph of the loop: entry, the test if x > 0, the assignment x := x - 1, and exit, annotated with R[entry], R[2], R[3], R[exit]]
Callouts: ⟦x:=x-1⟧# is the abstract transformer for x:=x-1; {s | s(x) < 0}# is the abstract representation of {s | s(x) < 0}
7
Galois Connection: c ⊑ γ(α(c))
[Diagram: concrete domain C and abstract domain A; (1) c, (2) α(c), (3) γ(α(c))]
α(c) is the most precise (least) element in A representing c
8
Monotone functions
• Let L1 = (D1, ⊑) and L2 = (D2, ⊑) be two posets
• A function f : D1 → D2 is monotone if for every pair x, y ∈ D1, x ⊑ y implies f(x) ⊑ f(y)
• A special case: L1 = L2 = (D, ⊑), f : D → D
9
Fixed-points
• L = (D, ⊑, ⊔, ⊓, ⊥, ⊤)
• f : D → D monotone
• Fix(f) = { d | f(d) = d }
• Red(f) = { d | f(d) ⊑ d }
• Ext(f) = { d | d ⊑ f(d) }
• Theorem [Tarski 1955]
– lfp(f) = ⊓Fix(f) = ⊓Red(f) ∈ Fix(f)
– gfp(f) = ⊔Fix(f) = ⊔Ext(f) ∈ Fix(f)
[Diagram: the lattice with the regions Red(f), Ext(f) and Fix(f), the points lfp and gfp, and the iterates f^n(⊥) and f^n(⊤)]
1. A solution always exists 2. It is unique 3. It is not always computable
10
Continuity and ACC condition
• Let L = (D, ⊑, ⊔, ⊥) be a complete partial order
– Every ascending chain has a least upper bound
• A function f is continuous if for every increasing chain Y ⊆ D, f(⊔Y) = ⊔{ f(y) | y ∈ Y }
• L satisfies the ascending chain condition (ACC) if every ascending chain eventually stabilizes: d0 ⊑ d1 ⊑ … ⊑ dn = dn+1 = …
11
Fixed-point theorem [Kleene]
• Let L = (D, ⊑, ⊔, ⊥) be a complete partial order and f : D → D a continuous function; then lfp(f) = ⊔_{n∈N} f^n(⊥)
• Lemma: Monotone functions on posets satisfying ACC are continuous
12
Resulting algorithm
• Kleene’s fixed point theorem gives a constructive method for computing the lfp
Mathematical definition: lfp(f) = ⊔_{n∈N} f^n(⊥)
Algorithm:
d := ⊥
while f(d) ≠ d do
  d := d ⊔ f(d)
return d
[Diagram: the ascending chain ⊥, f(⊥), f²(⊥), …, f^n(⊥) leading up to lfp]
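The iteration above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `kleene_lfp`, the example graph `EDGES` and the monotone function `step` are all hypothetical. The lattice is a finite powerset (⊥ = empty set, ⊔ = union), so the ascending chain condition holds and the loop terminates.

```python
def kleene_lfp(f, bottom, join):
    """Iterate d := d ⊔ f(d), starting from bottom, until f(d) == d."""
    d = bottom
    while f(d) != d:
        d = join(d, f(d))
    return d

# Hypothetical example: nodes reachable from node 1, as a least fixpoint
# over the powerset lattice of graph nodes.
EDGES = {1: {2}, 2: {3}, 3: {1}, 4: {5}}

def step(nodes):
    """Monotone function: the start node plus all successors of `nodes`."""
    successors = {m for n in nodes for m in EDGES.get(n, set())}
    return frozenset({1} | successors)

reachable = kleene_lfp(step, frozenset(), lambda a, b: a | b)
print(sorted(reachable))  # [1, 2, 3]
```

Nodes 4 and 5 never enter the result because nothing reachable from node 1 leads to them.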
13
Sound abstract transformer
• Given two lattices
C = (D_C, ⊑_C, ⊔_C, ⊓_C, ⊥_C, ⊤_C)
A = (D_A, ⊑_A, ⊔_A, ⊓_A, ⊥_A, ⊤_A)
and GC_{C,A} = (C, α, γ, A) with
• A concrete transformer f : D_C → D_C and an abstract transformer f# : D_A → D_A
• We say that f# is a sound transformer (w.r.t. f) if
• ∀c: α(f(c)) ⊑ f#(α(c))
• ∀a: α(f(γ(a))) ⊑_A f#(a)
14
Soundness
1. Given two complete lattices
C = (D_C, ⊑_C, ⊔_C, ⊓_C, ⊥_C, ⊤_C)
A = (D_A, ⊑_A, ⊔_A, ⊓_A, ⊥_A, ⊤_A)
and GC_{C,A} = (C, α, γ, A) with
2. Monotone concrete transformer f : D_C → D_C
3. Monotone abstract transformer f# : D_A → D_A
4. Either ∀a ∈ D_A : f(γ(a)) ⊑ γ(f#(a))
or ∀c ∈ D_C : α(f(c)) ⊑ f#(α(c))
Then lfp(f) ⊑ γ(lfp(f#)) and α(lfp(f)) ⊑ lfp(f#)
15
Best (induced) transformer [CC’77]
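[Diagram: the best transformer f#, obtained by composing γ, f and α between the concrete domain C and the abstract domain A, in four numbered steps]
f#(a) = α(f(γ(a)))
Problem: incomputable directly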
16
Fixed-point theorem [Kleene]
• Let L = (D, ⊑, ⊔, ⊥) be a complete partial order and f : D → D a continuous function; then lfp(f) = ⊔_{n∈N} f^n(⊥)
• Lemma: Monotone functions on posets satisfying ACC are continuous
• What if ACC does not hold?
17
Intervals lattice for variable x
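[Hasse diagram of the interval lattice: singletons …, [-2,-2], [-1,-1], [0,0], [1,1], [2,2], … at the bottom level; then [-2,-1], [-1,0], [0,1], [1,2], [2,3], …; wider intervals such as [-10,10] and [-20,10]; half-bounded intervals such as [-∞,-1], [-∞,0], [0,+∞], [1,+∞], [2,+∞]; and [-∞,+∞] at the top]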
18
Intervals lattice for variable x
• Dint[x] = { (L,H) | L ∈ {-∞} ∪ Z and H ∈ Z ∪ {+∞} and L ≤ H }
• ⊥
• ⊤ = [-∞,+∞]
• ⊑ = ?
– [1,2] ⊑ [3,4] ?
– [1,4] ⊑ [1,3] ?
– [1,3] ⊑ [1,4] ?
– [1,3] ⊑ [-∞,+∞] ?
• What is the lattice height?
19
Joining/meeting intervals
• [a,b] ⊔ [c,d] = [min(a,c), max(b,d)]
– [1,1] ⊔ [2,2] = [1,2]
– [1,1] ⊔ [2,+∞] = [1,+∞]
• [a,b] ⊓ [c,d] = [max(a,c), min(b,d)] if a proper interval and otherwise ⊥
– [1,2] ⊓ [3,4] = ⊥
– [1,4] ⊓ [3,4] = [3,4]
– [1,1] ⊓ [1,+∞] = [1,1]
• Check that indeed x ⊑ y if and only if x ⊔ y = y
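The join and meet rules above translate directly into code. The sketch below is illustrative only: the representation (a pair of bounds, with `None` standing for ⊥ and `math.inf` for ±∞) and the function names are assumptions, not something the slides prescribe.

```python
import math

INF = math.inf
BOT = None  # assumed encoding of the bottom element (empty interval)

def join(a, b):
    """[a,b] ⊔ [c,d] = [min(a,c), max(b,d)]; ⊥ is the neutral element."""
    if a is BOT:
        return b
    if b is BOT:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))

def meet(a, b):
    """[a,b] ⊓ [c,d] = [max(a,c), min(b,d)] if proper, otherwise ⊥."""
    if a is BOT or b is BOT:
        return BOT
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else BOT

def leq(a, b):
    """x ⊑ y if and only if x ⊔ y = y."""
    return join(a, b) == b

print(join((1, 1), (2, INF)))   # (1, inf)
print(meet((1, 2), (3, 4)))     # None, i.e. ⊥
print(leq((1, 3), (1, 4)))      # True
```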
20
Interval domain for programs
• Dint[x] = { (L,H) | L ∈ {-∞} ∪ Z and H ∈ Z ∪ {+∞} and L ≤ H }
• For a program with variables Var = {x1,…,xk}
• Dint[Var] = Dint[x1] × … × Dint[xk]
• How can we represent it in terms of formulas?
– Two types of factoids: x ≥ c and x ≤ c
– Example: S = {x ≥ 9, y ≥ 5, y ≤ 10}
– Helper operations
  • c + +∞ = +∞
  • remove(S, x) = S without any x-constraints
  • lb(S, x) = k if k ≤ x ≤ m
  • ub(S, x) = m if k ≤ x ≤ m
21
Assignment transformers
• ⟦x := c⟧# S = remove(S,x) ∪ {x ≥ c, x ≤ c}
• ⟦x := y⟧# S = remove(S,x) ∪ {x ≥ lb(S,y), x ≤ ub(S,y)}
• ⟦x := y+c⟧# S = remove(S,x) ∪ {x ≥ lb(S,y)+c, x ≤ ub(S,y)+c}
• ⟦x := y+z⟧# S = remove(S,x) ∪ {x ≥ lb(S,y)+lb(S,z), x ≤ ub(S,y)+ub(S,z)}
• ⟦x := y*c⟧# S = remove(S,x) ∪ (if c > 0 then {x ≥ lb(S,y)*c, x ≤ ub(S,y)*c} else {x ≥ ub(S,y)*c, x ≤ lb(S,y)*c})
  (for negative c the bounds swap roles: the upper bound of y gives the lower bound of x)
• ⟦x := y*z⟧# S = remove(S,x) ∪ ?
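As a rough illustration of these transformers, the sketch below stores an abstract state as a Python dict from variable names to `(lb, ub)` pairs instead of a set of factoids; that encoding and the function names are assumptions made for the example. Overwriting the entry for `x` plays the role of `remove(S, x)` followed by adding the new bounds.

```python
import math

INF = math.inf

def assign_const(state, x, c):
    """x := c"""
    return {**state, x: (c, c)}

def assign_add(state, x, y, z):
    """x := y + z"""
    (ly, uy), (lz, uz) = state[y], state[z]
    return {**state, x: (ly + lz, uy + uz)}

def assign_mul_const(state, x, y, c):
    """x := y * c; a negative constant swaps the roles of the bounds."""
    ly, uy = state[y]
    if c == 0:
        bounds = (0, 0)          # avoids inf * 0, which is undefined
    elif c > 0:
        bounds = (ly * c, uy * c)
    else:
        bounds = (uy * c, ly * c)
    return {**state, x: bounds}

s = {"y": (1, 2), "z": (0, INF)}
print(assign_mul_const(s, "x", "y", -3)["x"])  # (-6, -3)
print(assign_add(s, "x", "y", "z")["x"])       # (1, inf)
```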
22
assume transformers
• ⟦assume x=c⟧# S = S ⊓ {x ≥ c, x ≤ c}
• ⟦assume x<c⟧# S = S ⊓ {x ≤ c-1}
• ⟦assume x=y⟧# S = S ⊓ {x ≥ lb(S,y), x ≤ ub(S,y)}
• ⟦assume x≠c⟧# S = (S ⊓ {x ≤ c-1}) ⊔ (S ⊓ {x ≥ c+1})
23
Too many iterations to converge
24
Analysis with widening
[Diagram: iterates f#, f#², f#³ in the abstract domain A, shown against the regions Red(f) and Fix(f) and the point lfp(f#)]
25
Analysis with narrowing
[Diagram: iterates f#, f#², f#³ in the abstract domain A, shown against the regions Red(f) and Fix(f) and the point lfp(f#)]
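The slides show widening and narrowing only as pictures. The sketch below fills in one concrete instance under stated assumptions: it uses the standard interval widening (an unstable bound jumps to ±∞), which the slides do not spell out, and a hypothetical loop `x := 0; while x < 1000: x := x + 1`. A single extra application of the loop equation after widening stands in for narrowing.

```python
import math

INF = math.inf
BOT = None  # assumed encoding of the empty interval

def join(a, b):
    if a is BOT:
        return b
    if b is BOT:
        return a
    return (min(a[0], b[0]), max(a[1], b[1]))

def meet(a, b):
    if a is BOT or b is BOT:
        return BOT
    lo, hi = max(a[0], b[0]), min(a[1], b[1])
    return (lo, hi) if lo <= hi else BOT

def widen(a, b):
    """Standard interval widening: a bound that grows jumps to infinity."""
    if a is BOT:
        return b
    if b is BOT:
        return a
    return (a[0] if a[0] <= b[0] else -INF, a[1] if a[1] >= b[1] else INF)

def loop_head(x):
    """Equation at the loop head: X = [0,0] ⊔ ((X ⊓ [-∞,999]) + 1)."""
    body = meet(x, (-INF, 999))
    if body is not BOT:
        body = (body[0] + 1, body[1] + 1)
    return join((0, 0), body)

def solve_with_widening(f):
    x, steps = BOT, 0
    while True:
        nxt = widen(x, f(x))
        if nxt == x:
            return x, steps
        x, steps = nxt, steps + 1

widened, steps = solve_with_widening(loop_head)
narrowed = loop_head(widened)  # one decreasing step recovers the bound
print(widened, steps, narrowed)  # (0, inf) 2 (0, 1000)
```

Plain Kleene iteration on the same equation would need about a thousand steps; widening stabilizes after two and the decreasing step restores the upper bound 1000.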
26
Overview
• Goal: infer numeric properties of program variables (integers, floating point)
• Applications
– Detect division by zero, overflow, out-of-bound array access
– Help non-numerical domains
• Classification
– Non-relational
– (Weakly-)relational
– Equalities / Inequalities
– Linear / non-linear
– Exotic
27
Implementation
28
Non-relational abstractions
• Abstract each variable individually
– Constant propagation [Kildall’73]
– Sign
– Parity (congruences)
– Intervals (Box)
29
Sign abstraction for variable x
• Concrete lattice: C = (2^State, ⊆, ∪, ∩, ∅, State)
• Sign = {⊥, neg, 0, pos, ⊤}
• GC_{C,Sign} = (C, α, γ, Sign)
• γ(⊥) = ?
• γ(neg) = ?
• γ(0) = ?
• γ(pos) = ?
• γ(⊤) = ?
• How can we represent ≥0?
[Hasse diagram: ⊤ at the top; neg, 0, pos in the middle; ⊥ at the bottom]
30
Transformer x:=y+z
⟦x:=y+z⟧#    z = pos   z = 0   z = neg
y = neg      ⊤         neg     neg
y = 0        pos       0       neg
y = pos      pos       pos     ⊤
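The addition table can be checked mechanically. The following sketch is only an illustration: the string encoding of the five sign values and the helper names `alpha`, `leq` and `abs_add` are assumptions. It implements abstract addition and verifies by brute force, on a small range of integers, that the result always covers the sign of the concrete sum.

```python
BOT, NEG, ZERO, POS, TOP = "bot", "neg", "0", "pos", "top"

def alpha(values):
    """Least sign element that covers a set of integers."""
    signs = {NEG if v < 0 else ZERO if v == 0 else POS for v in values}
    if not signs:
        return BOT
    return next(iter(signs)) if len(signs) == 1 else TOP

def leq(a, b):
    """Partial order of the sign lattice (flat between ⊥ and ⊤)."""
    return a == BOT or b == TOP or a == b

def abs_add(a, b):
    """Abstract transformer for x := y + z on sign values."""
    if BOT in (a, b):
        return BOT
    if TOP in (a, b):
        return TOP
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    return a if a == b else TOP  # neg + pos can have any sign

# Soundness on a sample: α({y + z}) ⊑ α({y}) +# α({z})
sound = all(
    leq(alpha({y + z}), abs_add(alpha({y}), alpha({z})))
    for y in range(-3, 4)
    for z in range(-3, 4)
)
print(sound)  # True
```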
31
Parity abstraction for variable x
• Concrete lattice: C = (2^State, ⊆, ∪, ∩, ∅, State)
• Parity = {⊥, E, O, ⊤}
• GC_{C,Parity} = (C, α, γ, Parity)
• γ(⊥) = ?
• γ(E) = ?
• γ(O) = ?
• γ(⊤) = ?
[Hasse diagram: ⊤ at the top; E and O in the middle; ⊥ at the bottom]
32
Transformer x:=y+z
⟦x:=y+z⟧#    z = O   z = E
y = E        O       E
y = O        E       O
33
Boxes (intervals)
[Plot: the box in the x–y plane given by x ∈ [1,4] and y ∈ [3,6]]
34
Non-relational abstractions
• Cannot prove properties that hold simultaneously for several variables
– x = 2*y
– x ≤ y
35
Zone abstraction [Mine]
• Maintain bounded differences between a pair of program variables (useful for tracking array accesses)
• Abstract state is a conjunction of linear inequalities of the form x − y ≤ c
[Plot: the zone in the x–y plane defined by x ≤ 4, −x ≤ −1, y ≤ 3, −y ≤ −1, x − y ≤ 1]
36
Difference bound matrices
• Add a special V0 variable for the number 0
• Represent non-existent relations between variables by +∞ entries
• Convenient for defining the partial order between two abstract elements… ⊑ = ?
Constraints: x ≤ 4, −x ≤ −1, y ≤ 3, −y ≤ −1, x − y ≤ 1
       V0    x     y
V0     +∞    4     3
x      −1    +∞    +∞
y      −1    1     +∞
(the entry in row i, column j is an upper bound on v_j − v_i)
38
Ordering DBMs
• How should we order M1 ⊑ M2?
M1: x ≤ 5, −x ≤ −1, y ≤ 3, x − y ≤ 1
       V0    x     y
V0     +∞    5     3
x      −1    +∞    +∞
y      +∞    1     +∞
M2: x ≤ 4, −x ≤ −1, y ≤ 3, −y ≤ −1, x − y ≤ 1
       V0    x     y
V0     +∞    4     3
x      −1    +∞    +∞
y      −1    1     +∞
39
Joining DBMs
• How should we join M1 ⊔ M2?
M1: x ≤ 2, −x ≤ −1, y ≤ 0, x − y ≤ 1
       V0    x     y
V0     +∞    2     0
x      −1    +∞    +∞
y      +∞    1     +∞
M2: x ≤ 4, −x ≤ −1, y ≤ 3, −y ≤ −1, x − y ≤ 1
       V0    x     y
V0     +∞    4     3
x      −1    +∞    +∞
y      −1    1     +∞
40
Widening DBMs
• How should we widen M1 ∇ M2?
M1: x ≤ 5, −x ≤ −1, y ≤ 3, x − y ≤ 1
       V0    x     y
V0     +∞    5     3
x      −1    +∞    +∞
y      +∞    1     +∞
M2: x ≤ 4, −x ≤ −1, y ≤ 3, −y ≤ −1, x − y ≤ 1
       V0    x     y
V0     +∞    4     3
x      −1    +∞    +∞
y      −1    1     +∞
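The three questions above (order, join, widening) are left open on the slides. One common answer, assumed here rather than taken from the slides, is to work entry by entry: compare with ≤, join with max, and widen by replacing any bound that grew with +∞. The sketch uses the two matrices from the ordering slide, with rows and columns ordered V0, x, y. On matrices that are not closed, the entrywise comparison is only a sufficient test for inclusion.

```python
import math

INF = math.inf

# Rows and columns are ordered V0, x, y; m[i][j] bounds v_j - v_i.
M1 = [[INF, 5, 3], [-1, INF, INF], [INF, 1, INF]]   # x ≤ 5, −x ≤ −1, y ≤ 3, x − y ≤ 1
M2 = [[INF, 4, 3], [-1, INF, INF], [-1, 1, INF]]    # x ≤ 4, −x ≤ −1, y ≤ 3, −y ≤ −1, x − y ≤ 1

def leq(m, n):
    """m ⊑ n if every bound of m is at least as tight as in n."""
    return all(a <= b for rm, rn in zip(m, n) for a, b in zip(rm, rn))

def join(m, n):
    """Entrywise maximum: keep only bounds that hold in both."""
    return [[max(a, b) for a, b in zip(rm, rn)] for rm, rn in zip(m, n)]

def widen(m, n):
    """Keep a bound of m if n respects it; otherwise drop it to +∞."""
    return [[a if b <= a else INF for a, b in zip(rm, rn)] for rm, rn in zip(m, n)]

print(leq(M2, M1), leq(M1, M2))  # True False
print(widen(M2, M1))             # x ≤ 4 and −y ≤ −1 are dropped
```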
41
Potential graph
• A vertex per variable
• A directed edge with the weight of the inequality
• Enables computing semantic reduction by shortest-path algorithms
Constraints: x ≤ 4, −x ≤ −1, y ≤ 3, −y ≤ −1, x − y ≤ 1
[Figure: potential graph over the vertices V0, x and y, with one weighted edge per constraint]
Can we tell whether a system of constraints is satisfiable?
42
Semantic reduction for zones
• Apply the following rule repeatedly: from x − y ≤ c and y − z ≤ d, derive x − z ≤ c+d
• When should we stop?
• Theorem 3.3.4. Best abstraction of potential sets and zones: m∗ = (α^Pot ◦ γ^Pot)(m)
• A word of caution: do not apply widening on top of semantic reduction (see 3.7.2)
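Applying the rule until nothing changes amounts to an all-pairs shortest-path computation on the potential graph. The sketch below uses Floyd–Warshall, which is one possible choice and not necessarily the algorithm the lecture had in mind; the function name `close` and the matrix layout (rows and columns V0, x, y, with `m[i][j]` bounding v_j − v_i) are assumptions. A negative entry on the diagonal after closure signals a negative cycle, meaning the constraints are unsatisfiable.

```python
import math

INF = math.inf

def close(m):
    """Tighten all bounds of a DBM; return None if it is unsatisfiable."""
    n = len(m)
    # v_i - v_i ≤ 0 always holds, so the diagonal starts at min(0, m[i][i]).
    d = [[min(0, m[i][j]) if i == j else m[i][j] for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                d[i][j] = min(d[i][j], d[i][k] + d[k][j])
    if any(d[i][i] < 0 for i in range(n)):
        return None  # negative cycle in the potential graph
    return d

# x ≤ 4, −x ≤ −1, y ≤ 3, −y ≤ −1, x − y ≤ 1
M = [[INF, 4, 3], [-1, INF, INF], [-1, 1, INF]]
print(close(M))  # [[0, 4, 3], [-1, 0, 2], [-1, 1, 0]]

# x − y ≤ −1 together with y − x ≤ 0 has no solution
U = [[INF, INF, INF], [INF, INF, 0], [INF, -1, INF]]
print(close(U))  # None
```

The closed matrix exposes the implied bound y − x ≤ 2, which follows from y ≤ 3 and −x ≤ −1.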
43
Octagon abstraction [Mine-01]
• Abstract state is an intersection of linear inequalities of the form ±x ± y ≤ c
• Captures relationships common in programs (array access)
44
Some inequality-based relational domains
[Figure: overview of inequality-based relational domains; the only surviving label is “policy iteration”]
45
Polyhedral Abstraction
• Abstract state is an intersection of linear inequalities of the form a1x1 + a2x2 + … + anxn ≤ c
• Represent a set of points by their convex hull
(image from http://www.cs.sunysb.edu/~algorith/files/convex-hull.shtml)
46
Operations on Polyhedra
47
Equality-based domains
• Simple congruences [Granger’89]: y = a mod k
• Linear relations: y = a*x + b
– Join operator a little tricky
• Linear equalities [Karr’76]: a1*x1 + … + ak*xk = c
• Polynomial equalities: a1*x1^d1*…*xk^dk + b1*y1^z1*…*yk^zk + … = c
– Some good results are obtainable when d1+…+dk < n for some small n
48
Pointer Analysis
49
Constant propagation example
x = 3;
y = 4;
z = x + 5;
50
Constant propagation example with pointers
x = 3;
*p = 4;
z = x + 5;
Is x always 3 here?
51
Constant propagation example with pointers
• p = &y; x = 3; *p = 4; z = x + 5;
– x is always 3
• p = &x; x = 3; *p = 4; z = x + 5;
– x is always 4
• if (?) p = &x; else p = &y; x = 3; *p = 4; z = x + 5;
– x may be 3 or 4 (i.e., x is unknown in our lattice)
pointers affect most program analyses
52
Constant propagation example with pointers
• p = &y; x = 3; *p = 4; z = x + 5;
– p always points-to y
• p = &x; x = 3; *p = 4; z = x + 5;
– p always points-to x
• if (?) p = &x; else p = &y; x = 3; *p = 4; z = x + 5;
– p may point-to x or y
53
Points-to Analysis
• Determine the set of targets a pointer variable could point-to (at different points in the program)
– “p points-to x”
  • “p stores the value &x”
  • “*p denotes the location x”
– targets could be variables or locations in the heap (dynamic memory allocation)
  • p = &x;
  • p = new Foo(); or p = malloc (…);
– must-point-to vs. may-point-to
54
Constant propagation example with pointers
*q = 3;
*p = 4;
z = *q + 5;
what values can this take?
Can *p denote the same location as *q?
55
More terminology
• *p and *q are said to be aliases (in a given concrete state) if they represent the same location
• Alias analysis
– Determine if a given pair of references could be aliases at a given program point
– *p may-alias *q
– *p must-alias *q
56
Pointer Analysis
• Points-To Analysis
– may-point-to
– must-point-to
• Alias Analysis
– may-alias
– must-alias
57
Applications
• Compiler optimizations
– Method de-virtualization
– Call graph construction
– Allocating objects on stack via escape analysis
• Verification & Bug Finding
– Datarace detection
– Use in preliminary phases
– Use in verification itself
58
Points-to analysis: a simple example
p = &x;              {p=&x}
q = &y;              {p=&x ∧ q=&y}
if (?) { q = p; }    {p=&x ∧ q=&x} inside the branch
                     {p=&x ∧ (q=&y ∨ q=&x)} after the if
x = &a;              {p=&x ∧ (q=&y ∨ q=&x) ∧ x=&a}
y = &b;              {p=&x ∧ (q=&y ∨ q=&x) ∧ x=&a ∧ y=&b}
z = *q;              {p=&x ∧ (q=&y ∨ q=&x) ∧ x=&a ∧ y=&b ∧ (z=x ∨ z=y)}
How would you construct an abstract domain to represent these abstract states?
We will usually drop variable-equality information
60
Points-to lattice
• Points-to
– PT-factoids[x] = { x=&y | y ∈ Var } ∪ {false}
– PT[x] = (2^PT-factoids, ⊆, ∪, ∩, false, PT-factoids[x]) (interpreted disjunctively)
• How should we combine them to get the abstract states in the example?
{p=&x ∧ (q=&y ∨ q=&x) ∧ x=&a ∧ y=&b}
• D[x] = Disj(VE[x]) × Disj(PT[x])
• For all program variables: D = D[x1] × … × D[xk]
61
Points-to analysis
a = &y;
x = &a;
y = &b;
if (?) { p = &x; } else { p = &y; }
    {x=&a ∧ y=&b ∧ (p=&x ∨ p=&y) ∧ a=&y}
*x = &c;
    {x=&a ∧ y=&b ∧ (p=&x ∨ p=&y) ∧ a=&c}
*p = &c;
    {(x=&a ∨ x=&c) ∧ (y=&b ∨ y=&c) ∧ (p=&x ∨ p=&y)}
How should we handle this statement? *x = &c performs a strong update; *p = &c performs weak updates
62
Questions
• When is it correct to use a strong update? A weak update?
• Is this points-to analysis precise?
• What does it mean to say
– p must-point-to x at program point u
– p may-point-to x at program point u
– p must-not-point-to x at program point u
– p may-not-point-to x at program point u
63
Points-to analysis, formally
• We must formally define what we want to compute before we can answer many such questions
64
PWhile syntax
• A primitive statement is of the form
  • x := null
  • x := y
  • x := *y
  • x := &y;
  • *x := y
  • skip
(where x and y are variables in Var)
Omitted (for now)
• Dynamic memory allocation
• Pointer arithmetic
• Structures and fields
• Procedures
66
PWhile operational semantics
• State : (Var → Z) ∪ (Var → Var ∪ {null})
• ⟦x = y⟧ s = s[x ↦ s(y)]
• ⟦x = *y⟧ s = s[x ↦ s(s(y))]
• ⟦*x = y⟧ s = s[s(x) ↦ s(y)]
• ⟦x = null⟧ s = s[x ↦ null]
• ⟦x = &y⟧ s = s[x ↦ y]
must say what happens if null is dereferenced
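These state updates can be written down almost verbatim. In the sketch below a state is a Python dict from variable names to the name of the variable they point to, with `None` for null; the function names are made up for the example. The slides leave the null-dereference case open, so raising an exception here is purely an assumption.

```python
def assign(s, x, y):
    """x = y:  s[x ↦ s(y)]"""
    return {**s, x: s[y]}

def load(s, x, y):
    """x = *y:  s[x ↦ s(s(y))]"""
    if s[y] is None:
        raise ValueError("null dereference")  # assumed behaviour
    return {**s, x: s[s[y]]}

def store(s, x, y):
    """*x = y:  s[s(x) ↦ s(y)]"""
    if s[x] is None:
        raise ValueError("null dereference")  # assumed behaviour
    return {**s, s[x]: s[y]}

def assign_null(s, x):
    """x = null:  s[x ↦ null]"""
    return {**s, x: None}

def address_of(s, x, y):
    """x = &y:  s[x ↦ y]"""
    return {**s, x: y}

s0 = {"p": None, "q": None, "x": None, "y": None}
s1 = address_of(s0, "p", "x")   # p = &x
s2 = address_of(s1, "q", "y")   # q = &y
s3 = store(s2, "p", "q")        # *p = q, so x now points to y
s4 = load(s3, "q", "p")         # q = *p, so q takes the value of x
print(s3["x"], s4["q"])         # y y
```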
67
PWhile collecting semantics
• CS[u] = set of concrete states that can reach program point u (CFG node)
68
Ideal PT Analysis: formal definition
• Let u denote a node in the CFG
• Define IdealMustPT(u) to be { (p,x) | forall s in CS[u]. s(p) = x }
• Define IdealMayPT(u) to be { (p,x) | exists s in CS[u]. s(p) = x }
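For a finite, explicitly listed set of concrete states both definitions can be evaluated directly. The sketch below is illustrative: the dict encoding of states (variable name to target name, `None` for null) and the sample set `cs_u` are assumptions, and in a real program CS[u] is generally not available in enumerated form.

```python
def ideal_may_pt(states):
    """{ (p,x) | exists s in CS[u]. s(p) = x }"""
    return {(p, x) for s in states for p, x in s.items()}

def ideal_must_pt(states, variables, targets):
    """{ (p,x) | forall s in CS[u]. s(p) = x }"""
    return {
        (p, x)
        for p in variables
        for x in targets
        if all(s[p] == x for s in states)
    }

# Hypothetical CS[u]: two states that agree on p and differ on q.
cs_u = [{"p": "x", "q": "y"}, {"p": "x", "q": "x"}]

print(sorted(ideal_may_pt(cs_u)))
# [('p', 'x'), ('q', 'x'), ('q', 'y')]
print(ideal_must_pt(cs_u, ["p", "q"], ["x", "y", None]))
# {('p', 'x')}
```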
69
May-point-to analysis: formal requirement specification
May Point-To Analysis:
For every vertex u in the CFG, compute a set R(u) such that
R(u) ⊇ { (p,x) | ∃s ∈ CS[u]. s(p) = x }
Compute R: V → 2^Vars’ such that R(u) ⊇ IdealMayPT(u)
(where Var’ = Var ∪ {null})
70
May-point-to analysis: formal requirement specification
Compute R: V → 2^Vars’ such that R(u) ⊇ IdealMayPT(u)
• An algorithm is said to be correct if the solution R it computes satisfies ∀u ∈ V. R(u) ⊇ IdealMayPT(u)
• An algorithm is said to be precise if the solution R it computes satisfies ∀u ∈ V. R(u) = IdealMayPT(u)
• An algorithm that computes a solution R1 is said to be more precise than one that computes a solution R2 if ∀u ∈ V. R1(u) ⊆ R2(u)
71
(May-point-to analysis) Algorithm A
• Is this algorithm correct?• Is this algorithm precise?
• Let’s first completely and formally define the algorithm
72
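Points-to graphs
x = &a;
y = &b;
if (?) { p = &x; } else { p = &y; }
    {x=&a ∧ y=&b ∧ (p=&x ∨ p=&y)}
*x = &c;
    {x=&a ∧ y=&b ∧ (p=&x ∨ p=&y) ∧ a=&c}
*p = &c;
    {(x=&a ∨ x=&c) ∧ (y=&b ∨ y=&c) ∧ (p=&x ∨ p=&y)}
[Points-to graph over the nodes x, y, p, a, b, c; the successors of x form the points-to set of x]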
73
Algorithm A: A formal definition
the “Data Flow Analysis” Recipe
• Define join-semilattice of abstract-values
– PTGraph ::= (Var, Var × Var’)
– g1 ⊔ g2 = ?
– ⊥ = ?
– ⊤ = ?
• Define transformers for primitive statements
– ⟦stmt⟧# : PTGraph → PTGraph
74
Algorithm A: A formal definition
the “Data Flow Analysis” Recipe
• Define join-semilattice of abstract-values
– PTGraph ::= (Var, Var × Var’)
– g1 ⊔ g2 = (Var, E1 ∪ E2)
– ⊥ = (Var, {})
– ⊤ = (Var, Var × Var’)
• Define transformers for primitive statements
– ⟦stmt⟧# : PTGraph → PTGraph
75
Algorithm A: transformers
• Abstract transformers for primitive statements
– ⟦stmt⟧# : PTGraph → PTGraph
• ⟦x := y⟧# (Var, E) = ?
• ⟦x := null⟧# (Var, E) = ?
• ⟦x := &y⟧# (Var, E) = ?
• ⟦x := *y⟧# (Var, E) = ?
• ⟦*x := &y⟧# (Var, E) = ?
76
Algorithm A: transformers
• Abstract transformers for primitive statements
– ⟦stmt⟧# : PTGraph → PTGraph
• ⟦x := y⟧# (Var, E) = (Var, E[succ(x)=succ(y)])
• ⟦x := null⟧# (Var, E) = (Var, E[succ(x)={null}])
• ⟦x := &y⟧# (Var, E) = (Var, E[succ(x)={y}])
• ⟦x := *y⟧# (Var, E) = (Var, E[succ(x)=succ(succ(y))])
• ⟦*x := &y⟧# (Var, E) = ???
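A small sketch of these transformers, with a points-to graph stored as a set of edges `(source, target)` and `None` standing for null. The slide leaves `*x := &y` as a question; the version below is one possible answer, labeled as an assumption: a strong update when `x` has exactly one non-null target and a weak update otherwise, which matches the strong/weak annotations in the earlier example. All function names are hypothetical.

```python
def succ(E, x):
    """Points-to set of x in the edge set E."""
    return {t for (s, t) in E if s == x}

def set_succ(E, x, targets):
    """E[succ(x) = targets]: replace all outgoing edges of x."""
    return {(s, t) for (s, t) in E if s != x} | {(x, t) for t in targets}

def assign(E, x, y):        # x := y
    return set_succ(E, x, succ(E, y))

def assign_null(E, x):      # x := null
    return set_succ(E, x, {None})

def address_of(E, x, y):    # x := &y
    return set_succ(E, x, {y})

def load(E, x, y):          # x := *y
    return set_succ(E, x, {t for m in succ(E, y) if m is not None for t in succ(E, m)})

def store_address(E, x, y): # *x := &y (assumed strong/weak policy)
    targets = succ(E, x) - {None}
    if len(targets) == 1:   # strong update: the single target is overwritten
        return set_succ(E, next(iter(targets)), {y})
    return E | {(t, y) for t in targets}  # weak update: only add edges

def join(E1, E2):           # g1 ⊔ g2 = (Var, E1 ∪ E2)
    return E1 | E2

E = address_of(address_of(set(), "x", "a"), "y", "b")    # x = &a; y = &b;
E = join(address_of(E, "p", "x"), address_of(E, "p", "y"))  # if (?) p = &x else p = &y
E = store_address(E, "x", "c")   # *x = &c  (strong: a → c)
E = store_address(E, "p", "c")   # *p = &c  (weak: x → c and y → c are added)
print(succ(E, "a"), sorted(succ(E, "x")), sorted(succ(E, "y")))
```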
77
Correctness & precision
• We have a complete & formal definition of the problem
• We have a complete & formal definition of a proposed solution
• How do we reason about the correctness & precision of the proposed solution?
78
Points-to analysis(abstract interpretation)
α(Y) = { (p,x) | exists s in Y. s(p) = x }
[Diagram: α maps CS(u) in 2^State to IdealMayPT(u) in PTGraph; MayPT(u) is also shown in PTGraph]
IdealMayPT(u) = α( CS(u) )
79
Concrete transformers
• CS[stmt] : State → State
• ⟦x = y⟧ s = s[x ↦ s(y)]
• ⟦x = *y⟧ s = s[x ↦ s(s(y))]
• ⟦*x = y⟧ s = s[s(x) ↦ s(y)]
• ⟦x = null⟧ s = s[x ↦ null]
• ⟦x = &y⟧ s = s[x ↦ y]
• CS*[stmt] : 2^State → 2^State
• CS*[st] X = { CS[st]s | s ∈ X }
80
Shape Analysis
81
Shape Analysis
Automatically verify properties of programs manipulating dynamically allocated storage
Identify all possible shapes (layout) of the heap
82
Sequential Stack
void push(int v) {
  Node *x = malloc(sizeof(Node));
  x->d = v;
  x->n = Top;
  Top = x;
}
int pop() {
  if (Top == NULL) return EMPTY;
  Node *s = Top->n;
  int r = Top->d;
  Top = s;
  return r;
}
Want to verify
• No null dereference
• Underlying list remains acyclic after each operation
83
Shape Analysis via 3-valued Logic
1) Abstraction
– 3-valued logical structure
– canonical abstraction
2) Transformers
– via logical formulae
– soundness by construction
  • embedding theorem, [SRW02]
84
Concrete State
• represent a concrete state as a two-valued logical structure
– Individuals = heap allocated objects
– Unary predicates = object properties
– Binary predicates = relations
• parametric vocabulary
[Figure: the list Top → u1 –n→ u2 –n→ u3]
(storeless, no heap addresses)
85
Concrete State
• S = <U, ι> over a vocabulary P
• U – universe
• ι – interpretation, mapping each predicate from P to its truth value in S
[Figure: the list Top → u1 –n→ u2 –n→ u3]
U = { u1, u2, u3 }, P = { Top, n }
ι(n)(u1,u2) = 1, ι(n)(u1,u3) = 0, ι(n)(u2,u1) = 0, …
ι(Top)(u1) = 1, ι(Top)(u2) = 0, ι(Top)(u3) = 0
86
Formulae for Observing Properties
void push(int v) {
  Node *x = malloc(sizeof(Node));
  x->d = v;
  x->n = Top;
  Top = x;
}
[Figure: the list Top → u1 –n→ u2 –n→ u3]
• Top != null: ∃w: Top(w) evaluates to 1
• ∃w: x(w)
• No cycles: ¬∃v1,v2: n(v1, v2) ∧ n*(v2, v1) evaluates to 1
• No node precedes Top: ¬∃v1,v2: n(v1, v2) ∧ Top(v2) evaluates to 1
87
Concrete Interpretation Rules
Statement Update formula
x = NULL          x’(v) = 0
x = malloc()      x’(v) = IsNew(v)
x = y             x’(v) = y(v)
x = y->next       x’(v) = ∃w: y(w) ∧ n(w, v)
x->next = y       n’(v, w) = (¬x(v) ∧ n(v, w)) ∨ (x(v) ∧ y(w))
Example: s = Top->n
s’(v) = ∃v1: Top(v1) ∧ n(v1,v)
[Figure: the list Top → u1 –n→ u2 –n→ u3 before the statement, and the same list with s pointing to u2 after it]
Before:
      Top   s
u1    1     0
u2    0     0
u3    0     0
After:
      Top   s
u1    1     0
u2    0     1
u3    0     0
n (unchanged):
      u1    u2    u3
u1    0     1     0
u2    0     0     1
u3    0     0     0
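The update formulas can be evaluated directly on a two-valued structure. In this sketch, a unary predicate is the set of individuals on which it holds and a binary predicate is a set of pairs; that encoding, the function names and the extra individual `u0` in the second call are assumptions for illustration.

```python
def update_load_next(U, y, n):
    """x = y->next:  x'(v) = ∃w: y(w) ∧ n(w, v)"""
    return {v for v in U if any(w in y and (w, v) in n for w in U)}

def update_store_next(U, x, y, n):
    """x->next = y:  n'(v, w) = (¬x(v) ∧ n(v, w)) ∨ (x(v) ∧ y(w))"""
    return {
        (v, w)
        for v in U
        for w in U
        if (v not in x and (v, w) in n) or (v in x and w in y)
    }

# The structure from the example: Top → u1 –n→ u2 –n→ u3
U = ["u1", "u2", "u3"]
Top = {"u1"}
n = {("u1", "u2"), ("u2", "u3")}

s = update_load_next(U, Top, n)   # s = Top->n
print(s)                          # {'u2'}

# push: a fresh node u0 pointed to by x gets x->n = Top
U2 = ["u0"] + U
n2 = update_store_next(U2, {"u0"}, Top, n)
print(sorted(n2))  # [('u0', 'u1'), ('u1', 'u2'), ('u2', 'u3')]
```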
89
Collecting Semantics
CSS[v] =
  { <∅, ∅> }    if v = entry
  otherwise, the union of:
  ∪ { ⟦st(w)⟧(S) | S ∈ CSS[w] }    for (w,v) ∈ E(G), w ∈ Assignments(G)
  ∪ { S | S ∈ CSS[w] }    for (w,v) ∈ E(G), w ∈ Skip(G)
  ∪ { S | S ∈ CSS[w] and S ⊨ cond(w) }    for (w,v) ∈ True-Branches(G)
  ∪ { S | S ∈ CSS[w] and S ⊨ ¬cond(w) }    for (w,v) ∈ False-Branches(G)
90
Collecting Semantics
• At every program point – a potentially infinite set of two-valued logical structures
• Representing (at least) all possible heaps that can arise at the program point
• Next step: find a bounded abstract representation
91
3-Valued Logic
• 1 = true• 0 = false• 1/2 = unknown
A join semi-lattice, 0 ⊔ 1 = 1/2
[Diagram: 1/2 above 0 and 1 in the information order; 0, 1/2, 1 from left to right in the logical order]
92
3-Valued Logical Structures
• A set of individuals (nodes) U
• Relation meaning
– Interpretation of relation symbols in P
  p0() ∈ {0, 1, 1/2}
  p1(v) ∈ {0, 1, 1/2}
  p2(u,v) ∈ {0, 1, 1/2}
• A join semi-lattice: 0 ⊔ 1 = 1/2
93
Boolean Connectives [Kleene]
∧      0     1/2   1
0      0     0     0
1/2    0     1/2   1/2
1      0     1/2   1
∨      0     1/2   1
0      0     1/2   1
1/2    1/2   1/2   1
1      1     1     1
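Both truth tables fall out of a simple numeric encoding: with 1/2 stored as `0.5`, conjunction is the minimum and disjunction the maximum of the two values. The encoding and the function names below are choices made for this sketch; `not3` and `join3` are added for completeness, the latter following the rule 0 ⊔ 1 = 1/2 from the information order.

```python
HALF = 0.5  # assumed encoding of the "unknown" value 1/2

def and3(a, b):
    """Kleene conjunction: minimum in the logical order 0 < 1/2 < 1."""
    return min(a, b)

def or3(a, b):
    """Kleene disjunction: maximum in the logical order."""
    return max(a, b)

def not3(a):
    """Negation swaps 0 and 1 and leaves 1/2 unchanged."""
    return 1 - a

def join3(a, b):
    """Join in the information order: disagreeing values give 1/2."""
    return a if a == b else HALF

values = [0, HALF, 1]
for a in values:
    print([and3(a, b) for b in values], [or3(a, b) for b in values])
```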
94
Property Space
• 3-struct[P] = the set of 3-valued logical structures over a vocabulary (set of predicates) P
• Abstract domain
– ℘(3-Struct[P])
– ⊑ is ⊆
• We will see alternatives later (maybe)
95
Embedding Order
• Given two structures S = <U, ι>, S’ = <U’, ι’> and an onto function f : U → U’ mapping individuals in U to individuals in U’
• We say that f embeds S in S’ (denoted by S ⊑f S’) if
– for every predicate symbol p ∈ P of arity k: ∀u1, …, uk ∈ U, ι(p)(u1, ..., uk) ⊑ ι’(p)(f(u1), ..., f(uk))
– and for all u’ ∈ U’: ( |{ u | f(u) = u’ }| > 1 ) ⊑ ι’(sm)(u’)
• We say that S can be embedded in S’ (denoted by S ⊑ S’) if there exists a function f such that S ⊑f S’
96
Tight Embedding
• S’ = <U’, ι’> is a tight embedding of S = <U, ι> with respect to a function f if:
– S’ does not lose unnecessary information
  ι’(u’1,…, u’k) = ⊔{ ι(u1, ..., uk) | f(u1)=u’1, ..., f(uk)=u’k }
• One way to get tight embedding is canonical abstraction
97
Canonical Abstraction
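[Figure, built up over several slides: the concrete list Top → u1 –n→ u2 –n→ u3 and the abstract nodes its individuals u1, u2, u3 are mapped to]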
[Sagiv, Reps, Wilhelm, TOPLAS02]
103
Canonical Abstraction (β)
• Merge all nodes with the same unary predicate values into a single summary node
• Join predicate values
  ι’(u’1, ..., u’k) = ⊔{ ι(u1, ..., uk) | f(u1)=u’1, ..., f(uk)=u’k }
• Converts a state of arbitrary size into a 3-valued abstract state of bounded size
• α(C) = ⊔{ β(c) | c ∈ C }
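The two steps, merging by unary predicate values and joining the binary predicate values, can be sketched as follows. The encoding is an assumption: an abstract node is named by the frozen set of unary predicates that hold on its members, 1/2 is stored as `0.5`, and the summary predicate `sm` is 1/2 exactly when more than one individual was merged. The function name and return shape are invented for the example.

```python
HALF = 0.5  # assumed encoding of 1/2

def join3(values):
    """Join in the information order: a single agreed value, else 1/2."""
    values = set(values)
    return values.pop() if len(values) == 1 else HALF

def canonical_abstraction(universe, unary, binary):
    """unary: name -> set of individuals; binary: name -> set of pairs."""
    # f maps each individual to its canonical name.
    f = {u: frozenset(p for p, holds in unary.items() if u in holds) for u in universe}
    nodes = set(f.values())
    members = {a: [u for u in universe if f[u] == a] for a in nodes}
    abs_binary = {
        p: {
            (a, b): join3(int((u, v) in pairs) for u in members[a] for v in members[b])
            for a in nodes
            for b in nodes
        }
        for p, pairs in binary.items()
    }
    sm = {a: HALF if len(members[a]) > 1 else 0 for a in nodes}
    return nodes, abs_binary, sm

# The list Top → u1 –n→ u2 –n→ u3
nodes, abs_bin, sm = canonical_abstraction(
    ["u1", "u2", "u3"],
    {"Top": {"u1"}},
    {"n": {("u1", "u2"), ("u2", "u3")}},
)
TOP_NODE, REST = frozenset({"Top"}), frozenset()
print(abs_bin["n"][(TOP_NODE, REST)], abs_bin["n"][(REST, REST)], sm[REST])
# 0.5 0.5 0.5
```

The nodes u2 and u3 collapse into one summary node, and the n-edges into and within it become 1/2, which is the information loss discussed on the next slide.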
104
Information Loss
[Figure: several concrete lists headed by Top, with differing n-edges, that canonical abstraction maps to the same abstract structure]
105
Instrumentation Predicates
• Record additional derived information via predicates
  rx(v) = ∃v1: x(v1) ∧ n*(v1,v)
  c(v) = ∃v1: n(v1, v) ∧ n*(v, v1)
[Figure: lists headed by Top, with nodes annotated by c where they lie on a cycle, and their canonical abstraction]
106
Embedding Theorem: Conservatively Observing Properties
• No cycles: ¬∃v1,v2: n(v1, v2) ∧ n*(v2, v1) evaluates to 1/2
• No cycles (derived): ∀v: ¬c(v) evaluates to 1
[Figure: abstract list headed by Top, with nodes annotated by rTop]
107
Operational Semanticsvoid push (int v) { Node x = malloc(sizeof(Node)); x->d = v; x->n = Top; Top = x;}
int pop() { if (Top == NULL) return EMPTY;
Node *s = Top->n;
int r = Top->d;
Top = s;
return r;
{
s’(v) = $v1: Top(v1) n(v1,v)
s = Top->n
Topn nu1 u2 u3
Top n nu1 u2 u3
s
108
Abstract Semantics
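s = Top->n
s’(v) = ∃v1: Top(v1) ∧ n(v1,v)
[Figure: an abstract list headed by Top with nodes annotated by rTop; which abstract structure results from applying s = Top->n is left as a question (“?”)]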
109
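Best Transformer (s = Top->n)
s’(v) = ∃v1: Top(v1) ∧ n(v1,v)
[Figure: an abstract list headed by Top (nodes annotated by rTop) is concretized into lists of two, three, … nodes; the concrete semantics of s = Top->n is applied to each, making s point to the second node; canonical abstraction maps the results back; the abstract semantics should yield the same structures (“?”)]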
110
Semantic Reduction
• Improve the precision of the analysis by recovering properties of the program semantics
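• A Galois connection (C, α, γ, A)
• An operation op: A → A is a semantic reduction when
– ∀l ∈ L2: op(l) ⊑ l and
– γ(op(l)) = γ(l)
[Diagram: op maps an element of A to a smaller element of A with the same concretization in C]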
111
The Focus Operation
• Focus: Formula → (℘(3-Struct) → ℘(3-Struct))
• Generalizes materialization
• For every formula φ
– Focus(φ)(X) yields structures in which φ evaluates to a definite value in all assignments
– Only maximal in terms of embedding
– Focus(φ) is a semantic reduction
– But Focus(φ)(X) may be undefined for some X
112
Partial Concretization Based on Transformer (s = Top->n)
s’(v) = ∃v1: Top(v1) ∧ n(v1,v)
Focus(Top->n): ∃u: top(u) ∧ n(u, v)
[Figure: an abstract list headed by Top (nodes annotated by rTop) is partially concretized by Focus into structures with two and three abstract nodes; the abstract semantics of s = Top->n is applied to each, making s point to the node after Top; canonical abstraction then merges the results]
113
Partial Concretization
• Locally refine the abstract domain per statement
• Soundness is immediate
• Employed in other shape analysis algorithms [Distefano et.al., TACAS’06, Evan et.al., SAS’07, POPL’08]
114
The Coercion Principle
• Another Semantic Reduction
• Can be applied after Focus or after Update or both
• Increase precision by exploiting some structural properties possessed by all stores (Global invariants)
• Structural properties captured by constraints
• Apply a constraint solver
115
Apply Constraint Solver
[Figure: abstract structures with variables Top, x and y, n-edges, and nodes annotated by rTop, rx and rx,ry, shown before and after applying the constraint solver]
116
Sources of Constraints
• Properties of the operational semantics
• Domain specific knowledge
– Instrumentation predicates
• User supplied
117
Example Constraints
x(v1) ∧ x(v2) → eq(v1, v2)
n(v, v1) ∧ n(v, v2) → eq(v1, v2)
n(v1, v) ∧ n(v2, v) ∧ ¬eq(v1, v2) ↔ is(v)
n*(v3, v4) ↔ t[n](v1, v2)
118
Abstract Transformers: Summary
• Kleene evaluation yields sound solution
• Focus is a statement-specific partial concretization
• Coerce applies global constraints
119
Abstract Semantics
SS[v] =
  { <∅, ∅> }    if v = entry
  otherwise, the union of:
  ∪ { t_embed(coerce(⟦st(w)⟧3(focus_F(w)(SS[w])))) }    for (w,v) ∈ E(G), w ∈ Assignments(G)
  ∪ { S | S ∈ SS[w] }    for (w,v) ∈ E(G), w ∈ Skip(G)
  ∪ { t_embed(S) | S ∈ coerce(⟦st(w)⟧3(focus_F(w)(SS[w]))) and S ⊨3 cond(w) }    for (w,v) ∈ True-Branches(G)
  ∪ { t_embed(S) | S ∈ coerce(⟦st(w)⟧3(focus_F(w)(SS[w]))) and S ⊨3 ¬cond(w) }    for (w,v) ∈ False-Branches(G)
120
Recap
• Abstraction
– canonical abstraction
– recording derived information
• Transformers
– partial concretization (focus)
– constraint solver (coerce)
– sound information extraction
121
Stack Push
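void push(int v) {
  Node *x = alloc(sizeof(Node));
  x->d = v;
  x->n = Top;
  Top = x;
}
[Figure: abstract structures at each program point of push, starting from the empty heap (emp) and from a list headed by Top, showing x and Top after each statement]
Properties observed: ∀v: ¬c(v); ∃v: x(v); ¬∃v1,v2: n(v1, v2) ∧ Top(v2)
…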