Constraint-Based Analysis
CS 8803 FPL
Oct 24, 2012
(Slides courtesy of Alex Aiken)
Code Example

void f(state *x, state *y) {
    result = spin_trylock(&x->lock);
    spin_lock(&y->lock);
    …
    if (!result)
        spin_unlock(&x->lock);
    spin_unlock(&y->lock);
}

Annotations on the code:
• Path Sensitivity: result, (!result)
• Pointers & Heap: (&x->lock), (&y->lock)
• Interprocedural: spin_trylock, spin_lock, spin_unlock
• Flow Sensitivity: the lock state machine
  – Unlocked --lock--> Locked
  – Locked --unlock--> Unlocked
  – Locked --lock--> Error
  – Unlocked --unlock--> Error
Saturn
• What?
  – SAT-based approach to static bug detection
• How?
  – SAT-based approach
  – Program constructs → Boolean constraints
  – Inference → SAT solving
• Why SAT?
  – Lots of reasons, but for now:
  – Program states naturally expressed as bits
  – The theory for bits is SAT
  – Efficient solvers widely available
Intuition
• Analyzing in one direction is problematic
  – Forwards or backwards
  – Consider null dereference analysis
    • No null ptr assignments: forwards is best
    • No dereferences: backwards is best
• Constraints
  – Give a global picture of the program
  – Allow more efficient order of solution
Straight-line Code

void f(int x, int y) {
    int z = x & y;
    assert(z == x);
}

[Figure: the bits x31 … x0 and y31 … y0 feed a bitwise AND that produces z = x31∧y31 … x0∧y0; z is then compared with x (==) to give the result R.]
Straight-line Code

void f(int x, int y) {
    int z = x & y;
    assert(z == x);
}

Query: Is-Satisfiable(¬R)
Answer: Yes
    x = [00…1]
    y = [00…0]
Negated assertion is satisfiable.
Therefore, the assertion may fail.
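The query above can be imitated on a small scale. The sketch below is only an illustration: it builds R bit by bit the way a Boolean encoding would, and it assumes a brute-force search over a tiny input range in place of a real SAT solver. The names assertion_holds and find_counterexample are hypothetical and are not part of Saturn.

```c
#include <stdbool.h>
#include <stdint.h>

/* R: the assertion z == x for z = x & y, evaluated one bit at a time. */
static bool assertion_holds(uint32_t x, uint32_t y) {
    bool r = true;
    for (int i = 0; i < 32; i++) {
        bool xi = (x >> i) & 1u;
        bool yi = (y >> i) & 1u;
        bool zi = xi && yi;       /* bitwise AND: z_i = x_i AND y_i */
        r = r && (zi == xi);      /* equality: every bit must agree */
    }
    return r;
}

/* Stand-in for the solver (an assumption of this sketch): search all
   x, y below `limit` for an assignment that satisfies NOT R. */
static bool find_counterexample(uint32_t limit, uint32_t *x_out, uint32_t *y_out) {
    for (uint32_t x = 0; x < limit; x++) {
        for (uint32_t y = 0; y < limit; y++) {
            if (!assertion_holds(x, y)) {
                *x_out = x;
                *y_out = y;
                return true;
            }
        }
    }
    return false;
}
```

With limit set to 2, the first assignment found is x = 1, y = 0, the same witness as on the slide.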
Control Flow – Preparation
• Approach
  – Assumes loop-free program
  – Unroll loops, drop backedges
• May miss errors that are deeply buried
  – Bug finding, not verification
  – Many errors surface in a few iterations
• Advantages
  – Simplicity, reduces false positives
Control Flow – Example

if (c)
    x = a;
else
    x = b;
res = x;

Guards and values at each point:
    after x = a:  G = c,              x: [a31…a0]
    after x = b:  G = ¬c,             x: [b31…b0]
    at res = x:   G = c ∨ ¬c = true,  x: [v31…v0]  where vi = (c ∧ ai) ∨ (¬c ∧ bi)

• Merges
  – preserve path sensitivity
  – select bits based on the values of incoming guards
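The merge rule can be written out directly on machine words. This is a minimal sketch, assuming the guard c has a concrete value; in a SAT encoding c stays symbolic and the same formula is emitted once per bit. The function name merge_under_guard is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Merge two 32-bit values under guard c:
   v_i = (c AND a_i) OR (NOT c AND b_i) for every bit i. */
static uint32_t merge_under_guard(bool c, uint32_t a, uint32_t b) {
    uint32_t mask = c ? UINT32_MAX : 0u;   /* c replicated across all 32 bits */
    return (mask & a) | (~mask & b);
}
```

When c is true the result is a, otherwise it is b, so neither branch's value is lost at the merge point.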
Pointers – Overview
• May point to different locations…
  – Thus, use points-to sets
      p: { l1, …, ln }
• … but path sensitive
  – Use guards on points-to relationships
      p: { (g1, l1), …, (gn, ln) }
Pointers – Example

p = &x;         G = true, p: { (true, x) }
if (c)
    p = &y;     G = c,    p: { (true, y) }
res = *p;       G = true, p: { (c, y); (¬c, x) }

The dereference res = *p is modeled as:

if (c) res = y;
else if (¬c) res = x;
Pointers – Recap
• Guarded Location Sets
    { (g1, l1), …, (gn, ln) }
• Guards
  – Condition under which points-to relationship holds
  – Collected from statement guards
• Pointer Dereference
  – Conditional Assignments
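A dereference through a guarded location set can be sketched as a chain of conditional assignments. The code below is an illustration under stated assumptions: guards are evaluated for one concrete execution (in Saturn they remain Boolean formulas), and the names guarded_loc, guarded_deref and example_deref are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool guard;       /* g_i, evaluated for one concrete execution */
    const int *loc;   /* l_i, the location p may point to */
} guarded_loc;

/* res = *p as conditional assignments: the first entry whose guard
   holds supplies the value. Returns false if no guard holds. */
static bool guarded_deref(const guarded_loc *set, size_t n, int *res) {
    for (size_t i = 0; i < n; i++) {
        if (set[i].guard) {
            *res = *set[i].loc;
            return true;
        }
    }
    return false;
}

/* The example from the slides: p = &x; if (c) p = &y; res = *p; */
static int example_deref(bool c, int x, int y) {
    guarded_loc p[] = { { c, &y }, { !c, &x } };   /* p: { (c, y); (not c, x) } */
    int res = 0;
    guarded_deref(p, 2, &res);
    return res;
}
```

For instance, example_deref(true, 1, 2) reads y and example_deref(false, 1, 2) reads x.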
Not Covered
• Other Constructs
  – Structs, …
• Modeling of the environment
• Optimizations
  – several to reduce size of formulas
  – some form of program slicing important
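What can we do with Saturn?

int f(lock_t *l) {
    lock(l);
    …
    unlock(l);
}

lock(l) is modeled as:
    if (l->state == Unlocked) l->state = Locked;
    else l->state = Error;

unlock(l) is modeled as:
    if (l->state == Locked) l->state = Unlocked;
    else l->state = Error;

Lock state machine:
  – Unlocked --lock--> Locked
  – Locked --unlock--> Unlocked
  – Locked --lock--> Error
  – Unlocked --unlock--> Error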
General FSM Checking
• Encode FSM in the program
  – State → Integer
  – Transition → Conditional Assignments
• Check code behavior
  – SAT queries
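The encoding can be tried out as ordinary C. This is a minimal sketch, assuming the three lock states from the earlier slides; the names model_lock, model_lock_acquire, model_lock_release and run_f are hypothetical, and the code executes the transitions concretely where Saturn would turn them into constraints.

```c
/* State -> integer */
typedef enum { Unlocked, Locked, Error } lock_state;

typedef struct { lock_state state; } model_lock;

/* Transition -> conditional assignment */
static void model_lock_acquire(model_lock *l) {
    if (l->state == Unlocked) l->state = Locked;
    else l->state = Error;
}

static void model_lock_release(model_lock *l) {
    if (l->state == Locked) l->state = Unlocked;
    else l->state = Error;
}

/* Models f: lock(l); ...; unlock(l); and returns the final state. */
static lock_state run_f(lock_state initial) {
    model_lock l = { initial };
    model_lock_acquire(&l);
    model_lock_release(&l);
    return l.state;
}
```

Starting from Unlocked, f ends in Unlocked; starting from Locked, the first acquire already drives the state to Error, where it stays.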
How are we doing so far?
• Precision:
• Scalability:
  – SAT limit is 1M clauses
  – About 10 functions
• Solution:
  – Divide and conquer
  – Function summaries
Function Summaries (1st try)

int f(lock_t *l) {
    lock(l);
    …
    …
    unlock(l);
    return 0;
}

• Function behavior can be summarized with a set of state transitions
• Summary:
    *l: Unlocked → Unlocked
        Locked   → Error
A Difficulty

int f(lock_t *l) {
    lock(l);
    …
    if (err) return -1;
    …
    unlock(l);
    return 0;
}

• Problem
  – two possible output states
  – distinguished by return value (retval == 0)…
• Summary
  1. (retval == 0)
       *l: Unlocked → Unlocked
           Locked   → Error
  2. ¬(retval == 0)
       *l: Unlocked → Locked
           Locked   → Error
FSM Function Summaries
• Summary representation (simplified):
    { Pin, Pout, R }
• User gives:
  – Pin: predicates on initial state
  – Pout: predicates on final state
  – Express interprocedural path sensitivity
• Saturn computes:
  – R: guarded state transitions
  – Used to simulate function behavior at call site
Lock Summary (2nd try)

int f(lock_t *l) {
    lock(l);
    …
    if (err) return -1;
    …
    unlock(l);
    return 0;
}

• Output predicate:
  – Pout = { (retval == 0) }
• Summary (R):
  1. (retval == 0)
       *l: Unlocked → Unlocked
           Locked   → Error
  2. ¬(retval == 0)
       *l: Unlocked → Locked
           Locked   → Error
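The summary R can be used at a call site as a lookup table keyed on the output predicate and the incoming state. The sketch below is an illustration only; the transition struct, the f_summary table and apply_summary are hypothetical names, and treating a missing entry as Error is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { Unlocked, Locked, Error } lock_state;

typedef struct {
    bool retval_is_zero;   /* value of the Pout predicate (retval == 0) */
    lock_state before;     /* state of *l on entry */
    lock_state after;      /* state of *l on exit */
} transition;

/* Guarded state transitions for f, copied from the summary above. */
static const transition f_summary[] = {
    { true,  Unlocked, Unlocked },
    { true,  Locked,   Error    },
    { false, Unlocked, Locked   },
    { false, Locked,   Error    },
};

/* Simulates a call to the summarized function at a call site. */
static lock_state apply_summary(const transition *r, size_t n,
                                bool retval_is_zero, lock_state before) {
    for (size_t i = 0; i < n; i++) {
        if (r[i].retval_is_zero == retval_is_zero && r[i].before == before)
            return r[i].after;
    }
    return Error;   /* assumption: no matching transition means Error */
}
```

A caller that checks the return value can then be analyzed precisely: on the retval == 0 path the lock is Unlocked again, and on the other path it is still Locked.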
Lock checker for Linux
• Parameters:
  – States: { Locked, Unlocked, Error }
  – Pin = {}
  – Pout = { (retval == 0) }
• Experiment:
  – Linux Kernel 2.6.5: 4.8 MLOC
  – ~40 lock/unlock/trylock primitives
  – 20 hours to analyze
    • 3.0 GHz Pentium IV, 1 GB memory
Double Locking/Unlocking

static void sscape_coproc_close(…) {
    spin_lock_irqsave(&devc->lock, flags);
    if (…)
        sscape_write(devc, DMAA_REG, 0x20);
    …
}

static void sscape_write(struct … *devc, …) {
    spin_lock_irqsave(&devc->lock, flags);
    …
}
Ambiguous Return State

int i2o_claim_device(…) {
    down(&i2o_configuration_lock);
    if (d->owner) {
        up(&i2o_configuration_lock);
        return -EBUSY;
    }
    if (…) {
        return -EBUSY;
    }
    …
}
Bugs

Type              Bugs   False Pos.   % Bugs
Double Locking     134       99        57%
Ambiguous State     45       22        67%
Total              179      121        60%

Previous Work: MC (31), CQual (18), <20% Bugs
Function Summary Database
• 63,000 functions in Linux
  – More than 23,000 are lock related
  – 17,000 with locking constraints on entry
  – Around 9,000 affect more than one lock
  – 193 lock wrappers
  – 375 unlock wrappers
  – 36 with return value/lock state correlation
• Available on the web . . .
Another Checker
• Memory leaks
  – Common, esp. in error handling code
  – Hard to find
  – Problematic in long running applications
• Current techniques
  – Escape analysis
  – Ownership types
  – Region based analysis…
Simple Leak

char *f() {
    char *p;
    p = (char*)malloc(…);
    …
    if (err) return NULL;
    …
    return p;
}
Scenario 1 – Malloc Wrappers

char *f() {
    char *p;
    p = (char*)strdup(…);
    …
    if (err) return NULL;
    …
    return p;
}
Scenario 2 – External References

char *f(struct *s) {
    char *p;
    p = (char*)malloc(…);
    s->name = p;
    if (err) return NULL;
    …
    return p;
}
Scenario 3 – Function Calls

char *f(struct state *s) {
    char *p;
    p = (char*)malloc(…);
    g(s, p);
    if (err) return NULL;
    …
    return p;
}

void g(s, p) {
    s->name = p;
}
Scenario 4 – Data dependency

void f(int len) {
    char fastbuf[10], *p;
    if (len < 10) p = fastbuf;
    else p = (char *)malloc(len);
    …
    if (p != fastbuf) free(p);
}
Requirements
• Track points-to relationships precisely
• Infer escaping functions
  – ones that create external references to objects passed in via parameters
• Infer allocation functions
Analysis Part I – Points-to Rule
• PointsTo(p, l)
  – condition under which p points to l

Points-to set of p: { (g0, l0), …, (gn-1, ln-1) }

PointsTo(p, l) = gi      (if li = l)
                 false   (otherwise)
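The rule is a lookup in the guarded points-to set. The sketch below assumes locations are identified by small integers and guards have concrete truth values, both simplifications for illustration; guarded_loc, points_to and the LOC_ constants are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>

enum { LOC_X, LOC_Y, LOC_Z };   /* hypothetical location identifiers */

typedef struct {
    bool guard;   /* g_i */
    int loc;      /* l_i */
} guarded_loc;

/* PointsTo(p, l): the guard g_i of the entry with l_i == l,
   or false when l does not appear in the set. */
static bool points_to(const guarded_loc *set, size_t n, int l) {
    for (size_t i = 0; i < n; i++) {
        if (set[i].loc == l)
            return set[i].guard;
    }
    return false;
}
```

For the set { (c, y); (¬c, x) } from the pointer example, the lookup for y yields c, the lookup for x yields ¬c, and the lookup for any other location yields false.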
Analysis Part II – EscapeVia
• EscapeVia(l, p, X)
  – the condition under which location l escapes via pointer p, excluding references in set X
• Access Roots
  – Every object in the function body is accessed through one of the following “roots”
    • Parameters (p1…pn)
    • The Return Value (ret_val)
    • Global Variables
    • Local Variables
    • Heap Allocated Objects
Analysis Part II – EscapeVia
• Never escape through local variables
    RootOf(p) ∈ Locals ∪ X  ⇒  EscapeVia(l, p, X) = false
• Always escape through global variables
    RootOf(p) ∈ Globals  ⇒  EscapeVia(l, p, X) = PointsTo(p, l)
Analysis Part II – EscapeVia
• Escaping through parameters/return
    RootOf(p) ∈ (Params ∪ { ret_val }) – X  ⇒  EscapeVia(l, p, X) = PointsTo(p, l)
• Escaping via another allocated location
    RootOf(p) ∈ NewLocs – X  ⇒  EscapeVia(l, p, X) = PointsTo(p, l) ∧ Escaped(p, X ∪ {RootOf(l)})
Analysis Part III – Escape/Leak
• Escape Condition
    Escaped(l, X) = ∨p EscapeVia(l, p, X)
• Leak Condition
    Leaked(l, X) = ¬Escaped(l, X)
• Leak Checker
    For all new locations l, there is a leak if
    Satisfiable(Leaked(l, {}))
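The leak check can be illustrated on the Simple Leak example, where the allocated location escapes only through return p, which is reached only when err is false. The sketch below rests on several assumptions: a single Boolean condition err, escape guards written by hand as C functions, and enumeration of both values of err in place of a SAT query. All names in it are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Guard of one escape route, evaluated for a concrete value of err. */
typedef bool (*guard_fn)(bool err);

/* return p; is reached only when err is false. */
static bool via_return(bool err) { return !err; }

/* s->name = p; runs before the error check, so it always holds. */
static bool via_param(bool err) { (void)err; return true; }

/* Escaped(l, {}): disjunction over all escape routes. */
static bool escaped(const guard_fn *routes, size_t n, bool err) {
    for (size_t i = 0; i < n; i++) {
        if (routes[i](err))
            return true;
    }
    return false;
}

/* Satisfiable(Leaked(l, {})): does some value of err make
   NOT Escaped(l, {}) true? */
static bool leak_satisfiable(const guard_fn *routes, size_t n) {
    for (int err = 0; err <= 1; err++) {
        if (!escaped(routes, n, err != 0))
            return true;
    }
    return false;
}
```

With only via_return the check reports a leak (err = true), matching Simple Leak. Adding via_param, as in Scenario 2, makes the location escape on every path, so no leak is reported.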
Results

            LOC (K)   # Alloc Func.   # Bugs     FP (%)
Samba          404          80           83       8.79%
OpenSSL        296         101          117       0.85%
BinUtils       909          91          136(66)   3.55%
OpenSSH         36          19           29(10)   0%
Total        1,646         291          365       3.69%
Why SAT? (Revisited …)
• Moore’s Law
• Uniform modeling of constructs as bits
• Constraints
  – Local specification
  – Global solution
• Incremental SAT solving
  – makes multiple queries efficient
Why SAT? (Cont.)
• Path sensitivity is important
  – To find bugs
  – To reduce false positives
  – Much easier to model precisely with SAT
• Compositionality is important
  – Function summaries critical for scalability
  – Easy to construct with SAT queries