Constraint-Based Analysis

CS 8803 FPL, Oct 24, 2012

(Slides courtesy of Alex Aiken)


Code Example

void f(state *x, state *y) {
    result = spin_trylock(&x->lock);
    spin_lock(&y->lock);
    …
    if (!result)
        spin_unlock(&x->lock);
    spin_unlock(&y->lock);
}

[Figure: the code above annotated with the analysis features it requires. Path sensitivity: result, (!result). Pointers & heap: (&x->lock), (&y->lock). Interprocedural analysis and flow sensitivity: spin_trylock, spin_lock, spin_unlock. Beside it, a lock state machine with states Locked, Unlocked, and Error and transitions labeled lock and unlock.]

Saturn

• What?
  – SAT-based approach to static bug detection

• How?
  – SAT-based approach
  – Program constructs → Boolean constraints
  – Inference → SAT solving

• Why SAT?
  – Lots of reasons, but for now:
  – Program states naturally expressed as bits
  – The theory for bits is SAT
  – Efficient solvers widely available

Intuition

• Analyzing in one direction is problematic
  – Forwards or backwards
  – Consider null dereference analysis
    • No null ptr assignments: forwards is best
    • No dereferences: backwards is best

• Constraints
  – Give a global picture of the program
  – Allow more efficient order of solution

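Straight-line Code

void f(int x, int y) {
    int z = x & y;
    assert(z == x);
}

[Figure: circuit for the function body. Inputs x31 … x0 and y31 … y0 feed a bitwise-AND producing z = x31∧y31 … x0∧y0; z is compared with x by ==, yielding the result bit R.]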

Straight-line Code

void f(int x, int y) {
    int z = x & y;
    assert(z == x);
}

Query: Is-Satisfiable(¬R)
Answer: Yes
    x = [00…1]
    y = [00…0]
Negated assertion is satisfiable. Therefore, the assertion may fail.
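The query can be imitated on a small scale. The sketch below is not Saturn's encoding: it assumes 8-bit inputs instead of 32-bit ones and replaces the SAT solver with exhaustive search. The names negated_assertion and find_counterexample are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the negated assertion of f, i.e. the formula
 * whose satisfiability the query asks about. */
static bool negated_assertion(uint8_t x, uint8_t y) {
    uint8_t z = x & y;
    return !(z == x);
}

/* Exhaustive search over all 8-bit inputs stands in for the SAT solver.
 * Returns true and stores a witness if the negated assertion is satisfiable. */
static bool find_counterexample(uint8_t *wx, uint8_t *wy) {
    for (unsigned x = 0; x < 256; x++) {
        for (unsigned y = 0; y < 256; y++) {
            if (negated_assertion((uint8_t)x, (uint8_t)y)) {
                *wx = (uint8_t)x;
                *wy = (uint8_t)y;
                return true;
            }
        }
    }
    return false;
}
```

The first witness the search reaches is x = 1, y = 0, which corresponds to the slide's answer x = [00…1], y = [00…0]. A real solver works on the Boolean formula directly and does not enumerate inputs.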

Control Flow – Preparation

• Approach
  – Assumes loop-free program
  – Unroll loops, drop backedges

• May miss errors that are deeply buried
  – Bug finding, not verification
  – Many errors surface in a few iterations

• Advantages
  – Simplicity, reduces false positives

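Control Flow – Example

if (c)
    x = a;
else
    x = b;
res = x;

After x = a:   G = c,       x: [a31…a0]
After x = b:   G = ¬c,      x: [b31…b0]
At the merge:  G = c ∨ ¬c,  x: [v31…v0]  where vi = (c ∧ ai) ∨ (¬c ∧ bi)

• Merges
  – preserve path sensitivity
  – select bits based on the values of incoming guards

[Figure: control-flow graph of the code above. The branch on c leads to x = a on the edge labeled c and to x = b on the edge labeled ¬c; both edges rejoin at res = x, where the guard is true.]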
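The merge rule vi = (c ∧ ai) ∨ (¬c ∧ bi) can be sketched with concrete 32-bit values, where the guard c is replicated across all bit positions as a mask. This is only an illustration: in Saturn the bits and the guard are symbolic Boolean formulas, and the function names here are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Merge of two incoming values under guard c. For every bit i:
 * v_i = (c AND a_i) OR (NOT c AND b_i). */
static uint32_t merge_guarded(bool c, uint32_t a, uint32_t b) {
    uint32_t mask = c ? UINT32_MAX : 0u; /* c copied into all 32 bits */
    return (mask & a) | (~mask & b);
}

/* The branching code from the slide, for comparison. */
static uint32_t branch(bool c, uint32_t a, uint32_t b) {
    uint32_t x;
    if (c)
        x = a;
    else
        x = b;
    return x;
}
```

For any c, a, and b, merge_guarded and branch return the same value, which is the sense in which the merge preserves path sensitivity.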

Pointers – Overview

• May point to different locations…
  – Thus, use points-to sets
      p: { l1, …, ln }

• … but path sensitive
  – Use guards on points-to relationships
      p: { (g1, l1), …, (gn, ln) }

Pointers – Example

p = &x;            G = true,  p: { (true, x) }
if (c) p = &y;     G = c,     p: { (true, y) }
res = *p;          G = true,  p: { (c, y), (¬c, x) }

The dereference res = *p is modeled as:

if (c) res = y;
else if (¬c) res = x;
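A dereference through a guarded location set can be sketched as a chain of conditional assignments. The sketch assumes concrete Boolean guards that are mutually exclusive; Saturn keeps the guards symbolic. The type guarded_loc and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* One entry of a guarded location set: p points to *loc when guard holds. */
typedef struct {
    bool guard;
    const int *loc;
} guarded_loc;

/* Dereference as conditional assignments: res receives the value of the
 * location whose guard holds. Guards are assumed mutually exclusive. */
static bool deref_guarded(const guarded_loc *set, size_t n, int *res) {
    for (size_t i = 0; i < n; i++) {
        if (set[i].guard) {
            *res = *set[i].loc;
            return true;
        }
    }
    return false; /* no guard holds: p has no target on this path */
}

/* The slide's example: p = &x; if (c) p = &y; res = *p; */
static int example(bool c, int x, int y) {
    guarded_loc p[2] = { { c, &y }, { !c, &x } };
    int res = 0;
    deref_guarded(p, 2, &res);
    return res;
}
```

Calling example with c true reads y, and with c false reads x, matching the set { (c, y), (¬c, x) }.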

Pointers – Recap

• Guarded Location Sets
    { (g1, l1), …, (gn, ln) }

• Guards
  – Condition under which points-to relationship holds
  – Collected from statement guards

• Pointer Dereference
  – Conditional Assignments

Not Covered

• Other Constructs
  – Structs, …

• Modeling of the environment

• Optimizations
  – several to reduce size of formulas
  – some form of program slicing important

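What can we do with Saturn?

int f(lock_t *l) {
    lock(l);
    …
    unlock(l);
}

lock(l) is modeled as:

if (l->state == Unlocked) l->state = Locked;
else l->state = Error;

unlock(l) is modeled as:

if (l->state == Locked) l->state = Unlocked;
else l->state = Error;

[Figure: lock state machine with states Locked, Unlocked, and Error. lock moves Unlocked to Locked, unlock moves Locked to Unlocked, and lock in Locked or unlock in Unlocked leads to Error.]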

General FSM Checking

• Encode FSM in the program
  – State → Integer
  – Transition → Conditional Assignments

• Check code behavior
  – SAT queries
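The encoding can be sketched in plain C: the state is an integer (an enum), each transition is a conditional assignment, and the check asks whether Error can be the final state. Running the body on a concrete initial state stands in for the SAT query here. All names below, including the double-locking variant, are hypothetical.

```c
#include <stdbool.h>

/* FSM state encoded as an integer. */
typedef enum { UNLOCKED = 0, LOCKED = 1, ERROR = 2 } lock_state;

/* Transitions as conditional assignments. */
static lock_state do_lock(lock_state s) {
    return (s == UNLOCKED) ? LOCKED : ERROR;
}

static lock_state do_unlock(lock_state s) {
    return (s == LOCKED) ? UNLOCKED : ERROR;
}

/* Effect of a body of the form lock(l); ...; unlock(l); on the lock state. */
static lock_state f_body(lock_state s) {
    s = do_lock(s);
    s = do_unlock(s);
    return s;
}

/* Hypothetical buggy variant that acquires the lock twice. */
static lock_state double_lock_body(lock_state s) {
    s = do_lock(s);
    s = do_lock(s);
    return s;
}

/* Check: can the body end in ERROR when started in state init? */
static bool may_reach_error(lock_state (*body)(lock_state), lock_state init) {
    return body(init) == ERROR;
}
```

Started from UNLOCKED, f_body ends in UNLOCKED while double_lock_body ends in ERROR. Started from LOCKED, f_body itself ends in ERROR.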

How are we doing so far?

• Precision:

• Scalability:
  – SAT limit is 1M clauses
  – About 10 functions

• Solution:
  – Divide and conquer
  – Function summaries

Function Summaries (1st try)

• Function behavior can be summarized with a set of state transitions

int f(lock_t *l) {
    lock(l);
    …
    …
    unlock(l);
    return 0;
}

• Summary:
    *l: Unlocked → Unlocked
        Locked → Error

A Difficulty

int f(lock_t *l) {
    lock(l);
    …
    if (err) return -1;
    …
    unlock(l);
    return 0;
}

• Problem
  – two possible output states
  – distinguished by return value (retval == 0)…

• Summary
  1. (retval == 0)
       *l: Unlocked → Unlocked
           Locked → Error
  2. ¬(retval == 0)
       *l: Unlocked → Locked
           Locked → Error

FSM Function Summaries

• Summary representation (simplified):
    { Pin, Pout, R }

• User gives:
  – Pin: predicates on initial state
  – Pout: predicates on final state
  – Express interprocedural path sensitivity

• Saturn computes:
  – R: guarded state transitions
  – Used to simulate function behavior at call site

Lock Summary (2nd try)

int f(lock_t *l) {
    lock(l);
    …
    if (err) return -1;
    …
    unlock(l);
    return 0;
}

• Output predicate:
  – Pout = { (retval == 0) }

• Summary (R):
  1. (retval == 0)
       *l: Unlocked → Unlocked
           Locked → Error
  2. ¬(retval == 0)
       *l: Unlocked → Locked
           Locked → Error
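One way to picture how R simulates f at a call site is a lookup table indexed by the truth value of the output predicate (retval == 0) and the incoming lock state. This is a sketch only. The table layout, the names, and the Error → Error entries are assumptions; the slide lists only the Unlocked and Locked rows.

```c
typedef enum { UNLOCKED = 0, LOCKED = 1, ERROR = 2 } lock_state;

/* R as a table: summary[p][s] is the final state of *l when the output
 * predicate (retval == 0) has truth value p and *l starts in state s.
 * The ERROR column is an assumption, not taken from the slide. */
static const lock_state summary[2][3] = {
    /* retval != 0 */ { LOCKED,   ERROR, ERROR },
    /* retval == 0 */ { UNLOCKED, ERROR, ERROR },
};

/* Simulate a call to f using only its summary, not its body. */
static lock_state apply_summary(lock_state before, int retval) {
    return summary[retval == 0][before];
}

/* Hypothetical caller that calls f twice on the same lock, starting
 * from UNLOCKED, with the two return values given as parameters. */
static lock_state caller(int retval1, int retval2) {
    lock_state s = UNLOCKED;
    s = apply_summary(s, retval1);
    s = apply_summary(s, retval2);
    return s;
}
```

If the first call fails (nonzero return value) the lock stays held, so the second call drives the state to ERROR. If both calls return 0 the state ends as UNLOCKED.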

Lock checker for Linux

• Parameters:
  – States: { Locked, Unlocked, Error }
  – Pin = {}
  – Pout = { (retval == 0) }

• Experiment:
  – Linux Kernel 2.6.5: 4.8 MLOC
  – ~40 lock/unlock/trylock primitives
  – 20 hours to analyze
    • 3.0 GHz Pentium IV, 1 GB memory

Double Locking/Unlocking

static void sscape_coproc_close(…) {
    spin_lock_irqsave(&devc->lock, flags);
    if (…)
        sscape_write(devc, DMAA_REG, 0x20);
    …
}

static void sscape_write(struct … *devc, …) {
    spin_lock_irqsave(&devc->lock, flags);
    …
}

Ambiguous Return State

int i2o_claim_device(…) {
    down(&i2o_configuration_lock);
    if (d->owner) {
        up(&i2o_configuration_lock);
        return -EBUSY;
    }
    if (…) {
        return -EBUSY;
    }
    …
}

Bugs

Type              Bugs   False Pos.   % Bugs
Double Locking     134           99      57%
Ambiguous State     45           22      67%
Total              179          121      60%

Previous Work: MC (31), CQual (18), <20% Bugs

Function Summary Database

• 63,000 functions in Linux
  – More than 23,000 are lock related
  – 17,000 with locking constraints on entry
  – Around 9,000 affect more than one lock
  – 193 lock wrappers
  – 375 unlock wrappers
  – 36 with return value/lock state correlation

• Available on the web . . .

Another Checker

• Memory leaks
  – Common, esp. in error handling code
  – Hard to find
  – Problematic in long running applications

• Current techniques
  – Escape analysis
  – Ownership types
  – Region based analysis
  – …

Simple Leak

char *f() {
    char *p;
    p = (char*)malloc(…);
    …
    if (err) return NULL;
    …
    return p;
}

Scenario 1 – Malloc Wrappers

char *f() {
    char *p;
    p = (char*)strdup(…);
    …
    if (err) return NULL;
    …
    return p;
}

Scenario 2 – External References

char *f(struct state *s) {
    char *p;
    p = (char*)malloc(…);
    s->name = p;
    if (err) return NULL;
    …
    return p;
}

Scenario 3 – Function Calls

char *f(struct state *s) {
    char *p;
    p = (char*)malloc(…);
    g(s, p);
    if (err) return NULL;
    …
    return p;
}

void g(struct state *s, char *p) {
    s->name = p;
}

Scenario 4 – Data dependency

void f(int len) {
    char fastbuf[10], *p;
    if (len < 10) p = fastbuf;
    else p = (char *)malloc(len);
    …
    if (p != fastbuf) free(p);
}

Requirements

• Track points-to relationships precisely

• Infer escaping functions
  – ones that create external references to objects passed in via parameters

• Infer allocation functions

Analysis Part I – Points-to Rule

• PointsTo(p, l)
  – condition under which p points to l

Let the guarded location set of p be { (g0, l0), …, (gn-1, ln-1) }. Then:

PointsTo(p, l) = gi       (if li = l)
                 false    (otherwise)
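The rule is a lookup in p's guarded location set. In the sketch below a guard is stored as an 8-bit truth table over at most three Boolean variables, so it stays symbolic in a small way: bit k of the table is the guard's value under assignment k. This representation, the macros, and the integer location ids are assumptions made for illustration; Saturn represents guards as Boolean formulas.

```c
#include <stddef.h>
#include <stdint.h>

/* A guard as a truth table over up to three Boolean variables. */
typedef uint8_t guard;
#define G_TRUE  ((guard)0xFF)
#define G_FALSE ((guard)0x00)
#define G_VAR0  ((guard)0xAA) /* first variable, e.g. the condition c */

/* One entry (g_i, l_i); locations are identified by small integers. */
typedef struct {
    guard g;
    int loc;
} guarded_loc;

/* PointsTo(p, l): g_i if l_i = l for some entry, false otherwise. */
static guard points_to(const guarded_loc *set, size_t n, int l) {
    for (size_t i = 0; i < n; i++) {
        if (set[i].loc == l)
            return set[i].g;
    }
    return G_FALSE;
}
```

For the earlier pointer example, p: { (c, y), (¬c, x) }, the set holds G_VAR0 for y and its bitwise complement for x, and points_to returns G_FALSE for any other location.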

Analysis Part II – EscapeVia

• EscapeVia(l, p, X)
  – the condition under which location l escapes via pointer p, excluding references in set X

• Access Roots
  – Every object in the function body is accessed through one of the following “roots”
    • Parameters (p1…pn)
    • The Return Value (ret_val)
    • Global Variables
    • Local Variables
    • Heap Allocated Objects

Analysis Part II – EscapeVia

• Never escape through local variables

    if RootOf(p) ∈ Locals ∪ X
    then EscapeVia(l, p, X) = false

• Always escape through global variables

    if RootOf(p) ∈ Globals
    then EscapeVia(l, p, X) = PointsTo(p, l)

Analysis Part II – EscapeVia

• Escaping through parameters/return

    if RootOf(p) ∈ (Params ∪ { ret_val }) – X
    then EscapeVia(l, p, X) = PointsTo(p, l)

• Escaping via another allocated location

    if RootOf(p) ∈ NewLocs – X
    then EscapeVia(l, p, X) = PointsTo(p, l) ∧ Escaped(p, X ∪ {RootOf(l)})

Analysis Part III – Escape/Leak

• Escape Condition
    Escaped(l, X) = ∨p EscapeVia(l, p, X)    (disjunction over all pointers p)

• Leak Condition
    Leaked(l, X) = ¬Escaped(l, X)

• Leak Checker
    For all new locations l, there is a leak if
    Satisfiable(Leaked(l, {}))
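The escape and leak conditions can be sketched for a single allocated location. This is a simplification: the exclusion set X and the rule for escaping via another allocated location are left out, guards are 8-bit truth tables over at most three Boolean variables, and a nonzero table stands in for the SAT query. All type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A guard as a truth table over up to three Boolean variables. */
typedef uint8_t guard;
#define G_TRUE  ((guard)0xFF)
#define G_FALSE ((guard)0x00)
#define G_ERR   ((guard)0xAA) /* the variable err */

typedef enum { ROOT_LOCAL, ROOT_GLOBAL, ROOT_PARAM_OR_RET } root_kind;

typedef struct {
    root_kind root;  /* RootOf(p) */
    guard points_to; /* PointsTo(p, l) for the location l being checked */
} pointer_info;

/* EscapeVia(l, p): false for locals, PointsTo(p, l) for the other roots. */
static guard escape_via(const pointer_info *p) {
    return (p->root == ROOT_LOCAL) ? G_FALSE : p->points_to;
}

/* Escaped(l): disjunction of EscapeVia(l, p) over all pointers p. */
static guard escaped(const pointer_info *ps, size_t n) {
    guard g = G_FALSE;
    for (size_t i = 0; i < n; i++)
        g |= escape_via(&ps[i]);
    return g;
}

/* Leaked(l) = NOT Escaped(l). */
static guard leaked(const pointer_info *ps, size_t n) {
    return (guard)~escaped(ps, n);
}

/* A truth table is satisfiable when at least one row is true. */
static bool satisfiable(guard g) {
    return g != G_FALSE;
}
```

For the Simple Leak example, the local p always points to the allocation and ret_val points to it only when err is false. Escaped is then ¬err, Leaked is err, and the leak query is satisfiable.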

Results

            LOC (K)   # Alloc Func.   # Bugs     FP (%)
Samba           404              80   83          8.79%
OpenSSL         296             101   117         0.85%
BinUtils        909              91   136(66)     3.55%
OpenSSH          36              19   29(10)      0%
Total         1,646             291   365         3.69%

Why SAT? (Revisited …)

• Moore’s Law

• Uniform modeling of constructs as bits

• Constraints
  – Local specification
  – Global solution

• Incremental SAT solving
  – makes multiple queries efficient

Why SAT? (Cont.)

• Path sensitivity is important
  – To find bugs
  – To reduce false positives
  – Much easier to model precisely with SAT

• Compositionality is important
  – Function summaries critical for scalability
  – Easy to construct with SAT queries
