Program Analysis for Security John Mitchell CS 155 Spring 2012

Program Analysis for Security

John Mitchell

CS 155 Spring 2012

Software bugs are serious problems

Thanks: Isil and Thomas Dillig

3

Several different approaches

• Instrument code for testing– Heap memory: Purify– Perl tainting (information flow)– Java race condition checking

• Black-box testing– Fuzzing and penetration testing– Black-box web application security analysis

• Static code analysis– FindBugs, Fortify, Coverity, MS tools, …

Entry

1

2 3

4

Software

Exit

Behaviors

Entry

1

2

4

Exit

1 2 41 2 4

1 3 4

1 2 4 1 3 4

1 2 3 1 2 4 1 3 4

1 2 4 1 2 3 1 3 4

1 2 3 1 2 3 1 3 4

1 2 4 1 2 4 1 3 4

. . .

1 2 4 1 3 4

Manual testingonly examines small subset of behaviors

4

Static analysis goals

• Bug finding– Identify code that the programmer wishes to

modify or improve• Correctness

– Verify the absence of certain classes of errors

Note: some fundamental limitations…

Outline

• General discussion of tools– Fundamental limitations– Basic method based on abstract states

• More details on one specific method– Property checkers from Engler et al., Coverity– Sample security-related results

• Quick tour of another tool– JavaScript confinement verifier– Proves isolation; not “just” bug finding

Slides from: S. Bugrahe, A. Chou, I&T Dillig, D. Engler, …

Program Analyzers

Code Report Type Line

1 mem leak 324

2 buffer oflow 4,353,245

3 sql injection 23,212

4 stack oflow 86,923

5 dang ptr 8,491

… … …

10,502 info leak 10,921

Program Analyzer

Spec

potentially reports manywarnings

may emit false alarms

analyze large code bases

false alarm

false alarm

Soundness, CompletenessProperty Definition

Soundness If the program contains an error, the analysis will report a warning.“Sound for reporting correctness”

Completeness If the analysis reports an error, the program will contain an error.“Complete for reporting correctness”

Complete IncompleteSo

und

Uns

ound

Reports all errorsReports no false alarms

Reports all errorsMay report false alarms

Undecidable Decidable

Decidable

May not report all errorsMay report false alarms

Decidable

May not report all errorsReports no false alarms

Software

. . .

Behaviors

SoundOver-approximation of

Behaviors

False Alarm

ReportedError

approximation is too coarse……yields too many false alarms

Modules

entry

X 0

Is Y = 0 ?

X X + 1 X X - 1

Is Y = 0 ?

Is X < 0 ? exit

crash

yes

noyes

no

yes no

Does this program ever crash?

entry

X 0

Is Y = 0 ?

X X + 1 X X - 1

Is Y = 0 ?

Is X < 0 ? exit

crash

yes

noyes

no

yes no

infeasible path!… program will never crash

Does this program ever crash?

entry

X 0

Is Y = 0 ?

X X + 1 X X - 1

Is Y = 0 ?

Is X < 0 ? exit

crash

yes

noyes

no

yes no

X = 0

X = 0

X = 1

X = 1

X = 1

X = 1

X = 1

X = 2

X = 2

X = 2

X = 2

X = 2

X = 3

X = 3

X = 3

X = 3

non-termination!… therefore, need to approximate

Try analyzing without approximating…

X X + 1 f

din

dout

dout = f(din)

X = 0

X = 1

dataflow elements

transfer functiondataflow equation

X X + 1 f1

din1

dout1 = f1(din1)

Is Y = 0 ? f2

dout2

dout1

din2 dout1 = din2

dout2 = f2(din2)

X = 0

X = 1

X = 1

X = 1

dout1 = f1(din1)

djoin = dout1 ⊔ dout2

dout2 = f2(din2)f1 f2

f3

dout1

din1 din2

dout2

djoin

din3

dout3

djoin = din3

dout3 = f3(din3)

least upper bound operatorExample: union of possible values

What is the space of dataflow elements, ?What is the least upper bound operator, ?⊔

entry

X 0

Is Y = 0 ?

X X + 1 X X - 1

Is Y = 0 ?

Is X < 0 ? exit

crash

yes

noyes

no

yes no

X = 0

X = 0

X = posX = T

X = neg

X = 0

X = T X = T

X = T

Try analyzing with “signs” approximation…

terminates...… but reports false alarm… therefore, need more precision

lost precision

X = T

X = T

X = pos X = 0 X = neg

X =

X neg X postrue

Y = 0 Y 0

false

X = T

X = pos X = 0 X = neg

X =

signs lattice Boolean formula latticerefined signs lattice

entry

X 0

Is Y = 0 ?

X X + 1 X X - 1

Is Y = 0 ?

Is X < 0 ? exit

crash

yes

noyes

no

yes no

X = 0true

X = 0Y=0

X = posY=0 X = neg Y0

X = posY=0X = negY0

X = posY=0

X = pos Y=0

X = neg Y0

X = 0 Y0

Try analyzing with “path-sensitive signs” approximation…

terminates...… no false alarm… soundly proved never crashes

no precision loss

refinement

Outline





21

Bugs to Detect

Some examples• Crash Causing Defects• Null pointer dereference• Use after free• Double free • Array indexing errors• Mismatched array

new/delete• Potential stack overrun• Potential heap overrun• Return pointers to local

variables• Logically inconsistent

code

• Uninitialized variables• Invalid use of negative values• Passing large parameters by

value• Underallocations of dynamic

data• Memory leaks• File handle leaks• Network resource leaks• Unused values• Unhandled return codes• Use of invalid iterators

Slide credit: Andy Chou

22

Example: Check for missing optional args

•Prototype for open() syscall:

•Typical mistake:

•Result: file has random permissions

•Check: Look for oflags == O_CREAT without mode argument

int open(const char *path, int oflag, /* mode_t mode */...);

fd = open(“file”, O_CREAT);

23

Example: Chroot protocol checker

•Goal: confine process to a “jail” on the filesystem• chroot() changes filesystem root for a process

•Problem• chroot() itself does not change current working directory

chroot() chdir(“/”)

open(“../file”,…) Error if open before chdir

25

Tainting checkers

26

Example Code

#include <stdlib.h>#include <stdio.h>

void say_hello(char * name, int size) { printf("Enter your name: "); fgets(name, size, stdin); printf("Hello %s.\n", name);}

int main(int argc, char *argv[]) { if (argc != 2) { printf("Error, must provide an input buffer size.\n"); exit(-1); } int size = atoi(argv[1]); char * name = (char*)malloc(size); if (name) { say_hello(name, size); free(name); } else { printf("Failed to allocate %d bytes.\n", size); }}

27

atoi

main

exit free malloc

printffgets

say_hello

Callgraph

28

atoi

main

exit free malloc

printffgets

say_hello

Reverse Topological Sort

12

3 4 5 6 7

8

Idea: analyze function before you analyze caller

29

atoi

main

exit free malloc

printffgets

say_hello

Apply Library Models

12

3 4 5 6 7

8

Tool has built-in summaries of library function behavior

30

atoi

main

exit free malloc

printffgets

say_hello

Bottom Up Analysis

12

3 4 5 6 7

8

Analyze function using known properties of functions it calls

31

atoi

main

exit free malloc

printffgets

say_hello

Bottom Up Analysis

12

3 4 5 6 7

8

Analyze function using known properties of functions it calls

32

atoi

main

exit free malloc

printffgets

say_hello

Bottom Up Analysis

12

3 4 5 6 7

8

Finish analysis by analyzing all functions in the program

33

Finding Local Bugs

#define SIZE 8void set_a_b(char * a, char * b) {char * buf[SIZE];if (a) {b = new char[5];} else {if (a && b) {

buf[SIZE] = a;return;

} else {delete [] b;

}*b = ‘x’;}*a = *b;}

34

char * buf[8];

if (a)

b = new char [5]; if (a && b)

buf[8] = a; delete [] b;

*b = ‘x’;

END

*a = *b;

a !a

a && b !(a && b)

Control Flow Graph

Represent logical structure of code in graph form

35

char * buf[8];

if (a)



*b = ‘x’;

END

*a = *b;

a !a

a && b !(a && b)

Path Traversal

Conceptually: Analyze each path through control graph separately

Actually Perform some checking computation once per node; combine paths at merge nodes

Conceptually

Actually

36

char * buf[8];

if (a)

if (a && b)

delete [] b;

*b = ‘x’;

END

*a = *b;

!a

!(a && b)

Apply Checking

Null pointersUse after freeArray overrun

See how three checkers are run for this path

• • Defined by a state diagram, with state

transitions and error states

Checker

• • Assign initial state to each program var• State at program point depends on

state at previous point, program actions• Emit error if error state reached

Run Checker

37

char * buf[8];

if (a)

if (a && b)

delete [] b;

*b = ‘x’;

END

*a = *b;

!a

!(a && b)

Apply Checking


“buf is 8 bytes”

38

char * buf[8];

if (a)

if (a && b)

delete [] b;

*b = ‘x’;

END

*a = *b;

!a

!(a && b)

Apply Checking



“a is null”

39

char * buf[8];

if (a)

if (a && b)

delete [] b;

*b = ‘x’;

END

*a = *b;

!a

!(a && b)

Apply Checking



“a is null”

Already knew a was null

40

char * buf[8];

if (a)

if (a && b)

delete [] b;

*b = ‘x’;

END

*a = *b;

!a

!(a && b)

Apply Checking



“a is null”

“b is deleted”

41

char * buf[8];

if (a)

if (a && b)

delete [] b;

*b = ‘x’;

END

*a = *b;

!a

!(a && b)

Apply Checking



“a is null”

“b is deleted”

“b dereferenced!”

42

char * buf[8];

if (a)

if (a && b)

delete [] b;

*b = ‘x’;

END

*a = *b;

!a

!(a && b)

Apply Checking



“a is null”

“b is deleted”

“b dereferenced!”

No more errorsreported for b

43

False Positives

•What is a bug? Something the user will fix.

•Many sources of false positives• False paths• Idioms• Execution environment assumptions• Killpaths• Conditional compilation• “third party code”• Analysis imprecision• …

44

char * buf[8];

if (a)



*b = ‘x’;

END

*a = *b;

a !a

a && b !(a && b)

A False Path

45

char * buf[8];

if (a)

if (a && b)

buf[8] = a;

END

!a

a && b

False Path Pruning

Integer Range Disequality Branch

46

char * buf[8];

if (a)

if (a && b)

buf[8] = a;

END

!a

a && b

False Path Pruning

“a in [0,0]” “a == 0 is true”


47

char * buf[8];

if (a)

if (a && b)

buf[8] = a;

END

!a

a && b

False Path Pruning

“a in [0,0]” “a == 0 is true”

“a != 0”


48

char * buf[8];

if (a)

if (a && b)

buf[8] = a;

END

!a

a && b

False Path Pruning

“a in [0,0]” “a == 0 is true”

“a != 0”

Impossible


49

Environment Assumptions

•Should the return value of malloc() be checked?

int *p = malloc(sizeof(int));*p = 42;

OS Kernel:Crash machine.

File server:Pause filesystem.

Spreadsheet:Lose unsaved changes.

Game:Annoy user.

Library:?

Medical device:malloc?!

Web application:200ms downtime

IP Phone:Annoy user.

50

Statistical Analysis

•Assume the code is usually right


int *p = malloc(sizeof(int));if(p) *p = 42;







3/4deref

1/4deref

51

Application to Security Bugs

•Stanford research project• Ken Ashcraft and Dawson Engler, Using Programmer-Written Compiler

Extensions to Catch Security Holes, IEEE Security and Privacy 2002• Used modified compiler to find over 100 security holes in Linux and BSD• http://www.stanford.edu/~engler/

•Benefit• Capture recommended practices, known to experts, in tool available to all

Sanitize integers before use

Linux: 125 errors, 24 false; BSD: 12 errors, 4 false

array[v]while(i < v) …

v.clean Use(v)v.tainted

Syscall param

Network packet

copyin(&v, p, len)

any<= v <= any

memcpy(p, q, v)copyin(p,q,v)copyout(p,q,v)

ERROR

Warn when unchecked integers from untrusted sources reach trusting sinks

53

Example security holes

/* 2.4.9/drivers/isdn/act2000/capi.c:actcapi_dispatch */

isdn_ctrl cmd;

...

while ((skb = skb_dequeue(&card->rcvq))) {

msg = skb->data;

...

memcpy(cmd.parm.setup.phone,

msg->msg.connect_ind.addr.num,

msg->msg.connect_ind.addr.len - 1);

•Remote exploit, no checks

54

Example security holes

/* 2.4.5/drivers/char/drm/i810_dma.c */

if(copy_from_user(&d, arg, sizeof(arg)))

return –EFAULT;

if(d.idx > dma->buf_count)

return –EINVAL;

buf = dma->buflist[d.idx];

Copy_from_user(buf_priv->virtual, d.address, d.used);

•Missed lower-bound check:

55

User-pointer inference

•Problem: which are the user pointers?• Hard to determine by dataflow analysis• Easy to tell if kernel believes pointer is from user!

•Belief inference• “*p” implies safe kernel pointer• “copyin(p)/copyout(p)” implies dangerous user ptr• Error: pointer p has both beliefs.

•Implementation: 2 pass checkerinter-procedural: compute all tainted pointers

local pass to check that they are not dereferenced

56

Results for BSD and Linux

•All bugs released to implementers; most serious fixed

Gain control of system 18 15 3 3Corrupt memory 43 17 2 2Read arbitrary memory 19 14 7 7Denial of service 17 5 0 0Minor 28 1 0 0Total 125 52 12 12

Linux BSDViolation Bug Fixed Bug Fixed

Outline





Sandboxing Untrusted JavaScript 58

Two ProblemsSandbox Design: ensure that sandboxed code obtains access to ANY protected resource ONLY via the API

API Confinement: verify that sandboxed code cannot use the API to obtain direct access to a security-critical resource

Evolution of Standardized JavaScript• ECMA-262 3rd edition (ES3) - was around when I started this work in 2008• ECMA-262 5th edition (ES5) - released in Dec 2009

– has a strict mode (ES5-strict)


Solution to the Sandboxing and API Confinement problems for ES5-strict

Plan:1. Subset SES of ES5-strict2. Sandboxing technique for untrusted SES code3. Confinement analysis technique for SES APIs4. Application: Yahoo! ADSafe


The language ES5-strictES5-strict = ES3 with the following restrictions

Restriction (relative to ES3) Rationale

No delete on variable names

No prototypes for scope objects

No with

No this coercion

Safe built-in functions

No .caller, .callee on arguments object

No .caller, .arguments on function objects

No arguments and formal parameters aliasing

Lexical scoping

No ambient access to Global object

Closure-based encapsulation


Scope-bounded eval

Example: eval(“function(){return x}”, x)

Explicitly list free variables of s

• Run-time restriction: Free(Parse(s)) {⊆ x1,…, xn}

• Allows an upper bound on side-effects of executing s

eval(s, x1,…, xn)

Implemented by de-sugaring to(evalnoFree(“function(x1,…, xn){“+ s +“}”))(x1,…, xn)


Next few slides

API Confinement: Verify that sandboxed code cannot use the API to obtain direct access to a security-critical resource

1. Subset SES of ES5-strict2. Sandboxing technique for untrusted SES code3. Confinement analysis technique for SES APIs4. Application: Yahoo! ADSafe

JavaScript API Confinement 63

API Confinement is Complex

Resources,

DOM

f1r1

r4r3

r2

Untrusted JS

Invoke

r2

Return r2

Access r2

r3 r4

Side-effect r4

u1

Repeat

Precision-Efficiency tradeoff

APICritical, e.g.,

window.location


Key Properties of API Implementations•Code is part of the trusted computing base•Small in size, relative to the application•Written in a disciplined manner•Developers have an incentive in keeping the code simple

Insights: •Conservative and scalable static analysis techniques can do well•Can soundly establish API Confinement•Can warn developers away from using complex coding patterns


Challenges & Techniques

Hurdles:• Forall quantification on untrusted code • Analysis of eval(s, x1,…, xn)in general

Techniques:• Flow-Insensitive and Context-Insensitive Points-to analysis• Abstract eval(s, x1,…, xn) by the set of all statements that can be written using free variables {x1,…, xn}

Confine(t, critical): For all untrusted terms s in SESlight,


Verifying Confine(t, critical)

Trusted code t

eval with free vars ”test”,“api”

Environment(Built-ins)

+

+Datalog Solver(least fixed point)

Inference Rules (SES semantics)

Stack(“test”, l) ∧Critical(l) ?

NOT CONFINED

CONFINED

true

false

Abstraction

Our decision procedure and implementation


Express Analysis in Datalog (Whaley et. al.)

Program t

l1:var y = {};

l2:var x = y;

l3:x.f = y;

Facts(t)

Stack(y, l1)

Assign(x, y)

Store(x, “f”, y)

abstract

• Abstract programs as Datalog facts

• Abstract the semantics of SES as Datalog inference rules

Stack(x, l) :- Assign(x, y), Stack(y, l)Heap(l, f, m) :- Store(x, f, y), Stack(x, l), Stack(y, m)

• Execution of program t is abstracted by the least-fixed-point of Facts(t) under the inference rules


Complete set of PredicatesAbstracting terms Abstracting Heaps & Stacks

Assign(x, y) Throw(l, x) Heap(l, x, m) Stack(x, l)

Load(x, y, f) Catch(l, x) Prototype(l, m) FuncType(l)

Store(x, f, y) TP(l, x) ObjType(l) ArrayType(l)

Formal(l, i, x) FormalRet(l, x) NotBuiltin(l) Critical(l)

Actual(x, i, z, y, l) Instance(l, x)

Global(x) Annotation(x, y)

Sufficient to model implicit type conversions, reflection, exceptions


Next few slides1. Subset SES of ES5-strict2. Sandboxing technique for untrusted SES code3. Confinement analysis technique for SES APIs4. Application: Yahoo! ADSafe

Implementation: We implemented the decision procedure in the form of an automated tool ENCAP• built on top of Datalog engine: bddbddb• available online at: http://code.google.com/p/es-lab/Encap

JavaScript API Confinement 70

Application: Yahoo! ADSafe

Analysis (1st attempt)• annotated the API implementation (2000

LOC)• desugared it to SES and ran ENCAP• obtained NOT CONFINED

- culprits: ADSAFE.go and ADSAFE.lib

Sandbox: JSLint Filter

API: ADSAFE object

ADSafe: Mechanism for safely embedding ads This was an actual exploit

Analysis (2nd attempt)• fixed this bug• ran ENCAP again• obtained CONFINED

Result: ADSafe DOM API safely confines DOM objects under the SES threat model, assuming the annotations hold

Summary

•General discussion of tools− Fundamental limitations− Basic method based on abstract states

•More details on one specific method− Property checkers from Engler et al., Coverity− Sample security-related results

•Quick tour of JavaScript tool− JavaScript confinement verifier− Proves isolation; not “just” bug finding

Slides from: S. Bugrahe, A. Chou, I&T Dillig, D. Engler, A. Taly, …

Documents

Program Analysis for Security John Mitchell CS 155 Spring 2012