Software Design Rules Monica Lam Stanford ...lam/hm.pdf · Software Design Rules Monica LamMonica...

Preview:

Citation preview

Software Design RulesSoftware Design RulesSoftware Design Rules

Monica LamMonica LamMonica Lam

Stanford UniversityStanford UniversityStanford University

Joint work with: Sudheendra Hangal, David Heine, Ben Livshits, Michael Martin, John Whaley

1

Software is Full of Errors

Error rate: 1-4.5 errors per 1000 linesWindows 2000

35M LOC, 63000 known bugs at time of release2 per 1000 lines

Large consumer softwareFormal specification & verification infeasible

2

Buffer Overruns

A buffer access must stay within boundsBuffer overruns are responsible for over 50% of major vulnerabilitiesMS Blaster, Slammer, Code red, nimda, …, Internet Worm, 1988Responsible for damages in billions

3

Memory Leaks

All unused memory should be freed once and only onceMemory exhaustion may cause long running programs to fail.

4

SQL Injection Errors

Database

Web serverFront end

user “Give me Bob’s credit card #”or

“Delete all records”

User may not supply SQL queries to databases directlyOne of top ten vulnerabilities

5

Happy-go-lucky SQL Query

User supplies: name, password

SELECT UserID, CreditcardFROM RecordsWHERE

Name = ‘ + name+’ AND PW =‘ + password + ’

6

Fun with SQL

“ — ”: means “the rest are comments” in SQLSELECT UserID, CreditCard

FROM RecordsWHERE:

Name = ‘bob ’ AND PW = ‘foo’Name = ‘bob’— ’ AND PW = ‘x’Name = ‘bob’ or 1=1—’ AND PW = ‘x’Name = ‘bob’; DROP Records—’ AND PW = ‘x’

7

Design Rules

Buffer overruns, memory leaks, SQL injections

Same error is often repeated many times

May be specific to a language, a class of applications, a programImportant: may be critical to securitySuccinct: governs many lines of code

8

General Practice

Design rules are implicitViolations of design rules rampant Tools: purify,

grep, emacs, program environments

9

Leverage Computer Power

Courtesy: Intel

10

Programs Don’t Grow Exponentially!

01020304050

1990 1995 2000 2005

Size of Microsoft Windows

Mill

ion

Line

s of

Cod

e

NT3.195

98NT5.0

2000XP

11

This Talk

New generation of“Computer-Aided Programming Tools”

to enforce critical software design rules

12

Custom Design Rule Checkers

Intrinsa, Coverity: C checkersBuilt-in rules for operating systemsRelatively simple analysis UnsoundFound thousands of critical errors in Windows, Linux, BSD, …

13

New-Generation Tools

Design Rules

AdvancedStatic

Analysis

DynamicDetection

& Recovery

User-Supplied, Application-

Specific

AutomaticExtraction

14

PQL: PQL: a Program Query Languagea Program Query Language

15

PQL: a Program Query Language

Design Rules

AdvancedStatic

Analysis

DynamicDetection

& Recovery

User-Supplied, Application-

Specific

AutomaticExtraction

16

PQL Query

Pattern: Illegal sequences of operations on related objectsLooks like the simplest code excerptwith pattern

Action:Print out result, halt program, or recover

17

SQL Injectionx = req.getParameter();stmt.executeQuery (x);

o1 = req.getParameter(); formal = o1;o2.f = formal;o3.g = o2.f;stmt.executeQuery(o3.g);

getParameter

executeQuery

x

PQL:

18

Basic Question

p1 and p2 point to the same object?Undecidable!

getParameter executeQueryx

p1 p2

19

Dynamic Checker

p1 and p2 point to the same object?Instrument Java byte codesgetParameter: record ID of p1’s pointeeexecuteQuery:

check if p2’s pointee has been recorded

getParameter executeQueryx

p1 p2

20

Static Checker

p1 and p2 point to same object?Pointer alias analysis

getParameter executeQueryx

p1 p2

21

Pointer Alias AnalysisPointer Alias Analysis

[Whaley & Lam, PLDI 2004]

22

Pointer Analysis

h1: v1 = new Object();h2: v2 = new Object();

v1.f = v2;v3 = v1.f;

Input RelationsvPointsTo(v1,h1)vPointsTo(v2,h2)Store(v1,f,v2)Load(v1,f,v3)

Output RelationshPointsTo(h1,f,h2)vPointsTo(v3,h2)

v1 h1

v2 h2

fv3

23

hPointsTo(h1, f, h2) :- Store(v1, f, v2),vPointsTo(v1, h1),vPointsTo(v2, h2).

v1 h1

v2 h2

f

Inference Rule in Datalog

v1.f = v2;

Stores:

24

Pointer Alias Analysis

Specified by 5 Datalog rulesCreation sitesAssignmentsStoresLoadsType filter

Apply rules until they converge

25

Method Invocations

Context insensitive is impreciseUnrealizable paths

Object id(Object x) {return x;

}

a = id(b); c = id(d);

26

Object id(Object x) {return x;

}

Object id(Object x) {return x;

}

Context Sensitivity

Context sensitivity is important for precision.Conceptually give each caller its own copy.

a = id(b); c = id(d);

27

Cloning-Based Analysis

Simple brute force technique.Clone every path through the call graph.Run context-insensitive algorithm on expanded call graph.

The catch: exponential blowup

28

Cloning is exponential!

29

Recursion

Actually, cloning is unbounded in the presence of recursive cycles.Technique: We treat all methods within a strongly-connected component as a single node.

30

Recursion

A

G

B C D

E F

A

G

B C D

E F E F E F

G G

31

Top 20 Sourceforge Java AppsNumber of Clones

1.E+001.E+021.E+041.E+061.E+081.E+101.E+121.E+141.E+16

1000 10000 100000 1000000Size of program (variable nodes)

Num

ber o

f clo

nes

1016

1012

108

104

100

32

Cloning is infeasible (?)Typical large program has ~1014 pathsIf you need 1 byte to represent a clone:

256 terabytes of storage> 12 times size of Library of Congress1GB DIMMs: $98.6 million

Power: 96.4 kilowatts (128 homes)300 GB hard disks: 939 x $250 = $234,750

Time to read sequential: 70.8 days

33

BDD comes to the rescue

Many similarities across contexts.Many copies of nearly-identical results.

BDDs (binary decision diagrams)can represent large sets of redundant data efficiently.

34

Static Checker Generation

PQL Query

Datalog

BDD code

2 lines

10 lines

1000 lines

35

BDD: Binary Decision DiagramsBDD: Binary Decision Diagrams

36

Call Graph Relation

Call graph expressed as a relation.Five edges:

Calls(A,B)Calls(A,C)Calls(A,D)Calls(B,D)Calls(C,D)

B

D

C

A

37

Call Graph RelationRelation expressed as a binary function.

A=00, B=01, C=10, D=11

01011

111100000101001001011110100011

001111

1001100x3

0111

0010011000101100100011000000fx4x2x1

B

D

C

A 00

1001

11

38

Binary Decision DiagramsGraphical encoding of a truth table.

x2

x4

x3 x3

x4 x4 x4

0 0 0 1 0 0 0 0

x2

x4

x3 x3

x4 x4 x4

0 1 1 1 0 0 0 1

x1 0 edge1 edge

39

Binary Decision DiagramsCollapse redundant nodes.

x2

x4

x3 x3

x4 x4 x4

0 0 0 0 0 0 0

x2

x4

x3 x3

x4 x4 x4

0 0 0 0

x1

11 1 1 1

40

Binary Decision DiagramsCollapse redundant nodes.

x2

x4

x3 x3

x4 x4 x4

x2

x4

x3 x3

x4 x4 x4

0

x1

1

41

Binary Decision DiagramsCollapse redundant nodes.

x2

x4

x3 x3

x2

x3 x3

x4 x4

0

x1

1

42

Binary Decision DiagramsCollapse redundant nodes.

x2

x4

x3 x3

x2

x3

x4 x4

0

x1

1

43

Binary Decision DiagramsEliminate unnecessary nodes.

x2

x4

x3 x3

x2

x3

x4 x4

0

x1

1

44

Binary Decision DiagramsEliminate unnecessary nodes.

x2

x3

x2

x3

x4

0

x1

1

45

Binary Decision Diagrams

Represent tiny and huge relations compactlySize depends on redundancy

Similar contexts have similar numberingsVariable ordering in the BDD

10 minutes or “runs out of memory”Active machine learning algorithm

46

Expanded Call GraphA

DB C

E

F G

H

0 1 2

A

DB C

E

F G

H

E E

F F GG

H H H H H

0 12

210

47

Numbering ClonesA

DB C

E

F G

H

A

DB C

E

F G

H

E E

F F GG

H H H H H

0 0 0

0 1 2

0-2 0-2

0-2 3-5

0 0 0

0 1 2

0 12

210

0 1 2 3 4 5

0

48

ExperienceContext-sensitive points-to analysis

800K byte codes in less than 20 minutes

PQL suitable for many known error patternsSQL injectionResource leakagePersistence object management

Applied to 6 large programs unknown to usFound 44 critical errors easily

49

DynamicDetection

& Recovery

AutomaticExtraction

DynamicDetection

& Recovery

AdvancedStatic

Analysis

User-Supplied, Application-

Specific

AutomaticExtraction

DynamicDetection

& Recovery

AutomaticExtraction

New-Generation Tools

Design Rules

DynamicDetection

& Recovery

AdvancedStatic

Analysis

User-Supplied, Application-

Specific

AutomaticExtraction

PQL

50

Automatic Design Rule Extraction

Too many design rules to specifyPrinciple: Most of the code is correctInconsistency = Errors

ClouseauClouseau: : a static memory leak detectora static memory leak detector

52

Ownership Model

Owner pointerObligated to delete object, orpass ownership to another pointer

Every object has an owner pointerNo memory leaks, no double deletes

53

Past Experience

Decorate each function parameter with “own” “not-own”Not enforced many mistakes

54

Assignment Statement

a = new int;

b = a;

delete a;

a = new int;

b = a;

delete b;

int

a b

int

a b

b = a own(b) = 0 own(a) = own(a’) + own(b’)

own(b),own(a)

own’(b),own’(a)

ClouseauAutomatically infers function signatures

Based on “new” “delete”Constraint: there is always 1 owning pointer

Inconsistency: errorA 125K-line commercial program

50 lines of user specification on container structure (generic data types)Root-causes 82% of memory leaked dynamicallyFound many additional errors

[Heine&Lam, PLDI 2003]

56

Clouseau

DynamicDetection

& Recovery

AutomaticExtraction

DynamicDetection

& Recovery

AdvancedStatic

Analysis

User-Supplied, Application-

Specific

AutomaticExtraction

DIDUCE

New-Generation Tools

Design Rules

DynamicDetection

& Recovery

AdvancedStatic

Analysis

AutomaticExtraction

57

DIDUCEDIDUCEDIDUCE

Dynamic Invariant Deduction ∪

Checking Engine(deduces, but sometimes incorrectly)

58

Motivation

Difficulty in finding root cause of complex errors

DIDUCE can potentially pinpoint the culprit automatically

59

Example

For line # 1234,

Times 1-1000000: 0 <= X <= 3Time 1000001: X = 0xa3d025ef

Aha! must be an error

60

DIDUCE Design

Monitors every memory access!Learns signature for the data accessed by each bytecode from correct runsReport anomalies observedAnomalies signal cause of errors

61

Experience

Pinpointed line of code causing errorsErrors in imap servers:Change triggers a latent error elsewhereMemory subsystem simulator: otherwise unknown errors

Reported corner cases of interest

62

LessonsDumb but tireless

High overhead but pain-freeUnaffected by programmers’misconception

Finds many serendipitous design rulesNeed to trigger just 1 of the rules[Hangal & Lam, ICSE 02]

63

Summary

Design Rules

AdvancedStatic

Analysis

DynamicDetection

& Recovery

User-Supplied, Application-

Specific

AutomaticExtraction

64

ConclusionsNew tools that exploit computing power to manage software complexityNew advanced analyses

Pointer alias analysis--Binary decision diagrams

Analysis available to programmersPQL: easy to express execution patterns

Automatic design rule extractionInconsistencies errors

Recommended