Mining Specifications
Glenn Ammons (Univ. of Wisconsin), Ras Bodík (Univ. of Wisconsin), Jim Larus (Microsoft Research)
(lots of) code specifications of correctness
2
Motivation: why specifications?
Verification tools
• find bugs early
• make guarantees
• scale with programs
• need specifications
[Diagram: programs and specifications are fed to a verifier, which reports bugs]
3
Language-usage specifications
• array accesses
• memory allocation
• type safety
• ...
Easy to write, big payoff
[Diagram: many programs → verifier → bugs]
4
Library-usage specifications
• cut-and-paste (X11)
• network server (socket API)
• device drivers (kernel API)
• ...
Harder to write, smaller payoff
[Diagram: programs → verifier → bugs]
5
Program specifications
• symbol table well-formed
• IR well-formed
• ...
Hardest to write, smallest payoff
[Diagram: program → verifier → bugs]
6
Solution: specification mining
Specification mining gleans specifications from artifacts of program development:
• From programs (static)?
• From executions of test cases (dynamic)?
• From other artifacts?
7
Mining from traces
Advantages:
• No infeasible paths
• Pointer/alias analysis is easy
• Few bugs, as program passes its tests
• Common behavior is correct behavior
...
socket(domain = 2, type = 1, proto = 0, return = 7)
accept(so = 7, addr = 0x40, addr_len = 0x50, return = 8)
write(so = 8, buf = 0x100, len = 23, return = 23)
read(so = 8, buf = 0x100, len = 12, return = 12)
close(so = 8, return = 0)
close(so = 7, return = 0)
...
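A trace like the one above is just a sequence of calls with named attributes. The following is a minimal sketch, in Python, of how such lines might be parsed into events. The textual format and the names `EVENT_RE` and `parse_trace` are assumptions made for illustration; the slides do not describe the tracer's actual output format.

```python
import re

# Hypothetical event format, modeled on the trace lines shown on the slide:
#   name(attr = value, attr = value, ...)
EVENT_RE = re.compile(r"(\w+)\(([^)]*)\)")

def parse_trace(text):
    """Return a list of (call_name, {attribute: value}) pairs."""
    events = []
    for name, body in EVENT_RE.findall(text):
        attrs = {}
        for field in body.split(","):
            if "=" not in field:
                continue
            key, value = field.split("=", 1)
            # Base 0 accepts both decimal (23) and hexadecimal (0x40) literals.
            attrs[key.strip()] = int(value.strip(), 0)
        events.append((name, attrs))
    return events

trace = (
    "socket(domain = 2, type = 1, proto = 0, return = 7)"
    "accept(so = 7, addr = 0x40, addr_len = 0x50, return = 8)"
    "write(so = 8, buf = 0x100, len = 23, return = 23)"
    "close(so = 8, return = 0)"
    "close(so = 7, return = 0)"
)
events = parse_trace(trace)
```

Each event keeps every attribute at this stage; later steps in the talk decide which attributes matter.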
8
Output: a specification
[Diagram: specification automaton running from start to end over the events socket(return = X), accept(so = X, return = Y), read(so = Y), write(so = Y), close(so = Y), close(so = X)]
Specification says what programs should do:
• Temporal dependences (accept follows socket)
• Data dependences (accept input is socket output)
9
How we mine specifications
[Pipeline diagram: Traces → extract scenarios → Scenarios (dependence graphs) → standardize → Strings (for example, ACEGB) → PFSA learner → PFSA (weighted automaton from start to end) → postprocess → Specification]
10
Outline of the talk
• The specification mining problem
• Our specification mining system
  • Annotating traces with dependences
  • Extracting and standardizing scenarios
  • Probabilistic learning and postprocessing
• Experimental results
• Related work
11
An impossible problem
Find a Turing machine that generates C, given T.
[Diagram: sets I (all traces), C (all correct traces), and T (training traces)]
Unsolvable:
• No restrictions on C
• No connection between C and T
• Simple variants are also undecidable [Gold67]
14
A simpler problem
Find a PFSA that generates an approximation of P.
Tractable, plus
• Scenarios are small
• Noise handled
• Finite-state
• Weights useful for postprocessing
[Plot: P, a probability distribution over all scenarios; vertical axis Probability (0 to 1); horizontal axis All scenarios, split into Correct scenarios and Noise]
15
Outline of the talk
• The specification mining problem
• Our specification mining system
  • Annotating traces with dependences
  • Extracting and standardizing scenarios
  • Probabilistic learning and postprocessing
• Verifying traces
• Experimental results
• Related work
18
Dependence annotation
[Diagram: Traces → dependence annotator → Annotated traces]
Definers:
• socket.return
• accept.return
• close.so
Users:
• accept.so
• read.so
• write.so
• close.so
socket(domain = 2, type = 1, proto = 0, return = 7)
accept(so = 7, addr = 0x40, addr_len = 0x50, return = 8)
write(so = 8, buf = 0x100, len = 23, return = 23)
close(so = 8, return = 0)
close(so = 7, return = 0)
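The definer and user lists above can be read as a recipe for linking events: a user attribute depends on the most recent event that defined the same value. The sketch below illustrates that idea only. The function name `annotate` and the edge representation are assumptions, and def-def dependences (mentioned later in the talk) are not modeled here.

```python
# Definer and user attributes as listed on the slide.
DEFINERS = {("socket", "return"), ("accept", "return"), ("close", "so")}
USERS = {("accept", "so"), ("read", "so"), ("write", "so"), ("close", "so")}

def annotate(events):
    """Return def-use edges as (definer_index, user_index, value) triples."""
    last_def = {}  # value -> index of the most recent event that defined it
    edges = []
    for i, (name, attrs) in enumerate(events):
        # Resolve uses before recording this event's own definitions, so that
        # close(so = 8) is linked to the accept that returned 8.
        for attr, value in attrs.items():
            if (name, attr) in USERS and value in last_def:
                edges.append((last_def[value], i, value))
        for attr, value in attrs.items():
            if (name, attr) in DEFINERS:
                last_def[value] = i
    return edges

events = [
    ("socket", {"domain": 2, "type": 1, "proto": 0, "return": 7}),
    ("accept", {"so": 7, "addr": 0x40, "addr_len": 0x50, "return": 8}),
    ("write", {"so": 8, "buf": 0x100, "len": 23, "return": 23}),
    ("close", {"so": 8, "return": 0}),
    ("close", {"so": 7, "return": 0}),
]
edges = annotate(events)
```

For this trace the edges connect socket to accept and to the final close (value 7), and accept to write and to the first close (value 8).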
20
Extracting scenarios
[Diagram: Annotated traces + Seeds → scenario extractor → Abstract scenarios]
socket(domain = 2, type = 1, proto = 0, return = 7)
accept(so = 7, addr = 0x40, addr_len = 0x50, return = 8)
write(so = 8, buf = 0x100, len = 23, return = 23)
close(so = 8, return = 0)
close(so = 7, return = 0)
23
Simplifying scenarios
[Diagram: Annotated traces + Seeds → scenario extractor → Abstract scenarios]
socket(domain = 2, type = 1, proto = 0, return = 7) [seed]
accept(so = 7, addr = 0x40, addr_len = 0x50, return = 8)
write(so = 8, buf = 0x100, len = 23, return = 23)
close(so = 8, return = 0)
close(so = 7, return = 0)
24
Simplifying scenarios
socket(return = 7) [seed]
accept(so = 7, return = 8)
write(so = 8)
close(so = 8)
close(so = 7)
Drops attributes not used in dependences.
[Diagram: Annotated traces + Seeds → scenario extractor → Abstract scenarios]
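Simplification, as shown on this slide, keeps only the attributes that take part in dependences. A minimal sketch follows; the set `DEPENDENCE_ATTRS` is assembled from the definer and user lists given earlier in the talk, and the function name `simplify` is a hypothetical choice.

```python
# Attributes that appear in the definer or user lists of the talk.
DEPENDENCE_ATTRS = {
    ("socket", "return"),
    ("accept", "so"), ("accept", "return"),
    ("read", "so"), ("write", "so"), ("close", "so"),
}

def simplify(scenario):
    """Drop every attribute that is not used in a dependence."""
    return [
        (name, {a: v for a, v in attrs.items() if (name, a) in DEPENDENCE_ATTRS})
        for name, attrs in scenario
    ]

scenario = [
    ("socket", {"domain": 2, "type": 1, "proto": 0, "return": 7}),
    ("accept", {"so": 7, "addr": 0x40, "addr_len": 0x50, "return": 8}),
    ("write", {"so": 8, "buf": 0x100, "len": 23, "return": 23}),
    ("close", {"so": 8, "return": 0}),
    ("close", {"so": 7, "return": 0}),
]
simplified = simplify(scenario)
```

The result matches the simplified scenario on the slide: only socket.return, accept.so, accept.return, write.so, and close.so survive.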
25
Standardizing scenarios
[Diagram: Simplified scenarios → Standardization → Abstract scenarios; equivalent scenarios map to the same abstract scenario]
Two transformations:
• Naming: foo(val = 7) → foo(val = X)
• Reordering: foo(); bar(); → bar(); foo();
Finds the least standardized scenario, in lexicographic order
[Diagram: Annotated traces + Seeds → scenario extractor → Abstract scenarios]
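The naming transformation can be sketched as replacing each concrete value with a fresh symbolic name in order of first appearance. This is an illustrative sketch, not the authors' implementation; `FRESH_NAMES` and `name_values` are hypothetical names, and the reordering transformation is not covered here.

```python
# Symbolic names handed out in order of first appearance (assumed to be
# enough for small scenarios; an IndexError signals that more are needed).
FRESH_NAMES = "XYZUVW"

def name_values(scenario):
    """Replace concrete values by symbolic names such as X and Y."""
    mapping = {}
    renamed = []
    for name, attrs in scenario:
        new_attrs = {}
        for attr, value in attrs.items():
            if value not in mapping:
                mapping[value] = FRESH_NAMES[len(mapping)]
            new_attrs[attr] = mapping[value]
        renamed.append((name, new_attrs))
    return renamed

run_a = [("socket", {"return": 7}), ("accept", {"so": 7, "return": 8}),
         ("read", {"so": 8}), ("close", {"so": 8}), ("close", {"so": 7})]
run_b = [("socket", {"return": 3}), ("accept", {"so": 3, "return": 4}),
         ("read", {"so": 4}), ("close", {"so": 4}), ("close", {"so": 3})]
named_a = name_values(run_a)
named_b = name_values(run_b)
```

Two runs that differ only in the descriptor numbers handed out by the operating system end up as the same abstract scenario.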
26
Standardizing scenarios
[Diagram: Annotated traces + Seeds → scenario extractor → Abstract scenarios]
socket(return = 7) [seed]
accept(so = 7, return = 8)
write(so = 8)
read(so = 8)
close(so = 8)
close(so = 7)
Use-def and def-def dependences
27
Standardizing scenarios
Step: Reorder
[Diagram: Annotated traces + Seeds → scenario extractor → Abstract scenarios]
socket(return = 7) [seed]
accept(so = 7, return = 8)
read(so = 8)
write(so = 8)
close(so = 8)
close(so = 7)
Use-def and def-def dependences
28
Standardizing scenarios
Steps: Reorder, Name
[Diagram: Annotated traces + Seeds → scenario extractor → Abstract scenarios]
socket(return = X) [seed]
accept(so = X, return = Y)
read(so = Y)
write(so = Y)
close(so = Y)
close(so = X)
Use-def and def-def dependences
29
Standardizing scenarios
Each interaction is a letter to the PFSA learner.
[Diagram: Annotated traces + Seeds → scenario extractor → Abstract scenarios; letter labels shown on the slide: A, B, D, E, F, G]
socket(return = X) [seed]
accept(so = X, return = Y)
read(so = Y)
write(so = Y)
close(so = Y)
close(so = X)
31
PFSA learning
Algorithm due to Raman et al.:
1. Build a weighted retrieval tree
2. Merge similar states
[Diagram: Abstract scenarios → automaton learner → Specification]
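Step 1 of the algorithm can be illustrated with a prefix tree whose edges count how many training strings pass through them. The sketch below covers only that step; the state-merging step of Raman et al. is not shown, and the function name, data layout, and example strings are assumptions made for illustration.

```python
def build_retrieval_tree(strings):
    """Build a prefix tree over the strings and count edge traversals.

    Returns (children, weight) where children[state][letter] is the next
    state and weight[(state, letter)] is the number of strings using it.
    """
    root = 0
    children = {root: {}}
    weight = {}
    for s in strings:
        state = root
        for letter in s:
            if letter not in children[state]:
                new_state = len(children)  # states are numbered as created
                children[state][letter] = new_state
                children[new_state] = {}
            weight[(state, letter)] = weight.get((state, letter), 0) + 1
            state = children[state][letter]
    return children, weight

# Hypothetical training set: 99 common scenarios and 1 rare one.
strings = ["ABG"] * 99 + ["ACG"]
children, weight = build_retrieval_tree(strings)
```

The rare string shows up as a low-weight branch, which is what later postprocessing relies on.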
32
PFSA learning
[Build of the previous slide, step 1: the weighted retrieval tree, with edges labeled by the letters A to G and weights 100, 99, and 1; two separate edges are labeled G]
33
PFSA learning
[Build of the previous slide, step 2 in progress: similar states are merged; two G edges remain, with weights 1 and 99]
34
PFSA learning
[Build of the previous slide, step 2 complete: a single G edge with weight 100]
35
Postprocessing: coring
1. Remove infrequent transitions
2. Convert PFSA to NFA
[Diagram: Abstract scenarios → automaton learner → Specification; the learned PFSA over the letters A to G, with edge weights 100, 99, and 1]
36
Postprocessing: coring
1. Remove infrequent transitions
2. Convert PFSA to NFA
[Diagram: the automaton after coring, shown without weights]
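The two coring steps can be sketched as a threshold filter followed by dropping the weights. The fixed threshold and the pruning of unreachable transitions are assumptions; the slides do not state the exact criterion for "infrequent", and the example weights are hypothetical.

```python
def core(pfsa, start, threshold):
    """Drop infrequent transitions, then forget the weights.

    pfsa maps (state, letter, next_state) to a count. The result is a set
    of unweighted transitions, that is, an NFA.
    """
    kept = {edge for edge, count in pfsa.items() if count >= threshold}
    # Keep only transitions whose source is still reachable from the start.
    reachable, frontier = {start}, [start]
    while frontier:
        state = frontier.pop()
        for (s, _, t) in kept:
            if s == state and t not in reachable:
                reachable.add(t)
                frontier.append(t)
    return {(s, a, t) for (s, a, t) in kept if s in reachable}

# Hypothetical PFSA with one rare path through letter C.
pfsa = {
    (0, "A", 1): 100,
    (1, "B", 2): 99,
    (1, "C", 3): 1,
    (3, "G", 4): 1,
    (2, "G", 4): 99,
}
nfa = core(pfsa, start=0, threshold=5)
```

With a threshold of 5, the rare path through C disappears and the common path remains.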
38
Where to find bugs?
• in programs (static verification)?
• or in traces (dynamic verification)?
39
How we verify specifications
[Pipeline diagram: Traces → extract scenarios → Scenarios (dependence graphs) → standardize → Strings (for example, ACEGB) → check automaton membership]
40
Verifying traces
Trace 1: OK (both sockets closed)
...
socket(return = 7)
accept(so = 7, return = 8)
write(so = 8)
read(so = 8)
close(so = 8)
close(so = 7)
...
Trace 2: Bug! (socket 7 not closed)
...
socket(return = 7)
accept(so = 7, return = 8)
write(so = 8)
read(so = 8)
close(so = 8)
...
[Diagram: specification automaton over the events socket(return = X) [seed], accept(so = X, return = Y), read(so = Y), write(so = Y), close(fd = Y), close(fd = X)]
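Once a scenario has been standardized into a string of abstract events, checking it against the specification is NFA membership. The sketch below shows that check. The transition structure of `SPEC` is one plausible reading of the diagram and is an assumption, as are the state numbers and the function name `accepts`.

```python
def accepts(transitions, start, accepting, string):
    """Standard NFA membership test by tracking the set of current states."""
    current = {start}
    for letter in string:
        current = {t for (s, a, t) in transitions if s in current and a == letter}
        if not current:
            return False
    return bool(current & accepting)

# Assumed shape of the socket specification (hypothetical state numbers).
SPEC = {
    (0, "socket(return = X)", 1),
    (1, "accept(so = X, return = Y)", 2),
    (2, "read(so = Y)", 2),
    (2, "write(so = Y)", 2),
    (2, "close(so = Y)", 3),
    (3, "close(so = X)", 4),
}

ok_scenario = ["socket(return = X)", "accept(so = X, return = Y)",
               "write(so = Y)", "read(so = Y)",
               "close(so = Y)", "close(so = X)"]
buggy_scenario = ok_scenario[:-1]  # socket X is never closed

ok_result = accepts(SPEC, 0, {4}, ok_scenario)
bug_result = accepts(SPEC, 0, {4}, buggy_scenario)
```

The second scenario stops in a non-accepting state, which corresponds to the "socket 7 not closed" bug on the slide.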
41
Experimental results
Attempted to mine and verify two published X11 rules
Challenge: small, buggy training sets (16 programs)
42
Learning by trial and error
Start with a rule learned from one trusted trace. Then:
1. Randomly select an unused trace.
2. Trace obeys rule?
   • yes: go back to step 1.
   • no: ask the expert: is the trace buggy?
     • yes: report bug.
     • no (rule too specific): add trace to training set; learn a new rule.
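The flowchart can be sketched as a loop that takes the learner, the checker, and the expert as callbacks. All parameter names are hypothetical, and the toy learner used in the example (a rule is simply the set of training traces) stands in for the real miner.

```python
import random

def mine_by_trial_and_error(trusted, traces, learn, obeys, expert_says_buggy,
                            rng=random):
    """Grow the training set only when the expert rejects a violation report."""
    training = [trusted]
    rule = learn(training)
    bugs = []
    unused = list(traces)
    rng.shuffle(unused)  # "randomly select an unused trace"
    for trace in unused:
        if obeys(rule, trace):
            continue  # nothing for the expert to inspect
        if expert_says_buggy(trace):
            bugs.append(trace)
        else:
            # The rule was too specific: retrain with this trace included.
            training.append(trace)
            rule = learn(training)
    return rule, bugs

# Toy stand-ins: a rule is the set of traces seen so far.
rule, bugs = mine_by_trial_and_error(
    trusted="AB",
    traces=["AB", "AC", "AD"],
    learn=lambda training: frozenset(training),
    obeys=lambda rule, trace: trace in rule,
    expert_says_buggy=lambda trace: trace == "AD",
)
```

Traces that already obey the rule never reach the expert, which is how the process reduces inspection effort.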
43
Results
1. A timestamp-passing rule
   • 4 traces did not need inspection
   • learned the rule! (compact: 7 states)
   • bugs in 2 out of 16 programs (ups, e93)
   • English specification was incomplete (3 traces)
   • expert and corer agreed on 81% of the hot core
2. SetOwner(x) must be followed by GetSelection(x)
   • failed to learn the rule (very small learning set) but
   • bugs in 2 out of 5 programs (xemacs, ups)
45
Related work
Arithmetic pre/post conditions
• Daikon [Ernst et al], Houdini [Flanagan and Leino]
• properties orthogonal to ours
• eventually, we may need to include and learn some arithmetic relationships
Temporal relationships over calls
• intrusion detection: [Ghosh et al], [Wagner and Dean]
• software processes: [Cook and Wolf]
• error checking: [Engler et al SOSP 2001]
  • lexical and syntactic pattern matching
  • user must write templates (e.g., <a> always follows <b>)
• design patterns: [Reiss and Renieris]
46
Conclusion
• Introduced specification mining, a new approach for learning correctness specifications
• Refined the problem into a problem of probabilistic learning from traces
• Developed and demonstrated a practical specifications miner
47
End of talk
48
How we mine specifications
[Pipeline diagram: Program → tracer → Instrumented program; Instrumented program + Test inputs → run → Traces → dependence annotator → Annotated traces]
...
socket(domain = 2, type = 1, proto = 0, return:T0 = 7)
[SETUP socket:T0 7]
accept(so:T0 = 7, addr = 0x40, addr_len = 0x50, return:T0 = 8)
[USE socket:T0 8]
close(so:T0 = 8, return = 0)
close(so:T0 = 7, return = 0)
...
49
How we mine specifications
Program
int s = socket(AF_INET, SOCK_STREAM, 0);
[DO SETUP]
while (cond1) {
    int ns = accept(s, &addr, &len);
    while (cond2) {
        [USE NS]
        if (cond3) return;
    }
    close(ns);
}
close(s);
50
How we mine specifications
[Pipeline diagram: Program → tracer → Instrumented program. The program text is the same as on the previous slide.]
51
How we mine specifications
[Pipeline diagram: Program → tracer → Instrumented program; Instrumented program + Test inputs → run → Traces]
...
socket(domain = 2, type = 1, proto = 0, return = 7)
[SETUP socket 7]
accept(so = 7, addr = 0x40, addr_len = 0x50, return = 8)
[USE socket 8]
close(so = 8, return = 0)
close(so = 7, return = 0)
...
52
How we mine specifications
[Pipeline diagram extended with: Traces → dependence annotator → Annotated traces. The trace shown is the same as on the previous slide.]
53
How we mine specifications
[Pipeline diagram extended with: Annotated traces + Scenario seeds → scenario extractor → Abstract scenarios]
socket(return = X) [seed]
[SETUP socket X]
accept(so = X, return = Y)
[USE socket Y]
close(so = Y)
close(so = X)
54
How we mine specifications
[Pipeline diagram extended with: Abstract scenarios → automaton learner → Specification]
[Diagram: specification automaton over the events socket(return = X) [seed], [SETUP X], accept(so = X, return = Y), [USE Y], close(fd = Y), close(fd = X)]
55
Reducing the problem
The problem: find an automaton that generates C, given T.
[Diagram: sets I (all traces), C (all correct traces), and T (training traces)]
Issues:
• What if C is not r.e.?
• Checkers and learners need finite specs.
57
Reducing the problem
The problem: find an automaton that generates C, given T. Assume that C is regular.
Issue:
• What if the program is not regular?
[Diagram: sets I (all traces), C (all correct traces, regular), and T (training traces); the previous step's sets I, C, T are shown beside it, labeled "Unrestricted"]
58
Reducing the problem
The problem: find an automaton that generates CS, given TS. Assume that the size of scenarios is bounded.
Issue:
• No connection between CS and TS!
[Diagram: sets IS (all scenarios, bounded size), CS (all correct scenarios, regular), and TS (training scenarios); earlier steps are shown beside it, labeled "Unrestricted" and "Regular"]
59
Reducing the problem
The problem: find an automaton that generates CS, given TS. Assume that TS presents each element of CS at least once.
Issue:
• Undecidable [Gold67]
[Diagram: sets IS (all scenarios, bounded size), CS (all correct scenarios, regular), and TS = c0, c1, ...; earlier steps are shown beside it, labeled "Unrestricted", "Regular", and "Scenarios"]
60
Reducing the problem
The problem: find a PFSA that generates P’, where P and P’ are close (by some distance metric). Assume P is generated by a PFSA.
[Plot: P, a probability distribution over IS (all scenarios), generated by a PFSA; vertical axis from 0 to 1; TS = c0, c1, ...]
[Earlier steps are shown beside it, labeled "Unrestricted", "Regular", "Scenarios", and "Complete presentation"]
61
Digression: postprocessing
• PFSA = NFA with weights
• Specification = NFA
• Convert PFSA to specification:
  1. Find hot core (that is, drop noise)
     • drop infrequent scenarios
     • drop infrequent parts of scenarios
  2. Drop weights
62
Preparing input traces
socket(domain = 2, type = 1, proto = 0, return:T0 = 7)
[SETUP socket:T1 7]
accept(so:T2 = 7, addr = 0x40, addr_len = 0x50, return:T3 = 8)
[USE socket:T4 8]
close(so:T5 = 8, return = 0)
close(so:T5 = 7, return = 0)
Slides 63 to 67 repeat this trace and merge the types one step at a time:
• slide 63: T1 becomes T0 ([SETUP socket])
• slide 64: T2 becomes T0 (accept.so)
• slide 65: T5 becomes T0 (close.so)
• slide 66: T4 becomes T3 ([USE socket])
• slide 67: T3 becomes T0 (accept.return and [USE socket]), giving:
socket(domain = 2, type = 1, proto = 0, return:T0 = 7)
[SETUP socket:T0 7]
accept(so:T0 = 7, addr = 0x40, addr_len = 0x50, return:T0 = 8)
[USE socket:T0 8]
close(so:T0 = 8, return = 0)
close(so:T0 = 7, return = 0)
68
Extracting scenarios
socket(domain = 2, type = 1, proto = 0, return:T0 = 7)
[SETUP socket:T0 7]
accept(so:T0 = 7, addr = 0x40, addr_len = 0x50, return:T0 = 8)
[USE socket:T0 8]
close(so:T0 = 8, return = 0)
close(so:T0 = 7, return = 0)
[Diagram: Annotated traces + Seeds → scenario extractor → Abstract scenarios]
71
Simplifying scenarios
socket(domain = 2, type = 1, proto = 0, return:T0 = 7) [seed]
[SETUP socket:T0 7]
accept(so:T0 = 7, addr = 0x40, addr_len = 0x50, return:T0 = 8)
[USE socket:T0 8]
close(so:T0 = 8, return = 0)
close(so:T0 = 7, return = 0)
[Diagram: Annotated traces + Seeds → scenario extractor → Abstract scenarios]
72
Simplifying scenarios
socket(return:T0 = 7) [seed]
[SETUP socket:T0 7]
accept(so:T0 = 7, return:T0 = 8)
[USE socket:T0 8]
close(so:T0 = 8)
close(so:T0 = 7)
Drop untyped attributes.
[Diagram: Annotated traces + Seeds → scenario extractor → Abstract scenarios]
73
Standardizing scenarios
Standardization puts equivalent scenarios into a canonical abstract form:
[Diagram: Simplified scenarios → Standardization → Abstract scenarios; equivalent scenarios map to the same abstract scenario]
A search using two transformations:
• Naming: foo(val = 7) → foo(val = X)
• Reordering: foo(); bar(); → bar(); foo();
[Diagram: Annotated traces + Seeds → scenario extractor → Abstract scenarios]
74
Standardizing scenarios
socket(return:T0 = 7) [seed]
[SETUP socket:T0 7]
accept(so:T0 = 7, return:T0 = 8)
[USE Y]
close(so:T0 = 8)
close(so:T0 = 7)
[Diagram: Annotated traces + Seeds → scenario extractor → Abstract scenarios]
75
Standardizing scenarios
socket(return:T0 = 7) [seed]
[SETUP socket:T0 7]
accept(so:T0 = 7, return:T0 = 8)
write(so:T0 = 8)
read(so:T0 = 8)
close(so:T0 = 8)
close(so:T0 = 7)
[Diagram: Annotated traces + Seeds → scenario extractor → Abstract scenarios]
76
Standardizing scenarios
Step: Reorder
socket(return:T0 = 7) [seed]
[SETUP socket:T0 7]
accept(so:T0 = 7, return:T0 = 8)
read(so:T0 = 8)
write(so:T0 = 8)
close(so:T0 = 8)
close(so:T0 = 7)
[Diagram: Annotated traces + Seeds → scenario extractor → Abstract scenarios]
77
Standardizing scenarios
Steps: Reorder, Name
socket(return:T0 = X) [seed]
[SETUP socket:T0 X]
accept(so:T0 = X, return:T0 = Y)
read(so:T0 = Y)
write(so:T0 = Y)
close(so:T0 = Y)
close(so:T0 = X)
[Diagram: Annotated traces + Seeds → scenario extractor → Abstract scenarios]
78
Standardizing scenarios
socket(return:T0 = X) [seed]
[SETUP socket:T0 X]
accept(so:T0 = X, return:T0 = Y)
read(so:T0 = Y)
write(so:T0 = Y)
close(so:T0 = Y)
close(so:T0 = X)
Each interaction is a letter to the PFSA learner.
[Diagram: Annotated traces + Seeds → scenario extractor → Abstract scenarios; the seven interactions are labeled with the letters A to G]
79
Coring
Coring removes PFSA transitions that occur infrequently and converts the PFSA into an NFA.
[Diagram: Abstract scenarios → automaton learner → Specification; the resulting automaton uses the events socket(return = X) [seed], [SETUP X], accept(so = X, return = Y), [USE Y], close(fd = Y), close(fd = X)]
80
Verification
Do all traces of a program P satisfy a specification A?
81
Verification
Do all traces of a program P satisfy a specification A?
Reduced question: does a trace T satisfy A?
Definition: T satisfies A if every seed in T is surrounded by a scenario that satisfies A.
82
Verification
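Do all traces of a program P satisfy a specification A?
Reduced questions: does a trace T satisfy A? Does a scenario S satisfy A?
[Diagram: four levels related by standardization and simplification: the language of A, abstract scenarios satisfying A, simplified scenarios satisfying A, and concrete scenarios satisfying A. The question is whether S is among the concrete scenarios satisfying A.]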
83
Experiments
• What we wanted to find out
  • Hypothesis 1: the process will find bugs and reduce the number of traces that the expert must inspect.
  • Hypothesis 2: the miner’s final specification will match the English rule.
  • Hypothesis 3: the corer and the human will agree on the hot core.
• Gathered traces from 16 programs:
  • 5 programs in the X11 distribution and
  • 11 contributed programs
84
Testing vs. verification
testing: run the program on many inputs; is the output correct?
verification: a checker takes the program and a property; does the property hold?
sample properties:
• allocated memory is freed.
• locks are released.
• …
[Diagram labels: program, inputs, properties, checker; example domains X11 and sockets]
85
Testing vs. verification
Completeness (“coverage”):
• verification (if sound) guarantees that program contains no bugs of a well-specified class.

           testing   verification
aspects    all       some
control    some      all
data       some      all

[Annotation on the slide: our focus]
86
Verification: recent successes
Recent successes:
• specifications languages: temporal logics, automata, …
• abstractors: SLAM, FeaVer
• checkers: model checking, theorem proving, type systems
What’s still missing?
• specifications
[Diagram: program → abstractor → abstract program → checker; the checker also takes a property (formal specification of correctness) and answers "property holds?"; the diagram carries the labels L1 and L2]
87
So who formulates specifications?
Programmers? Probably not.
Why they won’t:
• too busy: yet another language to learn?
• specifications aren’t cool.
• specification languages are hard: LTL, anyone?
Why they shouldn’t:
• may misunderstand usage rules.
• may not know all usage rules.
Mining Specifications:
• Convenient and easy: anyone can do it
• As in data mining, discovers surprising rules.
88
Advantages of mining
Exploits the massive effort of programmers that is reflected in the code.
• Programmers resolved many problems:
  • incomplete system requirements.
  • incomplete API documentation.
  • implementation-dependent API rules.
• Want redundancy? (without redundant programming)
  • ask multiple programmers (and vote).
Exploits the testers’ effort in devising test inputs
89
Our output: a specification
[Diagram: specification automaton over the events x = socket(), bind(x), listen(x), y = accept(x), read(y), write(y), close(y), close(x)]
90
How do we mine?
Underlying premise:
Even bad software is debugged enough to show hints of correct behavior.
Maxim: Common usage is the correct usage.
91
Mining = machine learning
Reduce the problem to the well-known problem of learning regular languages.
Obstacles:
1. source code is too detailed and hard to analyze
2. what is “common” behavior?
Solutions:
1. learn from dynamic behavior
2. learn probabilistically
In short: learn probabilistic FSMs from traces
92
Input: trace(s)
7 = socket(2, 1, 0);
bind(7, 0x400120, 16);
listen(7, 5);
8 = accept(7, 0x400200, 0x400240);
read(8, 0x400320, 255);
write(8, 0x400320, 12);
read(8, 0x400320, 255);
write(8, 0x400320, 7);
close(8);
10 = accept(7, 0x400200, 0x400240);
read(10, 0x400320, 255);
write(10, 0x400320, 13);
close(10);
close(7);
……
[Diagram: several such traces feed the miner, which outputs the specification automaton over x = socket(), bind(x), listen(x), y = accept(x), read(y), write(y), close(y), close(x)]
93
The mining algorithm
[Pipeline diagram]
1. dynamic execution (traces)
2. trace abstraction → usage scenarios (strings)
3. (off-the-shelf) RegExp learner → generalized scenarios (probabilistic FSA)
4. user: extract heavy core (and approve) → specification (NFA)
5. dynamic checker: takes the specification and a dynamic execution to be checked (trace); reports OK/bug
94
Trace abstraction: 4 challenges
• Traces interleave useful and useless events.
  • sockets created by accept are independent, …
• Specifications must include both temporal and value-flow constraints.
• Only some of API calls’ arguments impose “true” dependences.
  • accept does not alter the state of the bound socket, …
• Specifications may impose only partial order.
  • filling in fields of a structure before a call, …
95
Finding dependences
7 = socket(2, 1, 0);
bind(7, 0x400120, 16);
listen(7, 5);
8 = accept(7, 0x400200, 0x400240);
read(8, 0x400320, 255);
write(8, 0x400320, 12);
read(8, 0x400320, 255);
write(8, 0x400320, 7);
close(8);
10 = accept(7, 0x400200, 0x400240);
read(10, 0x400320, 255);
write(10, 0x400320, 13);
close(10);
close(7);
……
Some args and return values are handles to data structures. Calls may
• write through the handle
• read through the handle
• read and write
Def-use dependences connect writers to readers
Trace abstraction
Trace as first shown:
h(3, 5) c(10) a(4, 5) d(4, 7) b(0, 5) f(10) h(8, 11) e(7) f(50) d(15, 1) c(7) a(9, 11) b(6, 7) d(9, 14) f(20) e(7) …
Trace as shown in the second form on the slide:
h(_, 5) c(10) a(4, 5) d(4, 7) b(_, 5) f(10) h(_, 11) e(7) f(_) d(_, _) c(7) a(9, 11) b(_, 11) d(9, _) e(_) f(_) …
Scenario patterns shown on the slide:
h(_, X) a(Y, X) b(_, X) d(Y, Z) e(Z)
h(_, X) a(Y, X) d(Y, Z) b(_, X) e(Z)
h(_, X) a(Y, X) b(_, X) d(Y, Z)
[Diagram: the calls h, a, d, b, and e arranged as a dependence graph]
97
The output PFSA
[PFSA diagram: a path h(_, X), a(Y, X), b(_, X), d(Y, Z), e(Z) with edge weights 2, 2, 2, 1, 1, plus one more d(Y, Z) edge with weight 1]
98
Renaming and reordering the chop
Outline of the algorithm
input: a chop (a dag of data dependences)
output: the canonical chop
1. reorder: list all possible chop schedules
   • trick: only list those with calls in lexicographic order
2. rename: abstract arguments in each schedule
3. select lexicographically least schedule
lexicographic order:
a(…) b(…) < b(…) b(…)
a(X) b(…) < a(Y) b(…)
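The three steps above can be sketched by brute force: enumerate every schedule that respects the dependences, rename values in each, and keep the least one. This sketch deliberately skips the slide's trick of listing only lexicographically ordered schedules, so it is only practical for small chops. The names `canonical` and `V0`, `V1`, and the dependence edges in the example, are assumptions.

```python
from itertools import permutations

def canonical(events, deps):
    """Return the lexicographically least renamed schedule of a chop.

    events: list of (call_name, tuple_of_values)
    deps:   set of (i, j) pairs meaning event i must precede event j
    """
    best = None
    for order in permutations(range(len(events))):
        position = {e: p for p, e in enumerate(order)}
        if any(position[i] > position[j] for i, j in deps):
            continue  # this schedule violates a dependence
        mapping, schedule = {}, []
        for e in order:
            name, values = events[e]
            args = []
            for v in values:
                if v not in mapping:
                    mapping[v] = "V%d" % len(mapping)  # rename on first use
                args.append(mapping[v])
            schedule.append((name, tuple(args)))
        schedule = tuple(schedule)
        if best is None or schedule < best:
            best = schedule
    return best

# Hypothetical dependences: read and write are independent of each other.
deps = {(0, 1), (1, 2), (1, 3), (2, 4), (3, 4), (1, 5), (4, 5)}
run_a = [("socket", (7,)), ("accept", (7, 8)), ("write", (8,)),
         ("read", (8,)), ("close", (8,)), ("close", (7,))]
run_b = [("socket", (3,)), ("accept", (3, 4)), ("read", (4,)),
         ("write", (4,)), ("close", (4,)), ("close", (3,))]
canon_a = canonical(run_a, deps)
canon_b = canonical(run_b, deps)
```

Both runs collapse to the same canonical chop, with read placed before write because "read" sorts before "write".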
99
Checking: the meaning of the spec
[Specification: a(x) → b(x) → seed(x) → c(x)]
means: whenever seed(x) is executed, it must be preceded by a(x), b(x) and followed by c(x).
does not mean: a(x) must be followed by b(x), seed(x), c(x) (because a is not a seed).
100
Dynamic checking
• Used in our experiments
• checker mirrors the learner:
for each seed in the trace
    extract a chop
    if some substring from chop in NFA
        seed verified!
    else
        extract a larger chop (up to a bound)
    fail if no chop verifies
[Diagram: specification (NFA) and dynamic execution to be checked (trace) → dynamic checker → OK/bug]
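The checker loop can be sketched as follows. To keep the example self-contained, a chop is approximated by a window of events around the seed and the specification by a predicate on event tuples; the real system extracts chops by following dependences, so the window and every name here are assumptions.

```python
def check_trace(trace, is_seed, in_language, max_radius):
    """Return the positions of seeds that no chop up to the bound verifies."""
    failures = []
    for pos, event in enumerate(trace):
        if not is_seed(event):
            continue
        verified = False
        for radius in range(1, max_radius + 1):  # grow the chop up to a bound
            lo = max(0, pos - radius)
            hi = min(len(trace), pos + radius + 1)
            # Try every substring of the chop that contains the seed.
            verified = any(
                in_language(tuple(trace[i:j]))
                for i in range(lo, pos + 1)
                for j in range(pos + 1, hi + 1)
            )
            if verified:
                break
        if not verified:
            failures.append(pos)
    return failures

# Specification in the style of the "meaning of the spec" slide.
LANGUAGE = {("a", "b", "seed", "c")}
good = ["a", "b", "seed", "c"]
bad = ["a", "seed", "c"]  # b is missing before the seed

good_failures = check_trace(good, lambda e: e == "seed",
                            lambda s: s in LANGUAGE, max_radius=3)
bad_failures = check_trace(bad, lambda e: e == "seed",
                           lambda s: s in LANGUAGE, max_radius=3)
```

Only seeds trigger a check, which matches the earlier point that a(x) alone imposes no obligation.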
101
Static checking
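Conversion to a “checkable” specification:
[Diagram: the specification a(x) → b(x) → seed(x) → c(x) is converted into a checking automaton with OK and bug! states; the transition labels shown include a(x), b(x), seed(x), c(x), ^b(x), ^seed(x), and ^c(x) | end]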
102
Related work
Arithmetic pre/post conditions
• Daikon, Houdini
• properties orthogonal to ours
• eventually, we may need to include and learn some arithmetic relationships
Temporal relationships over calls
• intrusion detection: [Ghosh et al], [Wagner and Dean]
• software processes: [Cook and Wolf]
• error checking: [Engler et al SOSP 2001]
  • lexical and syntactic pattern matching
  • user must write templates (e.g., <a> always follows <b>)
106
Summary
• Semi-automatically formulating well-formed, non-trivial specifications is an important part of the verification tool chain.
• Contributions:
  • introduced specifications mining
  • phrased it as probabilistic learning from dynamic traces
  • decomposed it into a sequence of subproblems (using an off-the-shelf learner)
  • developed dynamic checker
  • found bugs
107
The supply/demand pyramids
[Diagram: two pyramids. Skill (supply), by language: LTL, C, C++, Java, Visual Basic, javascript/html/XML. Effort (demand), by activity: s/w development, requirements, analysis, verification and testing.]