©2000-2003 Zhichen Xu UC Davis 5/2003
Safety Checking of Machine Code
Zhichen Xu
Internet Systems and Storage Laboratory, HP Labs, Palo Alto
(This is the dissertation work done at UW-Madison. Thesis advisors: Bart Miller and Tom Reps)
2
Motivation
• Dynamic extensibility
– Operating systems: custom policies, general functionality, performance
• Extensible OS: Exokernel, VINO, SPIN, Synthetix, ...
• Commodity OS: SLIC, KernInst, ...
– Databases: type-based extensions
• Illustra, Informix, Paradise, ...
– Web browsers: plug-ins
– Performance tools: measurement code
• KernInst, Paradyn, ...
– Other
• Dyninst, ...
3
Motivation
• Component-based software (Java, COM)
– Software components from different vendors are combined to construct a complete application
– Code from several sources with no mutual trust
Safety is crucial in these domains!
4
Big Picture and Concepts
• Is it safe for untrusted foreign code to be loaded into a trusted host system?
[Figure: a code producer supplies Untrusted Code; the code consumer is the Host, which holds its own Code and Data.]
5
Safety Properties We Enforce
• Default collection of safety conditions
– No type violations
– No out-of-bounds array accesses
– No misaligned loads/stores
– No uses of uninitialized variables
– No invalid pointer dereferences
– No unsafe interaction with the host
• Precise and flexible host access policy
– Customizable
Type safety
6
Outline
• Introduction
– Motivation
– Related work
– High-level characteristics
• Safety properties and policies
• Safety-checking analysis
• Experimental evaluation
• Conclusions and future research
7
Related Work
• Dynamic Techniques
– Hardware-enforced address spaces, SFI, interpretation, etc.
• Hybrid Techniques
– Safe languages: Java, ML, Modula-3, etc.
Runtime cost
Potential recovery problem
• Static Techniques
– Proof-Carrying Code [Necula, Lee]
– Certifying Compiler [Necula, Lee] [Colby, Lee, Necula, Blau]
– Typed Assembly Language [Morrisett, Walker, Crary, Glew, ...]
8
Related Work vs Our Approach
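[Figure: related work compared with our approach.
Related work: Safe C, Java, ML -> Certifying Compiler -> Machine Code + Proof -> Proof Checker -> Yes/No (also labeled: C, C++; Pointer Arithmetic, ...)
Our approach: C, C++, Assembly, ... -> Off-the-Shelf cc, gcc, as -> Ordinary Binary; Ordinary Binary + Safety Policy + Annotated initial inputs -> Safety Checker -> Yes/No]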
9
Related Work vs Our Approach
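[Figure: related work compared with our approach, with proof generation shown explicitly.
Related work: Safe C, Java, ML -> Certifying Compiler -> Machine Code + Proof -> Proof Checker -> Yes/No (also labeled: C, C++; Pointer Arithmetic, ...)
Our approach: C, C++, Assembly, ... -> Off-the-Shelf cc, gcc, as -> Ordinary Binary; Ordinary Binary + Safety Policy + Initial inputs -> Proof generator -> Ordinary Binary + Proof -> Proof Checker -> Yes/No]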
10
High-Level Characteristics
• Perform safety checking on ordinary binary, mechanically synthesize (and verify) a safety proof
• Extend the host at a very fine-grained level (allow the untrusted code to manipulate the internal data structures of the host directly)
• Enforce host-specified access policy + type safety
11
Outline
• Introduction
• Safety Properties and Policies
• Safety-Checking Analysis
• Experimental Evaluation
• Conclusion and Future Research
12
Safety Properties
• Default collection of safety conditions
– No type violations
– No out-of-bounds array accesses
– No misaligned loads/stores
– No uses of uninitialized variables
– No invalid pointer dereferences
– No unsafe interaction with the host
(Fine-grained memory protection and data abstraction)
• Precise and flexible host access policy
– Customizable
13
Host-Specified Access Policy
• Classify locations into regions
– As big as the entire address space, as small as a variable
• [Region : Category : Access]
– Category: types, fields
– Access: readable (r), writable (w), followable (f), executable (e), operable (o) (e.g., to “copy”, to “examine”)
Locations
Values
14
Protections Provided by Access Policy
[Figure: the Host's data structures (callTree, List, Methods) and the initial inputs to the untrusted code; edges are labeled with the access granted (rf, rw) or with "call", and one edge is marked "x".]
15
• Kernel page-replacement extension
– Pick a cold page from the global LRU list.
typedef struct _page_list {
    int page;
    ...
    struct _page_list * next;
} page_list;
[Host : page_list.page : ro] // read access
[Host : page_list.next, page_list ptr : rfo] // follow access
Principle of “Least Privilege”
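The two policy entries above can be read as a lookup from (region, category) to a set of access letters. The following is a minimal sketch in Python of that reading, not the checker's actual data structure; the names `POLICY` and `allowed` are hypothetical and chosen only for illustration.

```python
# Hypothetical policy table mirroring the two entries on the slide.
# Keys are (region, category); values are sets of access letters
# (r = readable, w = writable, f = followable, e = executable, o = operable).
POLICY = {
    ("Host", "page_list.page"): set("ro"),
    ("Host", "page_list.next"): set("rfo"),
    ("Host", "page_list ptr"): set("rfo"),
}

def allowed(region, category, access):
    """Return True if every requested access letter is granted."""
    granted = POLICY.get((region, category), set())  # default: no access
    return set(access) <= granted

print(allowed("Host", "page_list.page", "r"))  # reading the page number
print(allowed("Host", "page_list.page", "w"))  # writing the page number
print(allowed("Host", "page_list.next", "f"))  # following the next link
```

Defaulting to the empty set for anything not listed is one way to express the "least privilege" principle: an extension gets only the accesses the host names explicitly.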
16
Outline
• Introduction
• Safety Properties and Policies
• Safety-Checking Analysis
– An abstract storage model
– Safety checking technique
– An example
• Experimental Evaluation
• Conclusion and Future Research
17
Abstract Storage Model
• Abstract Store: Abstract location → Typestate
– Abstract locations:
• summarize one or more memory locations
• readable, writable, name, size, align, ...
– Typestate: properties of the values
• <type, state, access>
• Forms a lattice
• Linear Constraints
– Linear equalities, linear inequalities
– Array bounds checks, alignment checks, etc.
e.g., ptr != 0; 0 <= index < Length; addr mod 4 = 0
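The three example constraints can be made concrete for a single array access. The sketch below evaluates them on concrete values, which is an assumption made for illustration: the checker described here reasons about such constraints statically, over symbolic values. The function name `check_access` and the 4-byte element size are hypothetical.

```python
def check_access(ptr, index, length, elem_size=4):
    """Evaluate the three example constraints for one concrete array access."""
    addr = ptr + index * elem_size  # address touched by the access
    return {
        "non_null": ptr != 0,              # ptr != 0
        "in_bounds": 0 <= index < length,  # 0 <= index < Length
        "aligned": addr % 4 == 0,          # addr mod 4 = 0
    }

print(check_access(0x1000, 3, 8))
print(check_access(0x1000, 8, 8))
```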
18
Example and Notations
• Abstract Location
[Figure: ptr points to a linked list of nodes n1, n2, n3, n4; the abstract location n summarizes these nodes, and ptr points to n.]
• Abstract location typestate
– n.page: <page_list.page, i, ro>
– n.next: <page_list.next, {n}, rfo>
– ptr: <page_list ptr, {n}, rfo>
“i” for initialized, “u” for uninitialized, {n} for reference to n
19
Inputs to Our Safety Checker
• Untrusted: ordinary binary (C, C++, Java compiled with cc...)
• Trusted:
– Access policy: [H: ....page: ro]
– Typestate spec
– Invocation spec (bindings)
• Safety Checker output: Yes / No (why)
(Other figure labels: init, init, %o0)
20
Host-Typestate Specification
• Data aspect:
– Types and states of host data before the invocation of untrusted code
• Control aspect
– Safety pre- and post-conditions
• Precondition describes the obligations that the actual parameters must meet
• Postcondition provides a guarantee on the resulting state
21
Summarizing Function Calls
• Placeholders: size, access permission, and typestate provide the safety requirements for actual parameters
int gettimeofday (struct timeval* tp);
Safety Precondition
%o0: <struct timeval ptr, {null, t}, fo>
t: <struct timeval, [u, u], w>
Safety Postcondition
t: <struct timeval, [<int, i, o>, <int, i, o>], o>
%o0: <int, i, o>
other registers:
22
Checking Function Calls
• A binding process:
– Check whether the actual abstract locations can meet the obligations specified by placeholders
• An update process:
– Updates the typestates of all actual locations that are represented by the placeholders according to the safety postcondition
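The binding and update steps can be sketched as follows, using the gettimeofday summary from the previous slide. This is a simplified illustration, not the checker's algorithm: the class `Typestate`, the functions `meets` and `check_call`, and the location name `tv` are hypothetical. The binding check compares only type and access (not state), and the update keeps the actual location's own access set, both of which are assumptions of this sketch.

```python
from dataclasses import dataclass

@dataclass
class Typestate:
    type: str
    state: object      # e.g. "i", "u", a tuple of field states, or a points-to set
    access: frozenset  # access letters

def meets(actual, required):
    """Binding check (simplified): same type and at least the required access."""
    return actual.type == required.type and required.access <= actual.access

def check_call(store, pre, post, binding):
    """binding maps placeholder names to actual abstract locations in store."""
    for placeholder, required in pre.items():
        if not meets(store[binding[placeholder]], required):
            return False  # an obligation is not met: reject the call
    for placeholder, result in post.items():
        loc = binding[placeholder]
        # Update: take type and state from the postcondition (assumption:
        # the actual location keeps its own access permissions).
        store[loc] = Typestate(result.type, result.state, store[loc].access)
    return True

pre = {
    "%o0": Typestate("struct timeval ptr", frozenset({"null", "t"}), frozenset("fo")),
    "t": Typestate("struct timeval", ("u", "u"), frozenset("w")),
}
post = {
    "t": Typestate("struct timeval", ("i", "i"), frozenset("o")),
    "%o0": Typestate("int", "i", frozenset("o")),
}
store = {
    "%o0": Typestate("struct timeval ptr", frozenset({"tv"}), frozenset("rwfo")),
    "tv": Typestate("struct timeval", ("u", "u"), frozenset("rw")),
}
ok = check_call(store, pre, post, {"%o0": "%o0", "t": "tv"})
print(ok, store["tv"].state, store["%o0"].type)
```

If `tv` were only readable, the binding step would fail because the precondition requires write access to the structure that gettimeofday fills in.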
23
Safety Checking: Phases
1: Preparation
2: Typestate Propagation
3: Annotation
4: Local Verification
5: Global Verification
24
Safety Checking: Phases
What does each instruction do?
Is each instruction safe?
25
Safety Checking: Phases
1: Preparation
Inputs: untrusted code, access policy, host typestate, invocation
Result: 1. Initial annotation (AbsLoc -> typestate, linear constraints)
26
Safety Checking: Phases
2: Typestate Propagation
Inputs: untrusted code, 1. initial annotation
Result: 2. Approximation of memory contents at each program point
27
Safety Checking: Phases
3: Annotation
Inputs: untrusted code, 2. approximation of memory
Results: 3. Local safety conditions, 3. Global safety conditions, 3. Facts
28
Safety Checking: Phases
4: Local Verification
Inputs: 2. approximation of memory, 3. local safety conditions
Result: True/False
29
Safety Checking: Phases
5: Global Verification
5a. Range Analysis
5b. Induction Iteration
Inputs: untrusted code, 3. global safety conditions, facts from 3 and 5a
Result: True/False
30
A Bounds Checking Example
1: MOV 0,%o2
2: CMP %o2,%o1
3: BGE 11
4: NOP
5: SLL %o2,2,%g2
6: LD [%o0+%g2],%g3
7: ADD %o2,1,%o2
8: CMP %o2,%o1
9: BL 5
10: ADD %o3,%g3,%o3
11: RET
12: MOV %o3,%o0
– Host Typestate: a_p: <int[n], {a}, _>, {n ≥ 1}; a: <int, i, _>
– Access Policy: [a_p : int[n] : rfo] [a : int : ro]
– Invocation: %o0 <-- a_p; %o1 <-- n
[Figure: a_p points to the array a; %o0 holds a_p and %o1 = n.]
31
Phase 1: Preparation
– Host Typestate: a_p: <int[n], {a}, _>, {n ≥ 1}; a: <int, i, _>
– Access Policy: [a_p : int[n] : rfo] [a : int : ro]
– Invocation: %o0 <-- a_p; %o1 <-- n
– Initial Annotation: a: <int, i, ro>, %o0: <int[n], {a}, rwfo>, %o1: <int, i, rwo>
{%o1 = n ∧ n ≥ 1}
32
Phase 2: Typestate Propagation
Finds out the typestate of each absLoc at each program point
Initial annotation: a: <int, i, ro>, %o0: <int[n], {a}, rwfo>, %o1: <int, i, rw> {%o1 = n ∧ n ≥ 1}
After instruction      %o0                %o2               %g2
(initial)              <int[n], {a}, >    <type, [4], >     <type, [4], >
1: MOV 0,%o2           <int[n], {a}, >    <int, i, >        <type, [4], >
2: CMP %o2,%o1         <int[n], {a}, >    <int, i, >        <type, [4], >
3: BGE 11              <int[n], {a}, >    <int, i, >        <type, [4], >
4: NOP                 <int[n], {a}, >    <int, i, >        <type, [4], >
5: SLL %o2,2,%g2       <int[n], {a}, >    <int, i, >        <int, i, rwo>
At 6: LD [%o0+%g2],%g3
a: <int, i, ro>, %o0: <int[n], {a}, rwfo>, %g2: <int, i, rwo>
33
Phase 3: Annotation
• Find out safety requirements and facts
6: LD [%o0+%g2],%g3
Typestate at 6: a: <int, i, ro>, %o0: <int[n], {a}, rwfo>, %g2: <int, i, rwo>
Facts: %o0 mod 4 = 0
Local Safety Conditions: a: readable, operable, initialized, ...; %g3: writable
Global Safety Conditions: (%o0+%g2) mod 4 = 0; 0 <= %g2 < 4n
34
Phase 4: Local Verification
• Verify Local Safety Requirements
6: LD [%o0+%g2],%g3
Local Safety Conditions: a: readable, operable, initialized; %g3: writable
Typestate at 6: a: <int, i, ro>, %o0: <int[n], {a}, rwfo>, %g2: <int, i, rwo>, %g3: <type, [4], w>
35
Phase 5: Global Verification
Synergy of
• A symbolic range analysis
• A program-verification technique
– Induction-iteration method to synthesize loop invariants [Suzuki & Ishihata 1977]
36
Phase 5: Range Analysis
• The ranges of the registers at each program point
– register ∈ [ax+by+c, a’x’+b’y’+c’]
– x, y, x’ and y’ are either array base or length
– e.g., index: [0, length-1]
– e.g., pointer: [base, base+length-1]
• Worklist-based algorithm
• Immediate widening for quick convergence
• Strategy for sharpening analysis
– Pick the right spots in a loop to perform widening
– Consider correlation among register values
39
Phase 5: Range Analysis: Example
[Figure, three animation steps, for the loop: i=0; repeat { ...a[i]...; i=i+1 } while (i<n)]
Step 1: i is [0,0] on entry and [1,1] after i=i+1; the loop head gets [0,0] ∪ [1,1] = [0,1]; widening after the increment gives [1,1] ∇ [1,2] = [1,∞].
Step 2: the test i<n narrows the range on the back edge to [1,n-1].
Step 3: the loop head gets [0,0] ∪ [1,n-1] = [0,n-1]; after the increment, [1,∞] ∇ [1,n] = [1,∞], so the ranges no longer change.
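The range-analysis example can be replayed with a small interval analysis. The sketch below is an illustration under stated assumptions rather than the thesis algorithm: it uses a concrete value for n instead of symbolic bounds, widens only at the point after the increment, and all function names (`join`, `widen`, `analyze_loop`) are hypothetical.

```python
import math

INF = math.inf

def join(a, b):
    """Smallest interval containing both a and b."""
    return (min(a[0], b[0]), max(a[1], b[1]))

def widen(old, new):
    """Standard interval widening: any bound that grows jumps to infinity."""
    lo = old[0] if new[0] >= old[0] else -INF
    hi = old[1] if new[1] <= old[1] else INF
    return (lo, hi)

def analyze_loop(n):
    """Ranges of i for: i = 0; do { ...a[i]...; i = i + 1 } while (i < n).

    Returns (range at the loop head, range after the increment). Assumes n >= 1.
    """
    entry = (0, 0)
    head = entry
    after_inc = None
    while True:
        inc = (head[0] + 1, head[1] + 1)                         # effect of i = i + 1
        new_after = inc if after_inc is None else widen(after_inc, inc)
        back = (new_after[0], min(new_after[1], n - 1))          # refine by i < n
        new_head = join(entry, back)                             # merge at loop head
        if new_after == after_inc and new_head == head:
            return head, after_inc                               # fixed point reached
        after_inc, head = new_after, new_head

print(analyze_loop(10))
```

With n = 10 the head range passes through [0,0], [0,1] and [0,9], matching the three steps of the example with n-1 = 9. Placing the widening after the increment and refining with the loop test afterwards is what keeps the upper bound of the head range at n-1 instead of infinity.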
42
Phase 5: Result of Range Analysis
%g2: [0, 4n-4]
43
Program Verification
{R} S {Q}
R = wlp(S, Q)
R is the weakest condition on the state before S such that, if S terminates, Q is true afterwards
Verification Condition (VC) Generation
{y+1>0}
z=1
{y+z>0}
x=y+z
{x>0}
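The backward computation in the VC-generation example can be reproduced mechanically. The sketch below models a predicate as a Python function from a state (a dictionary) to a boolean and computes wlp for an assignment by evaluating the postcondition in the updated state. The helper name `wlp_assign` is hypothetical, and this models only assignments, not the full instruction set handled by the checker.

```python
def wlp_assign(var, expr, post):
    """wlp(var := expr, post): post evaluated in the state after the assignment."""
    return lambda s: post({**s, var: expr(s)})

# Postcondition {x > 0}
post = lambda s: s["x"] > 0
# Across x = y + z the condition becomes {y + z > 0}
mid = wlp_assign("x", lambda s: s["y"] + s["z"], post)
# Across z = 1 the condition becomes {y + 1 > 0}
pre = wlp_assign("z", lambda s: 1, mid)

for y in (-2, -1, 0, 1):
    print(y, pre({"y": y}))
```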
44
Program Verification
[Figure: a loop with body S, invariant Inv at the loop head, and condition Q]
Loop invariant:
(i) Inv is true on entry
(ii) Inv ⇒ wlp(S, Inv)
(iii) Inv ⇒ wlp(S, Q)
45
Induction-Iteration Method
[Figure: a loop with body S and condition Q]
W(0) = wlp(S, Q)
W(i+1) = wlp(S, W(i))
W(0) ∧ W(1) ∧ ... ∧ W(j) ⇒ W(j+1)
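The iteration can be tried on a tiny finite-state loop. The sketch below is an assumption-heavy illustration: predicates are explicit sets of states over a small domain instead of symbolic formulas, the loop is i = i + 1 with back edge taken while i < n for a fixed n, and W(0) is supplied directly as the set of states satisfying the bounds condition. The names `wlp`, `induction_iteration`, `step`, and `DOMAIN` are hypothetical.

```python
def wlp(step, pred, domain):
    """States from which one step either leaves the loop or lands in pred."""
    return {s for s in domain if step(s) is None or step(s) in pred}

def induction_iteration(entry, step, w0, domain, max_iter=10):
    """Try L(j) = W(0) & ... & W(j) as a loop invariant for j = 0, 1, ..."""
    inv, w = set(w0), set(w0)
    for _ in range(max_iter):
        if entry not in inv:
            return None            # candidate does not hold on entry
        w = wlp(step, w, domain)   # W(j+1) = wlp(S, W(j))
        if inv <= w:
            return inv             # L(j) => W(j+1): invariant found
        inv &= w                   # L(j+1) = L(j) & W(j+1)
    return None                    # give up: the method is incomplete

N = 5
DOMAIN = range(-3, 10)

def step(i):
    """One trip around the loop: i = i + 1, then repeat only while i < N."""
    nxt = i + 1
    return nxt if nxt < N else None

in_bounds = {i for i in DOMAIN if 0 <= i < N}     # condition needed for a[i]
print(induction_iteration(0, step, in_bounds, DOMAIN))
```

Here the first candidate already works: from any i with 0 <= i < N, one trip around the loop either exits or stays within bounds. A condition that is too strong, such as i < 3, makes the candidate shrink on each round until the entry state no longer satisfies it, and the function returns None.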
46
Phase 5: Result of Induction Iteration
6: %g2 < 4n
5: %o2 < n
5: SLL %o2,2,%g2: %o2 < n ∧ %o1 <= n
47
Enhancements to Induction-Iteration Method
• Handle loads/stores
• Handle multiple loops
• Handle procedure calls
• Introduce several strategies to speed up the induction-iteration method
48
Outline
• Introduction
• Safety Properties and Policies
• Safety-Checking Analysis
• Experimental Evaluation
• Conclusions and Future Research
49
Experimental Evaluation
• Test cases
Array sum, start/stop timer, b-tree, kernel paging policy, hash, bubble sort, heap sort, stack-smashing, MD5, jPVM, /dev/kerninst (symbol, loggedWrites)
• Summary of Results
– Found safety violations in kernel policy, stack-smashing, /dev/kerninst
– Verified all conditions, except for some calls in MD5, jPVM (precision lost due to inability to detect that a loop ‘kills’ all elements of an array)
– Checking times vary from 0.1 to 30 seconds
50
Characteristics of Test Cases
Test case                    Instructions  Branches  Loops (Inner)  Procedure Calls (Trusted)  Global Safety Conditions (Bounds Checks)  Source Language
Sum                          13            2         1              0                          4 (2)                                     C
Paging Policy                20            5         2 (1)          0                          9                                         C
Start Timer                  22            1         0              1 (1)                      13                                        C
Hash                         25            4         1              1                          15 (2)                                    C
Bubble Sort                  25            5         2 (1)          0                          16 (8)                                    C
Stop Timer                   36            3         0              2 (2)                      17                                        C
Btree                        41            11        2 (1)          0                          35 (14)                                   C
Btree2                       51            11        2 (1)          4 (4)                      39 (14)                                   C
Heap Sort 2                  71            9         4 (2)          3                          56 (26)                                   C
Heap Sort                    95            16        4 (2)          0                          84 (42)                                   C
Stack-smashing               309           89        7 (1)          2                          100 (74)                                  C
jPVM                         315           16        3              40 (40)                    99 (18)                                   C in C++ style
/dev/kerninst/symbol         339           45        6 (4)          36 (25)                    116 (42)                                  C++
/dev/kerninst/loggedWrites   358           36        6              48 (12)                    192 (40)                                  C++
Md5                          883           11        5 (2)          6                          121 (30)                                  C
51
Timing (Seconds)
Test case                    Typestate Propagation  Annotation  Range Analysis  Induction-Iteration  TOTAL
Sum                          0.02                   0.003       0.01            0.08                 0.1
Paging Policy                0.05                   0.005       0               0.18                 0.23
Start Timer                  0.02                   0.005       0               0.13                 0.16
Hash                         0.04                   0.006       0.01            0.40                 0.46
Bubble Sort                  0.04                   0.005       0.03            0.18                 0.26
Stop Timer                   0.03                   0.007       0               0.14                 0.18
Btree                        0.09                   0.008       0.03            0.40                 0.53
Btree2                       0.11                   0.01        0.04            0.35                 0.51
Heap Sort 2                  0.17                   0.015       0.08            1.15                 1.42
Heap Sort                    0.15                   0.015       0.12            2.46                 2.75
Stack-smashing               0.69                   0.03        0.54            12.74                14.0
jPVM                         3.05                   0.069       0.24            1.55                 4.91
/dev/kerninst/symbol         4.88                   0.068       0.68            8.60                 14.2
/dev/kerninst/loggedWrites   15.4                   0.26        0.95            12.33                28.94
Md5                          5.92                   0.082       1.24            3.41                 10.65
53
Some Benefits of Typestate System and Range Analysis
• Summarizing function calls allows the analysis to use summaries of several host and library functions
• Symbolic range analysis allows the system to identify the boundaries of an array within a structure
• Symbolic range analysis speeds up global verification by up to 53% (with a median of 29%)
54
Speedup due to Range Analysis
[Chart: normalized time per test case for Range Analysis and Induction-Iteration; vertical axis from 0 to 1.2]
55
Outline
• Introduction
• Safety Properties and Policies
• Safety-Checking Analysis
• Experimental Evaluation
• Conclusions and Future Research
56
Conclusion
• A technique that works on ordinary machine code, and mechanically synthesizes (and verifies) a safety proof
– Requires only that the initial inputs to the untrusted code be annotated
• Extensible:
– Host-specified access policy
– Naturally extends to the checking of security properties
• Experience is promising
57
Limitations
• Can only ensure safety properties that can be expressed using typestates + linear constraints
– e.g., cannot handle nonlinear array subscripts
• Induction-iteration method is incomplete
– e.g., generalization capability is limited
• Limitations in handling of arrays
– Lost precision
• Inherited limitations of static techniques
– Must reject code that cannot be checked statically
– Otherwise, there is the recovery problem
58
Directions for Future Research
• Improving the Precision of the Analyses
– Developing better algorithms
– Employing both static and run-time checks
• Improving the Scalability of the Analysis
– Employing modular checking
– Employing analyses that are unsound
– Producing proof-carrying code
59
Directions for Future Research
• Extending the Techniques beyond Safety Checking
– Checking security properties (information flow)
• The access control policies we enforce are discretionary: once the data in the host is read, there is no control over what the untrusted code can do with the data
– Allows reverse engineering
• Use by a run-time optimizer, a performance tool, etc.
• We can forsake the most expensive part of our analysis
60
Contributions
1 Opens up the possibility of certifying code produced by off-the-shelf compilers
2 The technique is extensible
3 Extended the notion of typestate in several ways
4 Handle inheritance polymorphism implemented via physical subtyping (a new method for coping with subtyping in the presence of mutable pointers)
5 A mechanism for summarizing the effect of function calls
6 A technique to infer information about the sizes and types of stack-allocated arrays
7 A symbolic range analysis for array bounds checks
8 A prototype implementation and experimental study