©2000-2003 Zhichen Xu UC Davis 5/2003

Safety Checking of Machine Code

Zhichen Xu

Internet Systems and Storage Laboratory, HP Labs, Palo Alto

(This is dissertation work done at UW-Madison. Thesis advisors: Bart Miller and Tom Reps)

2

Motivation

• Dynamic extensibility
  – Operating systems: custom policies, general functionality, performance
    • Extensible OS: Exokernel, VINO, SPIN, Synthetix, ...
    • Commodity OS: SLIC, KernInst, ...
  – Databases: type-based extensions
    • Illustra, Informix, Paradise, ...
  – Web browsers: plug-ins
  – Performance tools: measurement code
    • KernInst, Paradyn, ...
  – Other
    • Dyninst, ...

3

Motivation

• Component-based software (Java, COM)
  – Software components from different vendors are combined to construct a complete application
  – Code from several sources with no mutual trust

Safety is crucial in these domains!

4

Big Picture and Concepts

• Is it safe for untrusted foreign code to be loaded into a trusted host system?

[Figure: a code producer supplies untrusted code to the code consumer, a host with its own code and data]

5

Safety Properties We Enforce

• Default collection of safety conditions (type safety)
  – No type violations
  – No out-of-bounds array accesses
  – No misaligned loads/stores
  – No uses of uninitialized variables
  – No invalid pointer dereferences
  – No unsafe interaction with the host
• Precise and flexible host access policy
  – Customizable

6

Outline

• Introduction
  – Motivation
  – Related work
  – High-level characteristics
• Safety properties and policies
• Safety-checking analysis
• Experimental evaluation
• Conclusions and future research

7

Related Work

• Dynamic techniques
  – Hardware-enforced address spaces, SFI, interpretation, etc.
• Hybrid techniques
  – Safe languages: Java, ML, Modula-3, etc.
(Dynamic and hybrid techniques: runtime cost, potential recovery problem)
• Static techniques
  – Proof-Carrying Code [Necula, Lee]
  – Certifying compilers [Necula, Lee] [Colby, Lee, Necula, Blau]
  – Typed Assembly Language [Morrisett, Walker, Crary, Glew, ...]

8

Related Work vs Our Approach

[Figure: Related work: source in safe C, Java, or ML (C and C++, with pointer arithmetic, ..., are not covered) → certifying compiler → machine code + proof → proof checker → Yes/No. Our approach: C, C++, assembly, ... → off-the-shelf compiler (cc, gcc, as) → ordinary binary; ordinary binary + safety policy and annotated initial inputs → safety checker → Yes/No]

9

Related Work vs Our Approach

[Figure: the same comparison, with our approach extended to emit a proof: ordinary binary + safety policy and initial inputs → proof generator → proof; ordinary binary + proof → proof checker → Yes/No]

10

High-Level Characteristics

• Performs safety checking on ordinary binaries and mechanically synthesizes (and verifies) a safety proof

• Extends the host at a very fine-grained level (allows the untrusted code to manipulate the internal data structures of the host directly)

• Enforces a host-specified access policy + type safety

11

Outline

• Introduction
• Safety Properties and Policies
• Safety-Checking Analysis
• Experimental Evaluation
• Conclusion and Future Research

12

Safety Properties

• Default collection of safety conditions
  – No type violations
  – No out-of-bounds array accesses
  – No misaligned loads/stores
  – No uses of uninitialized variables
  – No invalid pointer dereferences
  – No unsafe interaction with the host
(Fine-grained memory protection and data abstraction)
• Precise and flexible host access policy
  – Customizable

13

Host-Specified Access Policy

• Classify locations into regions
  – As big as the entire address space, as small as a variable
• [Region : Category : Access]
  – Region: a set of locations
  – Category: types, fields (classifies values)
  – Access: readable (r), writable (w), followable (f), executable (e), operable (o) (e.g., to “copy”, to “examine”)

14

Protections Provided by Access Policy

[Figure: host data passed to the untrusted code as initial inputs (a callTree, a List, and Methods), with each edge labeled by the access it permits: call, rf, rw, or x]

15

Principle of “Least Privilege”

• Kernel page-replacement extension
  – Pick a cold page from the global LRU list.

typedef struct _page_list {
    int page;
    ...
    struct _page_list *next;
} page_list;

[Host : page_list.page : ro]                    // read access
[Host : page_list.next, page_list ptr : rfo]    // follow access
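As an illustration only, the policy entries above can be modeled as a small lookup table. The sketch assumes a bitmask encoding of the five rights and plain strings for regions and categories; PolicyEntry, parse_access, and policy_allows are hypothetical names, not part of the checker. It models only the policy data and the least-privilege rule (anything not listed is denied), not the static enforcement itself.

```c
/* Hypothetical names throughout; an illustration, not the checker's code. */
#include <stdbool.h>
#include <string.h>

/* Hypothetical bit encoding of the five access rights. */
enum {
    ACC_R = 1 << 0, /* readable   */
    ACC_W = 1 << 1, /* writable   */
    ACC_F = 1 << 2, /* followable */
    ACC_E = 1 << 3, /* executable */
    ACC_O = 1 << 4  /* operable   */
};

/* Parse an access string such as "rfo"; returns -1 on an unknown letter. */
static int parse_access(const char *s) {
    int mask = 0;
    for (; *s != '\0'; s++) {
        switch (*s) {
        case 'r': mask |= ACC_R; break;
        case 'w': mask |= ACC_W; break;
        case 'f': mask |= ACC_F; break;
        case 'e': mask |= ACC_E; break;
        case 'o': mask |= ACC_O; break;
        default:  return -1;
        }
    }
    return mask;
}

/* One [Region : Category : Access] entry, with names kept as strings. */
typedef struct {
    const char *region;
    const char *category;
    const char *access;
} PolicyEntry;

/* The page-replacement policy from the slide, one entry per category. */
static const PolicyEntry page_policy[] = {
    { "Host", "page_list.page", "ro"  },
    { "Host", "page_list.next", "rfo" },
    { "Host", "page_list ptr",  "rfo" },
};

/* Least privilege: every requested right must be listed for the
   (region, category) pair; unlisted pairs get no rights at all. */
static bool policy_allows(const char *region, const char *category, int wanted) {
    for (size_t i = 0; i < sizeof page_policy / sizeof page_policy[0]; i++) {
        const PolicyEntry *e = &page_policy[i];
        if (strcmp(e->region, region) == 0 && strcmp(e->category, category) == 0) {
            int granted = parse_access(e->access);
            return granted >= 0 && (granted & wanted) == wanted;
        }
    }
    return false;
}
```

For example, reading and operating on page_list.page is allowed, while any write to page_list.page or page_list.next is refused.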

16

Outline

• Introduction
• Safety Properties and Policies
• Safety-Checking Analysis
  – An abstract storage model
  – Safety checking technique
  – An example
• Experimental Evaluation
• Conclusion and Future Research

17

Abstract Storage Model

• Abstract store: abstract location → typestate
  – Abstract locations:
    • summarize one or more memory locations
    • readable, writable, name, size, align, ...
  – Typestate: properties of the values
    • <type, state, access>
    • Forms a lattice
• Linear constraints
  – Linear equalities, linear inequalities
  – Array bounds checks, alignment checks, etc.
    e.g., ptr != 0; 0 <= index < Length; addr mod 4 = 0
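When two control-flow paths merge, typestates are combined with the lattice meet. A minimal sketch, assuming a deliberately simplified lattice: types are compared by name (unequal types meet to a bottom type), states are ordered bottom < u < i, and access rights are intersected. Typestate, State, and ts_meet are hypothetical names; the lattice in this work is richer (it also tracks pointer targets such as {n}).

```c
/* Hypothetical names throughout; a simplified lattice for illustration. */
#include <stdbool.h>
#include <string.h>

/* Hypothetical bit encoding of access rights. */
enum { ACC_R = 1, ACC_W = 2, ACC_F = 4, ACC_E = 8, ACC_O = 16 };

/* Simplified state lattice, ordered STATE_BOTTOM < STATE_U < STATE_I. */
typedef enum { STATE_BOTTOM, STATE_U, STATE_I } State;

typedef struct {
    const char *type;  /* type name; NULL stands for the bottom type */
    State state;
    unsigned access;   /* set of rights as a bitmask */
} Typestate;

/* Meet at a control-flow merge: keep only what holds on every incoming path. */
static Typestate ts_meet(Typestate a, Typestate b) {
    Typestate r;
    bool same_type = a.type != NULL && b.type != NULL && strcmp(a.type, b.type) == 0;
    r.type = same_type ? a.type : NULL;
    /* i meet u = u: a value initialized on only one path may be uninitialized */
    r.state = same_type ? (a.state < b.state ? a.state : b.state) : STATE_BOTTOM;
    /* a right survives only if both paths grant it */
    r.access = a.access & b.access;
    return r;
}
```

Meeting <int, i, ro> with <int, u, rwo> gives <int, u, ro>: the value may be uninitialized, and only the rights granted on both paths remain.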

18

Example and Notations

• Abstract location
  [Figure: ptr points to a list of nodes n1, n2, n3, n4, which are summarized by a single abstract location n pointed to by ptr]

• Abstract location typestate
  – n.page: <page_list.page, i, ro>
  – n.next: <page_list.next, {n}, rfo>
  – ptr: <page_list ptr, {n}, rfo>

“i” for initialized, “u” for uninitialized, {n} for a reference to n

19

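Inputs to Our Safety Checker

[Figure: untrusted C, C++, or Java source is compiled (cc, ...) into an ordinary binary; the trusted safety checker takes the binary and the initial inputs below and answers Yes, or No (with the reason)]

• Access policy: [H: ....page: ro]
• Typestate spec
• Invocation spec (bindings, e.g., to %o0)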

20

Host-Typestate Specification

• Data aspect:
  – Types and states of host data before the invocation of untrusted code

• Control aspect:
  – Safety pre- and postconditions
    • Precondition describes the obligations that the actual parameters must meet
    • Postcondition provides a guarantee on the resulting state

21

Summarizing Function Calls

• Placeholders: size, access permission, and typestate provide the safety requirements for actual parameters

int gettimeofday(struct timeval *tp);

Safety precondition:
  %o0: <struct timeval ptr, {null, t}, fo>
  t: <struct timeval, [u, u], w>

Safety postcondition:
  t: <struct timeval, [<int, i, o>, <int, i, o>], o>
  %o0: <int, i, o>
  other registers:

22

Checking Function Calls

• A binding process:
  – Checks whether the actual abstract locations can meet the obligations specified by the placeholders

• An update process:
  – Updates the typestates of all actual locations represented by the placeholders, according to the safety postcondition
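The binding and update steps for the gettimeofday summary can be sketched as two functions. This is a simplified model, assuming a single abstract location t, a pointer register that refers either to null or to t (rather than to the set {null, t}), and only the postcondition parts that change here. AbsLoc, PtrReg, bind_gettimeofday, and update_gettimeofday are hypothetical names.

```c
/* Hypothetical names throughout; a simplified model for illustration. */
#include <stdbool.h>
#include <string.h>

/* Hypothetical bit encoding of access rights. */
enum { ACC_R = 1, ACC_W = 2, ACC_F = 4, ACC_O = 16 };

/* Abstract location t: a type name, one initialized flag per int field
   of struct timeval, and the rights on the location. */
typedef struct {
    const char *type;
    bool init[2];
    unsigned access;
} AbsLoc;

/* A register holding a pointer; target == NULL models the null case. */
typedef struct {
    const char *type;
    AbsLoc *target;
    unsigned access;
} PtrReg;

/* Binding: can the actual %o0 (and t) meet the placeholders
   %o0: <struct timeval ptr, {null, t}, fo> and t: <struct timeval, [u, u], w>? */
static bool bind_gettimeofday(const PtrReg *o0) {
    if (strcmp(o0->type, "struct timeval ptr") != 0) return false;
    if ((o0->access & (ACC_F | ACC_O)) != (ACC_F | ACC_O)) return false;
    if (o0->target == NULL) return true;   /* null is permitted */
    const AbsLoc *t = o0->target;
    /* [u, u] asks nothing of the old field values; t only has to be writable */
    return strcmp(t->type, "struct timeval") == 0 && (t->access & ACC_W) != 0;
}

/* Update: apply the postcondition parts modeled here: both fields of t
   become initialized ints, and %o0 becomes <int, i, o>. */
static void update_gettimeofday(PtrReg *o0) {
    if (o0->target != NULL) {
        o0->target->init[0] = true;
        o0->target->init[1] = true;
    }
    o0->type = "int";
    o0->target = NULL;
    o0->access = ACC_O;
}
```

A call site passes binding only if t is writable; after the update, both fields of t are initialized and %o0 holds an initialized int.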

23

Safety Checking: Phases

1: Preparation

2: Typestate Propagation

3: Annotation

4: Local Verification

5: Global Verification

24



[Figure: the five phases, 1: Preparation through 5: Global Verification]

What does each instruction do?

Is each instruction safe?

25



1: Preparation

[Figure: untrusted code, access policy, host typestate, and invocation spec → 1. Initial annotation: AbsLoc → typestate, linear constraints]

26



2: Typestate Propagation

[Figure: untrusted code and 1. Initial annotation → 2. Approximation of memory contents at each program point]

27



3: Annotation

[Figure: untrusted code and 2. Approximation of memory → 3. Local safety conditions, 3. Global safety conditions, 3. Facts]

28



4: Local Verification

[Figure: 2. Approximation of memory and 3. Local safety conditions → True/False]

29



5: Global Verification

[Figure: untrusted code, 3. Global safety conditions, and facts from 3 and 5a. Range analysis → 5b. Induction iteration → True/False]

30

A Bounds Checking Example

1: MOV 0,%o2
2: CMP %o2, %o1
3: BGE 11
4: NOP
5: SLL %o2,2,%g2
6: LD [%o0+%g2],%g3
7: ADD %o2,1,%o2
8: CMP %o2, %o1
9: BL 5
10: ADD %o3, %g3, %o3
11: RET
12: MOV %o3, %o0

[Figure: a_p, passed in %o0, points to the array a; %o1 = n]

– Host typestate: a_p: <int[n], {a}, _>, {n ≥ 1}; a: <int, i, _>
– Access policy: [a_p : int[n] : rfo] [a : int : ro]
– Invocation: %o0 <-- a_p; %o1 <-- n
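For reference, the listing sums the n elements of the array a. The C function below is one reading of it, assuming %o3 starts at 0 (the listing does not show its initialization). The ADD at 10 sits in the delay slot of 9: BL 5, so it runs on every iteration, and the MOV at 12 sits in the delay slot of 11: RET. The name sum is hypothetical.

```c
/* Hypothetical C reading of the SPARC listing; %o3 is assumed to start at 0. */
int sum(const int *a_p, int n) {
    int s = 0;          /* %o3: initialization not shown in the listing; assumed 0 */
    int i = 0;          /* 1: MOV 0,%o2 */
    if (i >= n)         /* 2: CMP %o2, %o1; 3: BGE 11 (4: NOP in the delay slot) */
        return s;       /* 11: RET, with 12: MOV %o3, %o0 in the delay slot */
    do {
        /* 5: SLL %o2,2,%g2 computes the byte offset 4*i;
           6: LD [%o0+%g2],%g3 loads a_p[i] */
        int v = a_p[i];
        i = i + 1;      /* 7: ADD %o2,1,%o2 */
        s = s + v;      /* 10: ADD %o3, %g3, %o3 (delay slot of 9: BL 5) */
    } while (i < n);    /* 8: CMP %o2, %o1; 9: BL 5 */
    return s;           /* 11: RET; 12: MOV %o3, %o0 */
}
```

The bounds check the checker has to discharge is the one on a_p[i]: at 6, the byte offset %g2 = 4 * i must satisfy 0 <= %g2 < 4n.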

31

Phase 1: Preparation

– Host typestate: a_p: <int[n], {a}, _>, {n ≥ 1}; a: <int, i, _>
– Access policy: [a_p : int[n] : rfo] [a : int : ro]
– Invocation: %o0 <-- a_p; %o1 <-- n
– Initial annotation: a: <int, i, ro>, %o0: <int[n], {a}, rwfo>, %o1: <int, i, rwo>, {%o1 = n, n ≥ 1}

32

Phase 2: Typestate Propagation

• Finds the typestate of each abstract location at each program point
  Initially: a: <int, i, ro>, %o0: <int[n], {a}, rwfo>, %o1: <int, i, rw>, {%o1 = n, n ≥ 1}

  Program point              %o0               %o2              %g2
  before 1: MOV 0,%o2        <int[n], {a}, >   <type, [4], >    <type, [4], >
  after 1 through after 4    <int[n], {a}, >   <int, i, >       <type, [4], >
  after 5: SLL %o2,2,%g2     <int[n], {a}, >   <int, i, >       <int, i, rwo>

  At 6: LD [%o0+%g2],%g3: a: <int, i, ro>, %o0: <int[n], {a}, rwo>, %g2: <int, i, rwo>

33

Phase 3: Annotation

• Find the safety requirements and facts, e.g., for 6: LD [%o0+%g2],%g3
  Typestates at 6: a: <int, i, ro>, %o0: <int[n], {a}, rwof>, %g2: <int, i, rwo>
  – Local safety conditions: a: readable, operable, init, ...; %g3: writable
  – Global safety conditions: (%o0+%g2) mod 4 = 0; 0 <= %g2 < 4n
  – Facts: %o0 mod 4 = 0
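To make the two global safety conditions concrete, the sketch below evaluates them for a single set of register values. It is only an illustration of what the conditions mean: the checker has to prove them for every state that can reach 6, which is the job of Phase 5. The function name ld_global_conditions is hypothetical.

```c
/* Hypothetical helper: evaluates the global safety conditions of
   6: LD [%o0+%g2],%g3 for one concrete state (illustration only). */
#include <stdbool.h>
#include <stdint.h>

static bool ld_global_conditions(uintptr_t o0, long g2, long n) {
    /* (%o0+%g2) mod 4 = 0; unsigned wrap-around preserves the value mod 4 */
    bool aligned = ((o0 + (uintptr_t)g2) % 4) == 0;
    /* 0 <= %g2 < 4n: the byte offset stays inside the n-element int array */
    bool in_bounds = 0 <= g2 && g2 < 4 * n;
    return aligned && in_bounds;
}
```

Given the fact %o0 mod 4 = 0, the alignment condition follows once %g2 is known to be a multiple of 4 (it is 4 * %o2 after 5); the bounds condition additionally needs the loop reasoning of Phase 5.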

34

Phase 4: Local Verification

• Verify the local safety requirements at 6: LD [%o0+%g2],%g3
  – Local safety conditions: a: readable, operable, initialized; %g3: writable
  – Typestates at 6: a: <int, i, ro>, %o0: <int[n], {a}, rwof>, %g2: <int, i, rwo>, %g3: <type, [4], w>
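Local verification inspects only the typestates at the instruction itself. A minimal sketch for the load at 6, assuming a two-valued state (u/i) and a bitmask for access rights; Typestate and check_ld_local are hypothetical names.

```c
/* Hypothetical names throughout; illustration only. */
#include <stdbool.h>

/* Hypothetical bit encoding of access rights. */
enum { ACC_R = 1, ACC_W = 2, ACC_F = 4, ACC_E = 8, ACC_O = 16 };

/* Two-valued state: uninitialized or initialized. */
typedef enum { ST_U, ST_I } State;

typedef struct {
    State state;
    unsigned access;
} Typestate;

/* Local safety conditions of 6: LD [%o0+%g2],%g3: the loaded location a must
   be readable, operable, and initialized; the destination %g3 must be writable. */
static bool check_ld_local(Typestate a, Typestate g3) {
    const unsigned need = ACC_R | ACC_O;
    bool src_ok = (a.access & need) == need && a.state == ST_I;
    bool dst_ok = (g3.access & ACC_W) != 0;
    return src_ok && dst_ok;
}
```

With the typestates on this slide (a: <int, i, ro>, %g3 writable) the check passes; it would fail if a were uninitialized or %g3 were not writable.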

35

Phase 5: Global Verification

Synergy of:
• A symbolic range analysis
• A program-verification technique
  – Induction-iteration method to synthesize loop invariants [Suzuki & Ishihata 1977]

36

Phase 5: Range Analysis

• The ranges of the registers at each program point
  – register: [ax+by+c, a’x’+b’y’+c’]
  – x, y, x’, and y’ are either array bases or lengths
  – e.g., index: [0, length-1]
  – e.g., pointer: [base, base+length-1]
• Worklist-based algorithm
• Immediate widening for quick convergence
• Strategies for sharpening the analysis
  – Pick the right spots in a loop to perform widening
  – Consider correlation among register values

37

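Phase 5: Range Analysis: Example

[Figure: flow graph of the loop: i = 0, then ...a[i]..., then i = i + 1, then the test i < n, which branches back to ...a[i]...; n ≥ 1]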

38


Ranges of i (first pass):
  after i = 0: [0,0]
  at ...a[i]...: [0,0]
  after i = i + 1: [1,1]
  on the back edge (i < n): [1,1]

39

Ranges of i (second pass):
  after i = 0: [0,0]
  at ...a[i]...: [0,0] ∪ [1,1] = [0,1]
  after i = i + 1: [1,1] ∇ [1,2] = [1,∞] (widening)
  on the back edge (i < n): [1,1]


40

Ranges of i (second pass, continued):
  after i = 0: [0,0]
  at ...a[i]...: [0,0] ∪ [1,1] = [0,1]
  after i = i + 1: [1,1] ∇ [1,2] = [1,∞]
  on the back edge (i < n): [1,n-1]


41

Ranges of i (third pass, fixed point):
  after i = 0: [0,0]
  at ...a[i]...: [0,0] ∪ [1,n-1] = [0,n-1]
  after i = i + 1: [1,∞] ∇ [1,n] = [1,∞]
  on the back edge (i < n): [1,n-1]
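The trace above can be reproduced by a small fixed-point computation. This is a simplified illustration, not the thesis's range-analysis algorithm. It assumes the loop has the do-while shape of the SPARC listing, that n ≥ 1, that upper bounds take only the forms c, n + c, or +∞, and that widening is applied right after i = i + 1 as in the slides. All names are hypothetical.

```c
/* Hypothetical names throughout; a simplified symbolic interval analysis. */
#include <limits.h>
#include <stdbool.h>

/* Upper bounds have the form c, n + c, or +infinity; lower bounds are
   plain integers in this sketch. */
typedef enum { B_CONST, B_SYM, B_INF } BoundKind;
typedef struct { BoundKind kind; long c; } Bound;
typedef struct { bool empty; long lo; Bound hi; } Interval;

static const long N_MIN = 1;  /* assumed host fact: n >= 1 */

/* Conservative test of a <= b for every n >= N_MIN; false means "not provable". */
static bool le(Bound a, Bound b) {
    if (b.kind == B_INF) return true;
    if (a.kind == B_INF) return false;
    if (a.kind == b.kind) return a.c <= b.c;
    if (a.kind == B_CONST) return a.c <= N_MIN + b.c;  /* c1 <= n + c2 */
    return false;                                      /* n + c1 <= c2 fails for large n */
}

static Interval join(Interval x, Interval y) {        /* an upper bound of both ranges */
    if (x.empty) return y;
    if (y.empty) return x;
    Interval r = { false, x.lo < y.lo ? x.lo : y.lo, x.hi };
    if (le(x.hi, y.hi)) r.hi = y.hi;
    else if (!le(y.hi, x.hi)) r.hi = (Bound){ B_INF, 0 };
    return r;
}

static Interval widen(Interval old, Interval nw) {    /* immediate widening */
    if (old.empty) return nw;
    if (nw.empty) return old;
    Interval r = old;
    if (nw.lo < old.lo) r.lo = LONG_MIN;                /* shrinking lower bound -> -inf */
    if (!le(nw.hi, old.hi)) r.hi = (Bound){ B_INF, 0 }; /* growing upper bound -> +inf */
    return r;
}

static Interval add1(Interval x) {                    /* i = i + 1 */
    if (x.empty) return x;
    x.lo += 1;
    if (x.hi.kind != B_INF) x.hi.c += 1;
    return x;
}

static Interval assume_lt_n(Interval x) {             /* edge where i < n holds */
    const Bound n_minus_1 = { B_SYM, -1 };
    if (x.empty) return x;
    /* replace the upper bound only when n - 1 is provably no larger */
    if (!le(x.hi, n_minus_1) && le(n_minus_1, x.hi)) x.hi = n_minus_1;
    return x;
}

static bool same(Interval x, Interval y) {
    if (x.empty || y.empty) return x.empty == y.empty;
    return x.lo == y.lo && x.hi.kind == y.hi.kind && x.hi.c == y.hi.c;
}

/* i = 0; do { ...a[i]...; i = i + 1; } while (i < n);
   returns the range of i at ...a[i]... once a fixed point is reached. */
static Interval range_at_access(void) {
    const Interval entry = { false, 0, { B_CONST, 0 } };
    Interval body = { .empty = true }, inc = { .empty = true }, back = { .empty = true };
    for (;;) {
        Interval next = join(entry, back);   /* entry edge joined with back edge */
        inc = widen(inc, add1(next));        /* widen right after the increment */
        back = assume_lt_n(inc);             /* back edge is taken only if i < n */
        if (same(next, body)) return body;
        body = next;
    }
}
```

Successive rounds give [0,0], then [0,1] (with [1,∞] after widening), then [0,n-1], which is stable. Scaling the index range [0, n-1] by 4 (5: SLL %o2,2,%g2) yields the byte-offset range [0, 4n-4].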


42

Phase 5: Result of Range Analysis

At 6: LD [%o0+%g2],%g3:  %g2: [0, 4n-4]

43

Program Verification

• R = wlp(S, Q): R is the weakest condition such that, if S is executed from a state satisfying R and terminates, then Q is true

• Verification condition (VC) generation, working backward from the postcondition:
  {y+1>0}  z=1  {y+z>0}  x=y+z  {x>0}
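For straight-line assignments, wlp is computed by substitution: wlp(v = e, Q) is Q with e substituted for v. The sketch below assumes postconditions of the form expr > 0 over linear expressions in x, y, and z; the names Lin and wlp_assign are hypothetical.

```c
/* Hypothetical names; wlp for assignments over linear expressions. */

/* Linear expression coef[X]*x + coef[Y]*y + coef[Z]*z + k;
   a postcondition is read as "expression > 0". */
enum { X, Y, Z, NVARS };
typedef struct { long coef[NVARS]; long k; } Lin;

/* wlp(v = e, post > 0) is (post[e/v] > 0): replace v by e in post. */
static Lin wlp_assign(int v, Lin e, Lin post) {
    Lin r = post;
    long c = post.coef[v];       /* how often v occurs in post */
    r.coef[v] = 0;               /* remove v ... */
    for (int i = 0; i < NVARS; i++)
        r.coef[i] += c * e.coef[i];  /* ... and add c copies of e */
    r.k += c * e.k;
    return r;
}
```

Applying wlp_assign for x = y + z to x > 0 gives y + z > 0; applying it again for z = 1 gives y + 1 > 0, matching the example.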

44

Program Verification

[Figure: a loop with body S, invariant Inv at the loop head, and condition Q]

Loop invariant:
(i) Inv is true on entry
(ii) Inv ⇒ wlp(S, Inv)
(iii) Inv ⇒ wlp(S, Q)

45

Induction-Iteration Method

W(0) = wlp(S, Q)

W(i+1) = wlp(S, W(i))

Iterate until W(0) ∧ W(1) ∧ ... ∧ W(j) ⇒ W(j+1); then W(0) ∧ ... ∧ W(j) is a loop invariant, provided it also holds on entry

46

Phase 5: Result of Induction Iteration

6: LD [%o0+%g2],%g3    %g2 < 4n

5: SLL %o2,2,%g2       %o2 < n

Loop invariant at 5: %o2 < n ∧ %o1 <= n
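The result can be checked by hand. The derivation below is one possible layout, assuming the invariant is placed at 5, that S is the back-edge path 7, 8, 9 (the delay-slot ADD at 10 touches neither %o2 nor %o1), and that the generalization of W(1) to %o1 ≤ n is given rather than derived.

```latex
\begin{aligned}
W(0) &= (\%o2 < n)
  && \text{wlp of 5 for } \%g2 < 4n \text{ at 6, since } \%g2 = 4\cdot\%o2\\
W(1) &= \mathrm{wlp}(S, W(0)) = \bigl(\%o2 + 1 < \%o1 \Rightarrow \%o2 + 1 < n\bigr)
  && S = \text{7, 8, 9 (back edge to 5 taken)}\\
\mathit{Inv} &= (\%o2 < n) \wedge (\%o1 \le n)
  && \%o1 \le n \text{ implies } W(1)\\
\mathit{Inv} &\Rightarrow \mathrm{wlp}(S, \mathit{Inv})
  = \bigl(\%o2 + 1 < \%o1 \Rightarrow (\%o2 + 1 < n \wedge \%o1 \le n)\bigr)
  && \%o1 \text{ is not modified in the loop}\\
\text{entry: } & \%o2 = 0,\ \%o1 = n,\ 0 < \%o1 \;\Rightarrow\; \mathit{Inv}
  && \text{3: BGE 11 not taken}
\end{aligned}
```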

47

Enhancements to Induction-Iteration Method

• Handle loads/stores
• Handle multiple loops
• Handle procedure calls
• Introduce several strategies to speed up the induction-iteration method

48

Outline

• Introduction
• Safety Properties and Policies
• Safety-Checking Analysis
• Experimental Evaluation
• Conclusions and Future Research

49

Experimental Evaluation

• Test cases
  – Array sum, start/stop timer, B-tree, kernel paging policy, hash, bubble sort, heap sort, stack-smashing, MD5, jPVM, /dev/kerninst (symbol, loggedWrites)

• Summary of results
  – Found safety violations in the kernel paging policy, stack-smashing, and /dev/kerninst
  – Verified all conditions, except for some calls in MD5 and jPVM (precision lost due to the inability to detect that a loop ‘kills’ all elements of an array)
  – Checking times vary from 0.1 to 30 seconds

50

Characteristics of Test Cases

Test cases (columns, in order): Sum, Paging Policy, Start Timer, Hash, Bubble Sort, Stop Timer, Btree, Btree2, Heap Sort 2, Heap Sort, Stack-smashing, jPVM, /dev/kerninst/symbol, /dev/kerninst/loggedWrites, Md5

Instructions: 13 20 22 25 25 36 41 51 71 95 309 315 339 358 883
Branches: 2 5 1 4 5 3 11 11 9 16 89 16 45 36 11
Loops (inner): 1 2(1) 0 1 2(1) 0 2(1) 2(1) 4(2) 4(2) 7(1) 3 6(4) 65(2)
Procedure calls (trusted): 0 0 1(1) 1 0 2(2) 0 4(4) 3 0 240(40) 36(25) 48(12) 6
Global safety conditions (bounds checks): 4(2) 9 13 15(2) 16(8) 17 35(14) 39(14) 56(26) 84(42) 100(74) 99(18) 116(42) 192(40) 121(30)
Source language: C (Sum through Stack-smashing), C in C++ style (jPVM), C++ (/dev/kerninst/symbol, /dev/kerninst/loggedWrites), C (Md5)

51

Timing (Seconds)

Test cases (columns, in order): Sum, Paging Policy, Start Timer, Hash, Bubble Sort, Stop Timer, Btree, Btree2, Heap Sort 2, Heap Sort, Stack-smashing, jPVM, /dev/kerninst/symbol, /dev/kerninst/loggedWrites, Md5

Typestate propagation: 0.02 0.05 0.02 0.04 0.04 0.03 0.09 0.11 0.17 0.15 0.69 3.05 4.88 15.4 5.92
Annotation: 0.003 0.005 0.005 0.006 0.005 0.007 0.008 0.01 0.015 0.015 0.03 0.069 0.068 0.26 0.082
Range analysis: 0.01 0 0 0.01 0.03 0 0.03 0.04 0.08 0.12 0.54 0.24 0.68 0.95 1.24
Induction iteration: 0.08 0.18 0.13 0.40 0.18 0.14 0.40 0.35 1.15 2.46 12.74 1.55 8.60 12.33 3.41
Total: 0.1 0.23 0.16 0.46 0.26 0.18 0.53 0.51 1.42 2.75 14.0 4.91 14.2 28.94 10.65

52

Timing

[Chart: checking time in seconds per test case, split into typestate propagation and global verification]

53

Some Benefits of Typestate System and Range Analysis

• Summarizing function calls allows the analysis to use summaries of several host and library functions

• Symbolic range analysis allows the system to identify the boundaries of arrays inside structures

• Symbolic range analysis speeds up global verification by up to 53% (median 29%)

54

Speedup due to Range Analysis

[Chart: per-test-case time of range analysis and induction iteration, normalized]

55

Outline

• Introduction
• Safety Properties and Policies
• Safety-Checking Analysis
• Experimental Evaluation
• Conclusions and Future Research

56

Conclusion

• A technique that works on ordinary machine code and mechanically synthesizes (and verifies) a safety proof
  – Requires only that the initial inputs to the untrusted code be annotated

• Extensible:
  – Host-specified access policy
  – Naturally extends to the checking of security properties

• Experience promising

57

Limitations

• Can only ensure safety properties that can be expressed using typestates + linear constraints
  – e.g., cannot handle nonlinear array subscripts

• Induction-iteration method is incomplete
  – e.g., generalization capability is limited

• Limitations in handling of arrays
  – Lost precision

• Inherited limitations of static techniques
  – Must reject code that cannot be checked statically
  – Otherwise, there is the recovery problem

58

Directions for Future Research

• Improving the Precision of the Analyses
  – Developing better algorithms
  – Employing both static and run-time checks

• Improving the Scalability of the Analysis
  – Employing modular checking
  – Employing analyses that are unsound
  – Producing proof-carrying code

59

Directions for Future Research

• Extending the Techniques beyond Safety Checking
  – Checking security properties (information flow)
    • The access-control policies we enforce are discretionary: once data in the host is read, there is no control over what the untrusted code does with it
  – Allows reverse engineering
    • Use by a run-time optimizer, a performance tool, etc.
    • We can forsake the most expensive part of our analysis

60

Contributions

1 Opens up the possibility of certifying code produced by off-the-shelf compilers

2 The technique is extensible

3 Extended the notion of typestate in several ways

4 Handles inheritance polymorphism implemented via physical subtyping (a new method for coping with subtyping in the presence of mutable pointers)

5 A mechanism for summarizing the effect of function calls

6 A technique to infer information about the sizes and types of stack-allocated arrays

7 A symbolic range analysis for array bounds checks

8 A prototype implementation and experimental study
