
Certifying Compilation for Standard ML in a Type Analysis Framework


Leaf Petersen

Carnegie Mellon University


Motivation

Types

• Types capture facts about programs.
  – Fact: This procedure expects a 32-bit integer.
  – Fact: This address points to executable code.
  – Fact: This data structure was produced here.
• Programmers use types:
  – To keep their facts straight.
    • Capture and preserve invariants.
  – To check their facts.
    • Typechecker verifies truth.
  – To manage complexity.

Types and Compilers

• Compilers use types.
  – Predict size of data.
  – Eliminate unnecessary dynamic checks.
• Most compilers forget types early.

P1:T1 → P2 → ... → Pn → P.o

Type Preserving Compilation

• Transform types with program.
  – Optimize code based on types.
  – Verify that invariants still hold.
  – Emit types on object code.

P1:T1 → P2:T2 → ... → Pn:Tn → P.o : To

TILT

• Type preserving compiler
  – Standard ML.
  – Sparc, Alpha, (now) x86 backends
  – Perry Cheng, Chris Stone, Leaf Petersen, Dave Swasey, and others.
• Intermediate languages are typed
  – Type based optimizations.
  – Internal correctness checks.
• Generates typed x86 object code (this thesis).

Why TILT?

• Want to compile SML efficiently.
  – Separate compilation is a must.
• Traditional optimizations.
  – Loop optimizations, CSE, constant folding, and many more.
• New challenges for optimization.
  – Polymorphism, GC, 1st class functions, modules, etc.

Example: Unknown Types.

• Module interfaces (and polymorphism) introduce unknown types:
• Clients compiled against interface
  – Cannot know what t is (may be instantiated multiple times)
  – Cannot predict size of value (if sizes vary).
  – Cannot predict traceability of value (ptr/non-ptr).

Old Solutions

• C, C++, Java: No unknown types.
  – Objects: “partially known” types.
• Traditional ML/Lisp compilers: Uniform data representation.
  – All values are same size (e.g. 32 bits).
  – Large values (e.g. 64 bit floats) must be boxed.
  – Traceability dealt with via tagging (e.g. 31 bit ints).
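The tagging idea can be sketched in a few lines of Python. This is only an illustration of one common scheme (low bit 1 marks an integer, low bit 0 marks a word-aligned pointer); the function names are hypothetical and nothing here is taken from a particular compiler.

```python
MASK32 = 0xFFFFFFFF  # model a 32-bit machine word

def tag_int(n):
    """Encode a 31-bit signed integer as a 32-bit word whose low bit is 1."""
    if not -(1 << 30) <= n < (1 << 30):
        raise OverflowError("value does not fit in 31 bits")
    return ((n << 1) | 1) & MASK32

def is_pointer(word):
    """Word-aligned pointers have low bit 0, so a collector can tell them apart."""
    return word & 1 == 0

def untag_int(word):
    """Recover the signed integer from a tagged word."""
    if is_pointer(word):
        raise ValueError("word is a pointer, not a tagged integer")
    value = word >> 1          # drop the tag bit, leaving 31 unsigned bits
    if value >= 1 << 30:       # sign bit of the 31-bit value is set
        value -= 1 << 31
    return value
```

The cost of the scheme is visible in `tag_int`: one bit of every integer is spent on the tag, which is why such compilers offer 31-bit rather than 32-bit integers.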

TILT Solution

• Types tell size and traceability of data.
• Unknown types are instantiated with known types at runtime.
  – Most compilers discard types before generating code.
• TILT: Keep types at runtime and use them to dynamically determine layout and traceability.
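As a rough illustration of how a runtime type can answer both questions (size and traceability), consider the Python sketch below. The type names, the 32-bit word size, and the rule that every type other than Int and Float is a one-word heap pointer are assumptions made for the example, not TILT's actual representation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Layout:
    size_bytes: int   # how much space a value of this type occupies
    traceable: bool   # whether the garbage collector must follow it

def layout_of(type_rep):
    """Look up layout from a (hypothetical) runtime type representation."""
    if type_rep == "Int":
        return Layout(4, False)   # untagged 32-bit integer
    if type_rep == "Float":
        return Layout(8, False)   # unboxed 64-bit float
    return Layout(4, True)        # assumed: anything else is a heap pointer

def roots_to_trace(frame):
    """Given (type_rep, value) pairs, return the values a collector must follow."""
    return [value for type_rep, value in frame if layout_of(type_rep).traceable]
```

Because the decision is driven by the type rather than by a tag bit in the value, integers keep all 32 bits and floats need no box.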

Type analysis

• type Optarray[t] =
    Typecase[t] of
      Boxed(Float) => Array64[Float]
    | _ => Array32[t]
• Note:
  – Optarray[Int] == Array32[Int]
  – Optarray[a] where a is unknown is dynamic
• Constructor for type Optarray?
  – optarray[t] : int x t -> Optarray[t]

Type analysis

• optarray[t](len : int, init : t) : Optarray[t] =
    typecase [t] of
      Boxed(Float) => new_array64[Float](len, unbox(init))
    | _ => new_array32[t](len, init)
• For statically known types, reduces at compile time
  – optarray[Int](10,0) = new_array32[Int](10,0)
• For unknown types, reduces at runtime
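The runtime case of optarray can be mimicked in Python by passing the type as an ordinary value and branching on it. This is a loose sketch: `TypeRep`, `INT` and `BOXED_FLOAT` are hypothetical names, Python's `array("d", ...)` stands in for `Array64[Float]`, and a plain list stands in for `Array32[t]`.

```python
from array import array
from dataclasses import dataclass

@dataclass(frozen=True)
class TypeRep:
    """Hypothetical runtime representation of a type."""
    name: str

INT = TypeRep("Int")
BOXED_FLOAT = TypeRep("Boxed(Float)")

def optarray(t, length, init):
    """Build an array whose layout depends on the runtime type `t`."""
    if t == BOXED_FLOAT:
        # Floats are stored flat, as unboxed 64-bit doubles.
        return array("d", [float(init)] * length)
    # Every other type uses one uniform slot per element.
    return [init] * length

floats = optarray(BOXED_FLOAT, 3, 1.5)
ints = optarray(INT, 3, 0)
```

When `t` is a constant at the call site, a compiler can resolve the branch ahead of time, which corresponds to the compile-time reduction on the slide; Python itself performs no such reduction.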

Type-passing Optimizations

• Type analysis:
  – Enables global representation optimizations in the presence of unknown types.
• TILT uses types at runtime for:
  – Better data-layouts.
    • Unboxed arrays of 64 bit floats
    • 32 bit ints
    • Optimized sum representations
  – Flatten aggregate arguments into registers.
  – Mostly tag-free garbage collection.


There’s more

• Types can help with generating efficient code.

• But not the end of the story....

Mobile Code

• Code has become mobile.
  – May know very little about producer.
• Examples:
  – Web applets.
  – Grid computing.
  – Binary installations/upgrades.
  – Application downloads.
• High risk from malicious/wrong code.

The Certification Problem

• Source language safety is checkable.
  – Typechecker checks the programmer’s facts.
• Raw object code is not checkable.
• Safety relies on trust in:
  – Safety of source language.
  – Correctness/identity of producer/compiler.
  – Integrity of the object code.

Java Approach

• Java bytecode
  – High-level language (almost Java)
  – Can be typechecked
• Interpreted
  – slow, somewhat complicated
• JIT compiled
  – somewhat faster, quite complicated
• Large trusted computing base

Certified Code

• Typed object code
  – Types certify safety
• Code consumer
  – Does no compiling
  – Checks that certificate applies (easy)
  – Small trusted computing base
• Several instances exist:
  – TAL: Typed Assembly Language
  – PCC: Proof Carrying Code
  – Many extensions and variations

Certifying Compilers

• Programs in safe languages
  – Types provide needed annotations
  – Compiler can emit code with certificate of type/memory safety
• Certifying compilers exist for:
  – Safe subsets of C (TAL & PCC)
  – Java (PCC)
• Now for Standard ML


Types in Compilation

• Types can be used to generate efficient code.

• Types can be used to generate certified code.

• Want to combine the two paradigms.


My Thesis

Certifying compilation of type analyzing code is feasible for a full modern language such as Standard ML.

Two compilers

• Theoretical compiler
  – Formal translation
  – Prove important properties
  – Guide the implementation
• Real compiler
  – Follows the structure of the theoretical compiler
  – Targets a real certified code system.


Theory

Theoretical compiler

• Three languages:
  – Singleton free MIL
  – LIL
  – Idealized TAL (ITAL)
• Formal translations:
  – MIL to LIL
  – Closure conversion of LIL code
  – LIL to ITAL

Languages

• Singleton free MIL
  – Lambda calculus
  – Syntactic restriction to named form
  – Type analysis through primitives
• LIL
  – Much more fine-grained than MIL
    • type and type analysis representation
    • closure representation
• ITAL
  – Machine language
  – Idealized TAL
  – Simplified TAL with LX primitives for type analysis

Translations

• MIL to LIL
  – Very different type structure
  – Moderately different term structure
  – See my dissertation.
• Closure conversion
  – Very standard
• LIL to ITAL
  – Type structure is almost identical
  – Term structure is very different
    • Explicit control flow
    • Binding replaced with state modification

LIL typing

Ψ; Δ; Γ ⊢ e : τ

• Ψ – LIL heap context
• Δ – LIL type context
• Γ – LIL term context
• e – LIL expression (named form)
• τ – LIL type for e

ITAL typing

Ψ; Δ; M ⊢ I ok

• Ψ – ITAL heap context
• Δ – ITAL type context
• M – ITAL register file type
• I – ITAL instruction sequence


Register files

• A register file type M maps registers to ITAL types
  – e.g. M(r) = τ
  – Notation: M{r:τ} means M with the type of r set to τ.
• Designated stack pointer register sp
  – M(sp) = σ
  – σ describes the stack slots

LIL to ITAL Translations

• |Ψ| - heap context translation
• |Δ| - type context translation
• |τ| - type translation
• Exp e maps to instruction seq I
• But what is the translation of a term context?

Register files

• LIL variables occupy ITAL registers (or stack slots)
• Hence, the translation of a LIL context is an ITAL register file.
• Problem: what register file?
• Variables are related to registers via register allocation.

Register allocation

• Previous work builds register allocation into the translation.
  – Complex and tedious
  – Unclear how to incorporate real RA (e.g. graph coloring)
  – Consequently, toy register allocators are used in formal presentations
• Better idea: translate with respect to abstract register allocator.

Allocator

Definition: An allocator A is an object such that:

1. For every variable x:
   – A(x) = r or A(x) = sp(i)
2. frmsz(A) is a natural number
3. For every LIL typing context Γ and stack type σ, |Γ|A σ = M for some register file type M
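One way to picture the three clauses of the definition is as an interface. The Python sketch below is an assumption-laden illustration, not the thesis's formalization: the class and field names are hypothetical, types are plain strings, the type translation is taken to be the identity, and unused frame slots are shown as `None`.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Reg:
    name: str          # a machine register, e.g. "r1"

@dataclass(frozen=True)
class Slot:
    index: int         # a stack slot sp(i) in the current frame

class Allocator:
    def __init__(self, assignment, frame_size):
        self.assignment = dict(assignment)   # clause 1: variable -> Reg or Slot
        self.frame_size = frame_size         # clause 2: frmsz(A), a natural number

    def __call__(self, var):
        return self.assignment[var]

    def translate_context(self, context, stack_below):
        """Clause 3: map a typing context and a stack type to a register file type."""
        frame = [None] * self.frame_size
        regfile = {}
        for var, ty in context.items():
            loc = self(var)
            if isinstance(loc, Reg):
                regfile[loc.name] = ty
            else:
                frame[loc.index] = ty
        regfile["sp"] = frame + list(stack_below)   # frame sits on top of the rest
        return regfile

A = Allocator({"x": Reg("r1"), "y": Slot(0)}, frame_size=1)
regfile = A.translate_context({"x": "int", "y": "float"}, ["ret"])
```

Nothing in this interface says how the assignment was computed, which is the point of the abstraction: a graph-coloring allocator and a toy allocator would both fit.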

Translation judgment

Ψ; Δ; Γ; A, σ ⊢ e : τ ⇝ I

• Ψ – LIL heap context
• Δ – LIL type context
• Γ – LIL term context
• A – Allocator
• σ – describes stack below frame
• I – ITAL instruction sequence
• For this talk, I’m ignoring exceptions, other stuff.

Translation judgment

Ψ; Δ; Γ; A[z → r1, x → r1, y → r2], σ ⊢ z = x+y : int ⇝ add r1,r2
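The example on this slide can be replayed with a small Python function that consults an allocator, here just a dictionary from variables to register names. The function name and the handling of the cases where the destination does not share a register with the left operand are assumptions for illustration; `add rd,rs` is read as rd := rd + rs.

```python
def translate_add(alloc, dst, lhs, rhs):
    """Translate `dst = lhs + rhs` into two-operand instructions.

    `alloc` maps each variable to a register name; the translation never
    chooses registers itself, it only looks them up.
    """
    rd, rl, rr = alloc[dst], alloc[lhs], alloc[rhs]
    if rd == rl:
        # Destination already holds the left operand: one instruction.
        return [f"add {rd},{rr}"]
    if rd == rr:
        # Destination holds the right operand; addition commutes.
        return [f"add {rd},{rl}"]
    # Otherwise copy the left operand first, then add.
    return [f"mov {rd},{rl}", f"add {rd},{rr}"]

code = translate_add({"z": "r1", "x": "r1", "y": "r2"}, "z", "x", "y")
```

With z and x sharing r1, as in the slide, the result is the single instruction `add r1,r2`; a different allocator yields different but equally valid code from the same translation.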

Question

Ψ; Δ; Γ; A, σ ⊢ e : τ ⇝ I

• Why should I be well-typed?
• Is the equational theory rich enough?
  – Easy to rely on equations that don’t hold
• Want to show soundness:
  – Each translation maps well-typed terms to well-typed terms.
• Doesn’t hold for all allocators: only the good ones.

Good allocator for Γ

Definition: Let M = |Γ|A σ. We say that A is a good allocator for Γ if:

1. M(sp) = f ∘ σ such that frmsz(A) = f
2. |•|A σ is the empty machine state.
3. If Γ = Γ1, x:τ, Γ2 then
   a) A is a good allocator for Γ1 and Γ2
   b) If A(x) = r then |Γ|A σ = |Γ1,Γ2|A σ{r:|τ|}
   c) If A(x) = sp(i) then something similar.

Good allocator for e

Definition: An allocator A is a good allocator for an expression e if:

1. For all derivations of Ψ; Δ; Γ ⊢ e : τ, A is a good allocator for Γ.
2. A is a good allocator for all sub-expressions of e.

Soundness

Theorem: If A is a good allocator for e, and Ψ; Δ; Γ ⊢ e : τ, and σ is a well-formed stack type, and Ψ; Δ; Γ; A, σ ⊢ e : τ ⇝ I, then |Ψ|; |Δ|; M ⊢ I ok where M = |Γ|A σ.

Benefits of this approach

• Theory close to implementation
• Register allocation is a parameter
  – Separates out the mechanism
  – Concise specification of interface between code gen and RA
  – Translation isn’t bogged down with algorithmic details of RA

Downside: completeness

• Depends on register allocator
  – Full completeness doesn’t hold
  – Possible to show parametric completeness?
  – Not clear what this means
• Worthwhile tradeoff
  – Formal presentation very close to implementation
  – In practice:
    • Soundness is hard (implementation had bugs).
    • Completeness is just a matter of covering all cases.
• Likely that this can be solved (future work)

Summary (Theory)

• Formal translations:
  – MIL to LIL
  – Closure conversion of LIL code
  – LIL to ITAL
• Proof of soundness for each
• New approach to dealing with typed RA
• Provides a guide for......


Practice

Real Compiler

• Implemented a certifying back end for TILT.
  – Targets TAL for x86.
• Type representation and analysis made explicit
  – Not gc interface (yet).
• Data layout issues made explicit.
  – Boxing/unboxing.
  – Closure representations.
  – Heap data layout.

[Diagram: the existing TILT pipeline]

SML Source
  → Elaborate (typecheck) → HIL (Typed)
  → Phase split (eliminate modules, some data rep) → MIL (Typed)
  → Optimize → MIL (Typed)
  → Code Gen (code generation, type representation; untyped output!) → RTL (Untyped)

Optimize passes: shrinking inlining, speculative inlining, CSE/dead code elim, constant folding, uncurrying, monomorphization, flattening, eta reduction, closure conversion, hoisting, others.

Subsequent compilation is mostly standard.


New TILT IL

• LIL: Low-level internal language
  – Based on LX (Crary & Weirich)
• Data representation explicit
• Still lambda calculus-ish
  – Call/return (not CPS)
• All heap allocation explicit
• Type analysis implemented at the term level
  – Neat trick
  – See the dissertation

[Diagram: the certifying TILT back end]

Front end → MIL (Typed)
  → Singleton elim
  → Type rep (dynamic type reps, data rep structure, unified allocation) → LIL (Typed)
  → Optimize (CSE/dead code elim, constant folding, eta reduction, switch reduction, others) → LIL (Typed)
  → Closure Conv (types and terms, recursive code, some opts) → LIL (Typed)
  → Code Gen (direct to TALx86, reg alloc/cogen, small peephole opts) → TAL (Typed)

Compilation

fib.sml → TILT → fib/asm.tal → TALx86 → fib/obj.to, fib/obj.o

Separate Compilation

[Diagram: fib.sml :> fib.int, with imports Int.int and TextIO.int]

TILT Compilation Model

[Diagram: fib.sml :> fib.int, with imports Int.int and TextIO.int, compiles to fib/asm.tal :> fib/asm_e.tali, with imports Int/asm_e.tali and TextIO/asm_e.tali]

Annotation size

• Annotation overhead for a file (e.g. fib)
  – size(fib/fib.to) + size(fib/fib_e.tali)
• Important question: how big?
  – Mobile code requires small annotation
  – Annotation size affects checking time
• Optimizing for size:
  – Not part of my thesis!!
• Important to measure
  – Want to understand the baseline
  – Actually pretty good!

Micro-Benchmarks

Tag     Description
takc    curried function calls
taku    uncurried function calls
Fib     fib, fact with default Int
Fib32   fib, fact with 32 bit ints
PI      approximation of pi (fp)

Selected Larger Benchmarks

Tag       Description                            LOC
msort     Merge sort (lists)                     48
life      Game of life (lists)                   205
pqueens   P queens problem (arrays)              292
frank     Small theorem prover                   473
leroy     Knuth-Bendix completion (exceptions)   537
simple    Spherical fluid dynamics               860
tyan      Gröbner basis calculation              896
lexgen    Lexical-analyzer generator             1178
pia       Perspective inversion algorithm        2074

Benchmark sizes (abs)

[Chart: absolute sizes (axis 0 to 600) of obj.o, obj.to, asm_e.tali and asm_i.tali for Takc, Taku, PI, fib, fib32, Isort, msort, Quicksort2, Quicksort, btimes, fft, TimeAndRun, PQueens, BarnesHut, life, frank, leroy, simple, tyan, pia, lexgen, boyer]

Type overhead (relative)

[Chart: relative sizes (0% to 100%) of obj.o, obj.to, asm_e.tali and asm_i.tali]

Annotation size

• Average factor of 5 increase
• Factor of 2-3 for larger programs
• Many opportunities for improvement
• Separate compilation overhead
• Additional issues, discussion in my dissertation

Performance

• Really not part of my thesis!!
  – Not an optimizing back end
• Nonetheless, important points:
  – Valuable to measure and understand
  – Not a toy compiler

Comparing Apples to Fish

• MLton
  – Whole program compiler.
  – Very good code!
• SML/NJ
  – Incremental compiler
  – Widely used, supports interactive loop
• TILT
  – Separate compilation
  – Conservative (malloc based) GC
• TILT (Whole program)

Normalized runtimes

[Chart: normalized runtimes (axis 0 to 4) on Taku, Takc, Fib, Fib32 and PI for TILT, TILT (Whole), SML/NJ and MLton]

Normalized runtimes

[Chart: normalized runtimes (axis 0 to 16) on Life, Leroy, Simple, Tyan, Msort, Pia, Lexgen and PQueens for TILT, TILT (Whole), NJ110.42 and MLton]

Certifying TILT

• Type preserving, optimizing, certifying compiler for Standard ML.
• Interesting theoretical and practical challenges.
  – Data representation.
  – Type analysis representation.
  – Engineering challenges
  – Proof scalability challenges


My Thesis

Certifying compilation of type analyzing code is feasible for a full modern language such as Standard ML.

Related work

• Certified Code
  – PCC (Necula & Lee), Foundational PCC (Appel)
  – TAL (Morrisett, et al.), Foundational TAL (Crary)
• Typed compilation
  – TIL, TILT (CMU)
  – Popcorn (Cornell)
  – Special-J (Cedilla Systems)
  – Flint (Yale)

Where to now?

• Data representation.
  – Integrating GC semantics in IL
  – Typed models of GC
• TILT and typed compilation.
  – Incorporate memory allocation work.
  – Extend space of safety policies.
• Language design.
  – Combining LLL and HLL paradigms cleanly.

Implications of goodness

• If Γ(x) = τ and A(x) = r then |Γ|A σ(r) = |τ|

• |Γ,x:τ|A σ = |Γ|A σ{r:|τ|}

• ||A[|c|/] = |[c/]|[|c|A

Effects of singleton elimination

[Chart: SingElim vs. NoSingElim (axis 0 to 1.5) on Life, Leroy, Simple, Tyan, Msort, Pia, Lexgen, Fib, Fib32, PI, PQueens, Takc and Taku]

Component sizes (rel)

[Chart: relative sizes (0% to 100%) of obj.o, obj.to, asm_e.tali and asm_i.tali for Basis, Lib, Arg, Link and Bench]

Component sizes (abs)

[Chart with data table; axis 0 to 18000]

             Basis     Lib       Arg      Link      Bench
asm_i.tali   50.17     4.45      0.11     0.00      1.96
asm_e.tali   6917.00   3731.66   91.22    0.00      1718.69
obj.to       8953.31   5658.56   117.81   7264.31   2787.91
obj.o        1183.08   1005.05   11.09    46.05     988.47
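As a small worked calculation over the figures in this table, the Python sketch below takes the annotation overhead to be obj.to plus asm_e.tali (as defined on the earlier "Annotation size" slide) and divides it by the size of obj.o. Expressing the overhead as a ratio against obj.o is an assumption made here for illustration; the slides do not say how their "factor of 5" was computed.

```python
# Sizes copied from the "Component sizes (abs)" table.
SIZES = {
    "Basis": {"obj.o": 1183.08, "obj.to": 8953.31, "asm_e.tali": 6917.00},
    "Lib":   {"obj.o": 1005.05, "obj.to": 5658.56, "asm_e.tali": 3731.66},
    "Arg":   {"obj.o": 11.09,   "obj.to": 117.81,  "asm_e.tali": 91.22},
    "Link":  {"obj.o": 46.05,   "obj.to": 7264.31, "asm_e.tali": 0.00},
    "Bench": {"obj.o": 988.47,  "obj.to": 2787.91, "asm_e.tali": 1718.69},
}

def annotation_factor(sizes):
    """Annotation bytes (obj.to + asm_e.tali) per byte of object code (obj.o)."""
    return (sizes["obj.to"] + sizes["asm_e.tali"]) / sizes["obj.o"]

factors = {name: annotation_factor(sizes) for name, sizes in SIZES.items()}

for name, factor in factors.items():
    print(f"{name:6s} {factor:8.2f}")
```

Under this reading the Bench column comes out between 4 and 5, while the libraries and the link unit carry proportionally more annotation.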

Type overhead (relative)

[Chart: obj.o, obj.to, asm_e.tali and asm_i.tali; vertical axis 0 to 600, horizontal axis 1 to 134]

Normalized runtimes

[Chart: normalized runtimes (axis 0 to 16) on Life, Leroy, Simple, Tyan, Msort, Pia, Lexgen, Taku, Takc, PQueens, Fib, Fib32 and PI for TILT, TILT (Whole), NJ110.42 and MLton]


Assembly vs object sizes (abs)

[Chart with data table; axis 0 to 16000]

               Basis      Lib        Arg      Link      Bench
asm.tal        13740.51   12948.89   155.55   188.54    6963.54
obj.o+obj.to   10136.38   6663.61    128.90   7310.36   3776.37


Static reps

Kind of static representations of types:

Static type representations:


Interpretation function

Maps reps to the types being represented

Note:


Static reps

Define type constructor for optimized arrays

Uses case analysis instead of Typecase


Dynamic case

Define term constructor for optimized arrays

Type erasure

• Wait!! Still branching on types!
  – Wanted to get rid of “Typecase”, just replaced it with “case” on types!
• Clever trick:
  – Encode type sums as term sums
  – Use type refinement to reflect term info back into type level
• Replace “case” on types with “case” on terms.

MIL Types

• Type size controlled using definitions
  – Let a be c in E
  – Singleton kinds: a::S(c)
• Advantages
  – Optimizer improves sharing.
  – Sharing is intrinsic in the calculus
  – Needed for efficient compilation anyway
• Disadvantages
  – Equality is contextual, not syntactic
  – Massive duplication of mechanism

Complete Larger benchmarks

• Isort – insertion sort (lists)
• Msort – merge sort (lists)
• Quicksort – quick sort (arrays)
• fft – fast Fourier transform
• pqueens – p queens problem (array intensive)
• barnes hut – n body simulation
• life – game of life
• frank – small theorem prover
• leroy – Knuth-Bendix completion
• simple – spherical fluid dynamics
• tyan – Gröbner basis calculation
• Pia – perspective inversion algorithm (image processing)
• lexgen – lexical-analyzer generator
• boyer – theorem proving

Standard ML

• Type/memory safe.
• General purpose language.
  – Mutable data, standard libraries, GC.
• Advanced features.
  – Polymorphism (templates/generics).
  – First class functions (inner classes).
  – Modules and interfaces.
  – Type inference.
• Language features of tomorrow’s industrial languages.

LIL

• LIL: Low-level internal language
  – Based on LX (Crary & Weirich)
• Data representation explicit
• Still lambda calculus-ish
  – Call/return (not CPS)
• All heap allocation explicit

Engineering benefits

• Compiler can largely ignore types
  – No need to optimize
  – Equality is syntactic
• Type size controlled by separate mechanisms
  – Hash-consing
  – Higher-order abstract syntax
  – At TAL level, via ad-hoc definitions
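Hash-consing, the first of the mechanisms listed, can be sketched briefly in Python. The class names and the string constructors ("int", "pair") are hypothetical and chosen only for the example; the point is that structurally equal types are built once and shared, so a type that is large as a tree stays small as a DAG and equality becomes a pointer comparison.

```python
class Type:
    """A type node: a constructor name applied to already-shared children."""
    __slots__ = ("constructor", "children")

    def __init__(self, constructor, children):
        self.constructor = constructor
        self.children = children

class TypeTable:
    """Interns type nodes so each distinct structure exists exactly once."""

    def __init__(self):
        self._table = {}

    def make(self, constructor, *children):
        # Children are already interned, so their identities determine structure.
        key = (constructor, tuple(id(child) for child in children))
        node = self._table.get(key)
        if node is None:
            node = Type(constructor, children)
            self._table[key] = node   # the table keeps the node (and its id) alive
        return node

    def __len__(self):
        return len(self._table)

types = TypeTable()
int_t = types.make("int")
pair1 = types.make("pair", int_t, int_t)
pair2 = types.make("pair", types.make("int"), types.make("int"))
```

Here `pair1` and `pair2` are the same object, and the table holds only two nodes even though the pair type mentions int twice.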

New Challenges

• Controlling type size!
  – Types can be very large as trees.
  – Must maintain/traverse DAG structure.
• Must optimize types.
  – Types exist at runtime.
  – Inlining, CSE, etc. must be done on types.
• Compiler must maintain well-typedness.

Libraries and linking

• Basis: Standard ML Basis library
  – provides basic language and OS functionality
• SML/NJ Lib
  – provides extended data structures
• Arg
  – command line processing
• Link
  – Compiler generated link unit

Compilation (over-simplified)

fib.sml → TILT → fib/asm.tal → TALx86 → fib/obj.to, fib/obj.o

Example: Unknown Types.

Implements

Example: Uniform Data Rep.

[Figure: uniform data representation example; only the labels “0.0x” and “0x =” survive]

LIL Type Analysis

• Neat trick: typecase implemented at the term level!
  – All runtime data exists as ordinary terms.
  – All control flow branches are on terms.
  – Types can be erased before running.
• Accomplished via type refinement
  – Kind structure tracks type/rep connection.
• Too involved for this talk
  – Detailed development in my dissertation