Certifying Compilation for Standard ML in a Type Analysis Framework


Leaf Petersen

Carnegie Mellon University


Motivation


• Types capture facts about programs.
  – Fact: This procedure expects a 32 bit integer.
  – Fact: This address points to executable code.
  – Fact: This data structure was produced here.
• Programmers use types:
  – To keep their facts straight.
    • Capture and preserve invariants.
  – To check their facts.
    • Typechecker verifies truth.
  – Manage complexity.

Types


Types and Compilers

• Compilers use types.
  – Predict size of data.
  – Eliminate unnecessary dynamic checks.

• Most compilers forget types early.

P1:T1 → P2 → .... → Pn → P.o


Type Preserving Compilation

• Transform types with program.
  – Optimize code based on types.
  – Verify that invariants still hold.
  – Emit types on object code.

P1:T1 → P2:T2 → .... → Pn:Tn → P.o : To


TILT

• Type preserving compiler
  – Standard ML.
  – Sparc, Alpha, (now) x86 backends
  – Perry Cheng, Chris Stone, Leaf Petersen, Dave Swasey, and others.
• Intermediate languages are typed
  – Type based optimizations.
  – Internal correctness checks.
• Generates typed x86 object code (this thesis).


Why TILT?

• Want to compile SML efficiently.
  – Separate compilation is a must.
• Traditional optimizations.
  – Loop optimizations, CSE, constant folding, and many more.
• New challenges for optimization.
  – Polymorphism, GC, 1st class functions, modules, etc.


Example: Unknown Types.

• Module interfaces (and polymorphism) introduce unknown types:
• Clients compiled against interface
  – Cannot know what t is (may be instantiated multiple times)
  – Cannot predict size of value (if sizes vary).
  – Cannot predict traceability of value (ptr/non-ptr).


Old Solutions

• C, C++, Java: No unknown types.
  – Objects: “partially known” types.
• Traditional ML/Lisp compilers: Uniform data representation.
  – All values are same size (e.g. 32 bits).
  – Large values (e.g. 64 bit floats) must be boxed.
  – Traceability dealt with via tagging (e.g. 31 bit ints).
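As a rough illustration of the tagging scheme mentioned above, the following Python sketch models a 32 bit word whose low bit separates 31 bit integers from pointers. The function names and the "low bit set means integer" convention are assumptions made for this example; actual compilers pick their own bit patterns.

```python
WORD_MASK = 0xFFFFFFFF  # model a 32 bit machine word


def tag_int(n: int) -> int:
    """Encode a 31 bit signed integer as a tagged word (low bit = 1)."""
    if not -(1 << 30) <= n < (1 << 30):
        raise OverflowError("value does not fit in 31 bits")
    return ((n << 1) | 1) & WORD_MASK


def is_int(word: int) -> bool:
    """A collector can recognize integers by inspecting the low bit."""
    return (word & 1) == 1


def is_pointer(word: int) -> bool:
    """Word-aligned addresses have a clear low bit, so they are traced."""
    return (word & 1) == 0


def untag_int(word: int) -> int:
    """Recover the signed integer stored in a tagged word."""
    if not is_int(word):
        raise ValueError("word is a pointer, not a tagged integer")
    payload = word >> 1            # 31 bit payload, read as unsigned
    if payload & (1 << 30):        # sign bit of the 31 bit value
        payload -= 1 << 31
    return payload
```

The price of this uniform representation is visible in the range check: one bit of every integer is spent on the tag, which is why the slide speaks of 31 bit ints.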


TILT Solution

• Types tell size and traceability of data.

• Unknown types are instantiated with known types at runtime.
  – Most compilers discard types before generating code.
• TILT: Keep types at runtime and use them to dynamically determine layout and traceability.


Type analysis

• type Optarray[t] =
    Typecase[t] of
      Boxed(Float) => Array64[Float]
    | _ => Array32[t]
• Note:
  – Optarray[Int] == Array32[Int]
  – Optarray[a] where a is unknown is dynamic
• Constructor for type Optarray?
  – optarray[t] : int x t -> Optarray[t]


Type analysis

• optarray[t](len : int, init : t) : Optarray[t] =
    typecase [t] of
      Boxed(Float) => new_array64[Float](len, unbox(init))
    | _ => new_array32[t](len, init)
• For statically known types, reduces at compile time
  – optarray[Int](10,0) = new_array32[Int](10,0)
• For unknown types, reduces at runtime
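The runtime case of optarray can be imitated in Python, where types are ordinary runtime values. This is only a sketch under assumptions: Python's float stands in for Boxed(Float), array("d") stands in for Array64, a plain list stands in for Array32, and the helper names are invented for the example.

```python
from array import array


def new_array64(length: int, init: float) -> array:
    """Stand-in for new_array64: a flat array of unboxed 64 bit floats."""
    return array("d", [init] * length)


def new_array32(length: int, init):
    """Stand-in for new_array32: one uniform word-sized slot per element."""
    return [init] * length


def optarray(t: type, length: int, init):
    """Dispatch on the type argument t at runtime, like typecase [t]."""
    if t is float:
        # Boxed(Float) case: store the raw float, not a boxed object.
        return new_array64(length, float(init))
    # Default case: every other type uses the uniform representation.
    return new_array32(length, init)
```

A caller that knows t statically could skip the test entirely, which corresponds to the compile-time reduction on the slide; only callers with an unknown t pay for the branch.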


Type-passing Optimizations

• Type analysis:
  – Enables global representation optimizations in the presence of unknown types.
• TILT uses types at runtime for:
  – Better data-layouts.
    • Unboxed arrays of 64 bit floats
    • 32 bit ints
    • Optimized sum representations
  – Flatten aggregate arguments into registers.
  – Mostly tag-free garbage collection.


There’s more

• Types can help with generating efficient code.

• But not the end of the story....


Mobile Code

• Code has become mobile.
  – May know very little about producer.
• Examples:
  – Web applets.
  – Grid computing.
  – Binary installations/upgrades.
  – Application downloads.
• High risk from malicious/wrong code.


The Certification Problem

• Source language safety is checkable.
  – Typechecker checks the programmer's facts.
• Raw object code is not checkable.
• Safety relies on trust in:
  – Safety of source language.
  – Correctness/identity of producer/compiler.
  – Integrity of the object code.


Java Approach

• Java bytecode
  – High-level language (almost Java)
  – Can be typechecked
• Interpreted
  – slow, somewhat complicated
• JIT compiled
  – somewhat faster, quite complicated
• Large trusted computing base


Certified Code

• Typed object code
  – Types certify safety
• Code consumer
  – Does no compiling
  – Checks that certificate applies (easy)
  – Small trusted computing base
• Several instances exist:
  – TAL: Typed Assembly Language
  – PCC: Proof Carrying Code
  – Many extensions and variations


Certifying Compilers

• Programs in safe languages
  – Types provide needed annotations
  – Compiler can emit code with certificate of type/memory safety
• Certifying compilers exist for:
  – Safe subsets of C (TAL & PCC)
  – Java (PCC)
• Now for Standard ML


Types in Compilation

• Types can be used to generate efficient code.

• Types can be used to generate certified code.

• Want to combine the two paradigms.


My Thesis

Certifying compilation of type analyzing code is feasible for a full modern language such as Standard ML.


Two compilers

• Theoretical compiler
  – Formal translation
  – Prove important properties
  – Guide the implementation
• Real compiler
  – Follows the structure of the theoretical compiler
  – Targets a real certified code system.


Theory


Theoretical compiler

• Three languages:
  – Singleton free MIL
  – LIL
  – Idealized TAL (ITAL)
• Formal translations:
  – MIL to LIL
  – Closure conversion of LIL code
  – LIL to ITAL


Languages

• Singleton free MIL
  – Lambda calculus
  – Syntactic restriction to named form
  – Type analysis through primitives
• LIL
  – Much more fine-grained than MIL
    • type and type analysis representation
    • closure representation
• ITAL
  – Machine language
  – Idealized TAL
  – Simplified TAL with LX primitives for type analysis
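To make the "named form" restriction concrete, here is a small Python sketch. It assumes that named form means every intermediate result is bound to a variable before it is used (similar to A-normal form); the tuple encoding of expressions and the temporary names t1, t2, ... are invented for the example and are not MIL syntax.

```python
import itertools


def to_named_form(expr):
    """Flatten a nested expression into a sequence of named bindings.

    expr is an int literal, a variable name (str), or a tuple
    (op, left, right). Returns (bindings, result), where each binding is
    (name, op, left_atom, right_atom) and every atom is a literal or a name.
    """
    counter = itertools.count(1)
    bindings = []

    def go(e):
        if not isinstance(e, tuple):
            return e                      # already atomic
        op, left, right = e
        left_atom = go(left)              # name the operands first
        right_atom = go(right)
        name = f"t{next(counter)}"        # fresh name for this result
        bindings.append((name, op, left_atom, right_atom))
        return name

    result = go(expr)
    return bindings, result


# (a * 2) + b becomes: let t1 = a * 2; let t2 = t1 + b; result t2
example = to_named_form(("+", ("*", "a", 2), "b"))
```

With every intermediate value named, a later translation can assign each name a register or stack slot without inspecting nested subexpressions.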


Translations

• MIL to LIL
  – Very different type structure
  – Moderately different term structure
  – See my dissertation.
• Closure conversion
  – Very standard
• LIL to ITAL
  – Type structure is almost identical
  – Term structure is very different
    • Explicit control flow
    • Binding replaced with state modification
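For readers unfamiliar with closure conversion, the idea can be sketched in Python: a function with free variables becomes a pair of closed code and an explicit environment. The Closure type and the adder example are illustrative assumptions, not the typed LIL translation described in the dissertation.

```python
from typing import Any, Callable, NamedTuple


class Closure(NamedTuple):
    code: Callable[..., Any]   # closed code: receives its environment explicitly
    env: tuple                 # values of the captured free variables


def apply_closure(clo: Closure, arg):
    """Calling a closure passes the environment as an extra argument."""
    return clo.code(clo.env, arg)


# Source program (ML-like): fun make_adder n = fn x => x + n
# After conversion the inner function no longer refers to n directly.
def adder_code(env, x):
    (n,) = env                 # fetch the free variable from the environment
    return x + n


def make_adder(n) -> Closure:
    """Allocate the closure: code pointer plus captured environment."""
    return Closure(adder_code, (n,))
```

After this step every piece of code is closed, which is what allows it to be placed at a fixed address in the object file.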


LIL typing

$\Psi; \Delta; \Gamma \vdash e : \tau$

• $\Psi$ – LIL heap context
• $\Delta$ – LIL type context
• $\Gamma$ – LIL term context
• $e$ – LIL expression (named form)
• $\tau$ – LIL type for $e$


ITAL typing

$\Psi; \Delta; M \vdash I$ ok

• $\Psi$ – ITAL heap context
• $\Delta$ – ITAL type context
• $M$ – ITAL register file type
• $I$ – ITAL instruction sequence




Register files

• A register file type $M$ maps registers to ITAL types
  – e.g. $M(r) = \tau$
  – Notation: $M\{r{:}\tau\}$ means $M$ with the type of $r$ set to $\tau$.
• Designated stack pointer register sp
  – $M(\mathrm{sp}) = \sigma$
  – $\sigma$ describes the stack slots


LIL to ITAL Translations

• $|\Psi|$ - heap context translation
• $|\Delta|$ - type context translation
• $|\tau|$ - type translation
• Exp $e$ maps to instruction seq $I$
• But what is the translation of a term context?


Register files

• LIL variables occupy ITAL registers (or stack slots)

• Hence, the translation of a LIL context is an ITAL register file.

• Problem: what register file?
• Variables are related to registers via register allocation.


Register allocation

• Previous work builds register allocation into the translation.
  – Complex and tedious
  – Unclear how to incorporate real RA (e.g. graph coloring)
  – Consequently, toy register allocators are used in formal presentations
• Better idea: translate with respect to abstract register allocator.


Allocator

Definition: An allocator $A$ is an object such that:

1. For every variable $x$:
  – $A(x) = r$ or $A(x) = \mathrm{sp}(i)$
2. frmsz($A$) is a natural number
3. For every LIL typing context $\Gamma$ and stack type $\sigma$, $|\Gamma|_A\,\sigma = M$ for some register file type $M$


Translation judgment

$\Psi; \Delta; \Gamma; A, \sigma \vdash e : \tau \leadsto I$

• $\Psi$ – LIL heap context
• $\Delta$ – LIL type context
• $\Gamma$ – LIL term context
• $A$ – Allocator
• $\sigma$ – describes stack below frame
• $I$ – ITAL instruction sequence
• For this talk, I’m ignoring exceptions, other stuff.


Translation judgment

$\Psi; \Delta; \Gamma; A[z \to r_1, x \to r_1, y \to r_2], \sigma \vdash z = x+y : \mathrm{int} \leadsto$ add r1,r2
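The example above can be replayed with a toy model in which the allocator is simply a mapping from variables to register names and the translation consults it without knowing how it was computed. This is a sketch under assumptions: stack slots, typing, and the other components of the judgment are omitted, and the function name and instruction strings are invented for illustration.

```python
def translate_add(alloc: dict, dst: str, lhs: str, rhs: str) -> list:
    """Translate `dst = lhs + rhs` to two-address code, given an allocator.

    alloc maps variable names to register names. The translation does not
    care whether alloc came from graph coloring or any other algorithm.
    """
    d, l, r = alloc[dst], alloc[lhs], alloc[rhs]
    if d == l:
        # Destination shares a register with the left operand: add in place.
        return [f"add {d},{r}"]
    if d == r:
        # Addition is commutative, so add the left operand in place instead.
        return [f"add {d},{l}"]
    # Otherwise copy the left operand first, then add the right one.
    return [f"mov {d},{l}", f"add {d},{r}"]


# The allocator from the example: z and x share r1, y lives in r2.
A = {"z": "r1", "x": "r1", "y": "r2"}
code = translate_add(A, "z", "x", "y")
```

Treating the allocator as a parameter keeps the translation short: swapping in a different assignment changes the emitted instructions but not the translation function.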


Question

$\Psi; \Delta; \Gamma; A, \sigma \vdash e : \tau \leadsto I$

• Why should $I$ be well-typed?
• Is the equational theory rich enough?
  – Easy to rely on equations that don’t hold
• Want to show soundness:
  – Each translation maps well-typed terms to well-typed terms.
• Doesn’t hold for all allocators: only the good ones.


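Good allocator for $\Gamma$

Definition: Let $M = |\Gamma|_A\,\sigma$. We say that $A$ is a good allocator for $\Gamma$ if:

1. $M(\mathrm{sp}) = \sigma_f \circ \sigma$ such that frmsz($A$) = $f$
2. $|\bullet|_A\,\sigma$ is the empty machine state.
3. If $\Gamma = \Gamma_1, x{:}\tau, \Gamma_2$ then
  a) $A$ is a good allocator for $\Gamma_1$ and $\Gamma_2$
  b) If $A(x) = r$ then $|\Gamma|_A\,\sigma = (|\Gamma_1, \Gamma_2|_A\,\sigma)\{r{:}|\tau|\}$
  c) If $A(x) = \mathrm{sp}(i)$ then something similar.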


Good allocator for e

Definition: An allocator A is a good allocator for an expression e if:

1. For all derivations of $\Psi; \Delta; \Gamma \vdash e : \tau$, $A$ is a good allocator for $\Gamma$.

2. A is a good allocator for all sub-expressions of e.


Soundness

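Theorem: If $A$ is a good allocator for $e$ and $\Psi; \Delta; \Gamma \vdash e : \tau$ and $\sigma$ is a well-formed stack type and $\Psi; \Delta; \Gamma; A, \sigma \vdash e : \tau \leadsto I$ then $|\Psi|; |\Delta|; M \vdash I$ ok where $M = |\Gamma|_A\,\sigma$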


Benefits of this approach

• Theory close to implementation
• Register allocation is a parameter
  – Separates out the mechanism
  – Concise specification of interface between code gen and RA
  – Translation isn’t bogged down with algorithmic details of RA


Downside: completeness

• Depends on register allocator
  – Full completeness doesn’t hold
  – Possible to show parametric completeness?
  – Not clear what this means
• Worthwhile tradeoff
  – Formal presentation very close to implementation
  – In practice:
    • Soundness is hard (implementation had bugs).
    • Completeness is just a matter of covering all cases.
• Likely that this can be solved (future work)


Summary (Theory)

• Formal translations:
  – MIL to LIL
  – Closure conversion of LIL code
  – LIL to ITAL
• Proof of soundness for each
• New approach to dealing with typed RA
• Provides a guide for......


Practice


Real Compiler

• Implemented a certifying back end for TILT.
  – Targets TAL for x86.
• Type representation and analysis made explicit
  – Not gc interface (yet).
• Data layout issues made explicit.
  – Boxing/unboxing.
  – Closure representations.
  – Heap data layout.


SML Source → Elaborate → HIL (Typed) → Phase split → MIL (Typed) → Optimize → MIL (Typed) → Code Gen → RTL (Untyped)

• Elaborate: Typecheck
• Phase split: Eliminate modules, some data rep
• Optimize: Shrinking inlining, speculative inlining, CSE/dead code elim, constant folding, uncurrying, monomorphization, flattening, eta reduction, closure conversion, hoisting, others
• Code Gen: Code generation, type representation. Untyped output! Subsequent compilation is mostly standard.



New TILT IL

• LIL: Low-level internal language
  – Based on LX (Crary & Weirich)
• Data representation explicit
• Still lambda calculus-ish
  – Call/return (not CPS)
• All heap allocation explicit
• Type analysis implemented at the term level
  – Neat Trick
  – See the dissertation


Front end → MIL (Typed) → Singleton elim → Type rep → LIL (Typed) → Optimize → LIL (Typed) → Closure Conv → LIL (Typed) → Code Gen → TAL (Typed)

• Type rep: Dynamic type reps, data rep structure, unified allocation
• Optimize: CSE/dead code elim, constant folding, eta reduction, switch reduction, others
• Closure Conv: Types and terms, recursive code, some opts
• Code Gen: Direct to TALx86, reg alloc/cogen, small peephole opts


Compilation

fib.sml → TILT → fib/asm.tal → TALx86 → fib/obj.to, fib/obj.o


Separate Compilation

fib.sml fib.int

Int.int TextIO.int

:>


TILT Compilation Model

fib.sml fib.int

Int.int TextIO.int

:>fib/asm.tal fib/asm_e.tali

TextIO/asm_e.taliInt/asm_e.tali

:>


Annotation size

• Annotation overhead for a file (e.g. fib)
  – size(fib/fib.to) + size(fib/fib_e.tali)
• Important question: how big?
  – Mobile code requires small annotation
  – Annotation size affects checking time
• Optimizing for size:
  – Not part of my thesis!!
• Important to measure
  – Want to understand the baseline
  – Actually pretty good!


Micro-Benchmarks

Tag     Description
takc    curried function calls
taku    uncurried function calls
Fib     fib, fact with default Int
Fib32   fib, fact with 32 bit ints
PI      approximation of pi (fp)


Selected Larger Benchmarks

Tag      Description                            LOC
msort    Merge sort (lists)                     48
life     Game of life (lists)                   205
pqueens  P queens problem (arrays)              292
frank    Small theorem prover                   473
leroy    Knuth bendix completion (exceptions)   537
simple   Spherical fluid dynamics               860
tyan     Grobner basis calculation              896
lexgen   Lexical-analyzer generator             1178
pia      Perspective inversion algorithm        2074


Benchmark sizes (abs)

[Figure: bar chart of absolute sizes per benchmark, y-axis 0.00 to 600.00. Series: obj.o, obj.to, asm_e.tali, asm_i.tali. Benchmarks: Takc, Taku, PI, fib, fib32, Isort, msort, Quicksort2, Quicksort, btimes, fft, TimeAndRun, PQueens, BarnesHut, life, frank, leroy, simple, tyan, pia, lexgen, boyer.]


Type overhead (relative)

[Figure: bar chart of relative type overhead per benchmark, y-axis 0% to 100%.]



Annotation size

• Average factor of 5 increase
• Factor of 2-3 for larger programs
• Many opportunities for improvement
• Separate compilation overhead
• Additional issues, discussion in my dissertation


Performance

• Really not part of my thesis!!
  – Not an optimizing back end
• Nonetheless, important points:
  – Valuable to measure and understand
  – Not a toy compiler


Comparing Apples to Fish

• MLton
  – Whole program compiler.
  – Very good code!
• SML/NJ
  – Incremental compiler
  – Widely used, supports interactive loop
• TILT
  – Separate compilation
  – Conservative (malloc based) GC
• TILT (Whole program)


Normalized runtimes

[Figure: bar chart of normalized runtimes, y-axis 0 to 4. Benchmarks: Taku, Takc, Fib, Fib32, PI. Series: TILT, TILT (Whole), SML/NJ, MLton.]


Normalized runtimes

[Figure: bar chart of normalized runtimes, y-axis 0 to 16. Benchmarks: Life, Leroy, Simple, Tyan, Msort, Pia, Lexgen, PQueens. Series: TILT, TILT (Whole), NJ110.42, MLton.]


Certifying TILT

• Type preserving, optimizing, certifying compiler for Standard ML.

• Interesting theoretical and practical challenges.
  – Data representation.
  – Type analysis representation.
  – Engineering challenges
  – Proof scalability challenges


My Thesis

Certifying compilation of type analyzing code is feasible for a full modern language such as Standard ML.


Related work

• Certified Code
  – PCC (Necula & Lee), Foundational PCC (Appel)
  – TAL (Morrisett, et al.), Foundational TAL (Crary)
• Typed compilation
  – TIL, TILT (CMU)
  – Popcorn (Cornell)
  – Special-J (Cedilla Systems)
  – Flint (Yale)


Where to now?

• Data representation.
  – Integrating GC semantics in IL
  – Typed models of GC
• TILT and typed compilation.
  – Incorporate memory allocation work.
  – Extend space of safety policies.
• Language design.
  – Combining LLL and HLL paradigms cleanly.


Implications of goodness

• If $\Gamma(x) = \tau$ and $A(x) = r$ then $(|\Gamma|_A\,\sigma)(r) = |\tau|$

• $|\Gamma, x{:}\tau|_A\,\sigma = (|\Gamma|_A\,\sigma)\{r{:}|\tau|\}$

• ||A[|c|/] = |[c/]|[|c|A


Effects of singleton elimination

[Figure: bar chart, y-axis 0 to 1.5, comparing SingElim and NoSingElim on Life, Leroy, Simple, Tyan, Msort, Pia, Lexgen, Fib, Fib32, PI, PQueens, Takc, Taku.]


Component sizes (rel)

[Figure: stacked bar chart of relative component sizes, y-axis 0% to 100%. Components: Basis, Lib, Arg, Link, Bench.]



Component sizes (abs)

[Figure: stacked bar chart of absolute component sizes, y-axis 0.00 to 18000.00, with the data table below.]

              Basis       Lib      Arg      Link     Bench
asm_i.tali    50.17      4.45     0.11      0.00      1.96
asm_e.tali  6917.00   3731.66    91.22      0.00   1718.69
obj.to      8953.31   5658.56   117.81   7264.31   2787.91
obj.o       1183.08   1005.05    11.09     46.05    988.47
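As a worked example of the annotation overhead measure defined on the "Annotation size" slide (size of the .to file plus size of the _e.tali file), the Python sketch below applies it to the last column of the figures above. Treating that column as the Bench component, and dividing by the obj.o size to obtain a ratio, are assumptions made for this illustration; the slides do not state the units.

```python
# Sizes copied from the last column of the data above (assumed: Bench).
OBJ_O = 988.47        # object code
OBJ_TO = 2787.91      # type annotations on the object code
ASM_E_TALI = 1718.69  # exported TAL interface


def annotation_overhead(obj_to: float, asm_e_tali: float) -> float:
    """Annotation overhead = size(.to) + size(_e.tali)."""
    return obj_to + asm_e_tali


def overhead_factor(obj_o: float, obj_to: float, asm_e_tali: float) -> float:
    """Annotation size expressed as a multiple of the object code size."""
    return annotation_overhead(obj_to, asm_e_tali) / obj_o


factor = overhead_factor(OBJ_O, OBJ_TO, ASM_E_TALI)
print(f"annotation/object ratio: {factor:.2f}")
```

By this measure the annotations for the benchmark code come to roughly 4.6 times the size of the object code.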



Type overhead (relative)

[Figure: chart of type overhead, y-axis 0.00 to 600.00, x-axis 1 to 134.]



Normalized runtimes

[Figure: bar chart of normalized runtimes, y-axis 0 to 16. Benchmarks: Life, Leroy, Simple, Tyan, Msort, Pia, Lexgen, Taku, Takc, PQueens, Fib, Fib32, PI. Series: TILT, TILT (Whole), NJ110.42, MLton.]





Assembly vs object sizes (abs)

[Figure: bar chart of assembly vs object sizes, y-axis 0.00 to 16000.00, with the data table below.]

                 Basis        Lib      Arg      Link     Bench
asm.tal       13740.51   12948.89   155.55    188.54   6963.54
obj.o+obj.to  10136.38    6663.61   128.90   7310.36   3776.37



Static reps

Kind of static representations of types:

Static type representations:


Interpretation function

Maps reps to the types being represented

Note:


Static reps

Define type constructor for optimized arrays

Uses case analysis instead of Typecase


Dynamic case

Define term constructor for optimized arrays


Type erasure

• Wait!! Still branching on types!
  – Wanted to get rid of “Typecase”, just replaced it with “case” on types!
• Clever trick:
  – Encode type sums as term sums
  – Use type refinement to reflect term info back into type level
• Replace “case” on types with “case” on terms.
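The shift from branching on types to branching on terms can be approximated in Python by passing an ordinary data value that represents the type. In this sketch the representation classes RInt and RFloat, the interp function, and optarray_rep are all hypothetical names; the type refinement that ties a representation to its type in LIL has no counterpart here and is only noted in comments.

```python
from array import array
from dataclasses import dataclass


@dataclass(frozen=True)
class RInt:
    """Term-level representation of the type Int."""


@dataclass(frozen=True)
class RFloat:
    """Term-level representation of the type Boxed(Float)."""


def interp(rep) -> type:
    """Interpretation function: maps a representation to the type it stands for."""
    if isinstance(rep, RFloat):
        return float
    if isinstance(rep, RInt):
        return int
    raise TypeError(f"unknown representation: {rep!r}")


def optarray_rep(rep, length: int, init):
    """Build an optimized array by case analysis on the representation term.

    No type is inspected at runtime; the branch is on ordinary data. In LIL
    the type system would additionally record that rep represents the type
    of init, which plain Python cannot express.
    """
    if isinstance(rep, RFloat):
        return array("d", [init] * length)   # unboxed 64 bit floats
    return [init] * length                   # uniform representation
```

Because only rep is examined, the types themselves could be erased before running, which is the point of the trick.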


MIL Types

• Type size controlled using definitions
  – Let a be c in E
  – Singleton kinds: a::S(c)
• Advantages
  – Optimizer improves sharing.
  – Sharing is intrinsic in the calculus
  – Needed for efficient compilation anyway
• Disadvantages
  – Equality is contextual, not syntactic
  – Massive duplication of mechanism


Complete Larger benchmarks

• Isort – insertion sort (lists)
• Msort – merge sort (lists)
• Quicksort – quick sort (arrays)
• fft – fast fourier transform
• pqueens – p queens problem (array intensive)
• barnes hut – n body simulation
• life – game of life
• frank – small theorem prover
• leroy – knuth bendix completion
• simple – spherical fluid dynamics
• tyan – grobner basis calculation
• Pia – perspective inversion algorithm (image processing)
• lexgen – lexical-analyzer generator
• boyer – theorem proving


Standard ML

• Type/memory safe.
• General purpose language.
  – Mutable data, standard libraries, GC.
• Advanced features.
  – Polymorphism (templates/generics).
  – First class functions (inner classes).
  – Modules and interfaces.
  – Type inference.
• Language features of tomorrow’s industrial languages.


LIL

• LIL: Low-level internal language
  – Based on LX (Crary & Weirich)
• Data representation explicit
• Still lambda calculus-ish
  – Call/return (not CPS)
• All heap allocation explicit


Engineering benefits

• Compiler can largely ignore types
  – No need to optimize
  – Equality is syntactic
• Type size controlled by separate mechanisms
  – Hash-consing
  – Higher-order abstract syntax
  – At TAL level, via ad-hoc definitions


New Challenges

• Controlling type size!
  – Types can be very large as trees.
  – Must maintain/traverse DAG structure.
• Must optimize types.
  – Types exist at runtime.
  – Inlining, CSE, etc must be done on types.

• Compiler must maintain well-typedness.
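One common way to keep types as a DAG rather than a tree is hash-consing, which is named among the size-control mechanisms in these slides. The Python sketch below shows the general technique only; the TyNode and HashCons classes are illustrative assumptions and do not describe TILT's actual data structures.

```python
class TyNode:
    """A type constructor applied to already-interned argument nodes."""
    __slots__ = ("con", "args")

    def __init__(self, con: str, args: tuple):
        self.con = con
        self.args = args


class HashCons:
    """Intern table: structurally equal types share one node."""

    def __init__(self):
        self._table = {}

    def mk(self, con: str, *args: TyNode) -> TyNode:
        # Children are already interned, so their identities serve as the key.
        # The table keeps every node alive, so ids are never reused.
        key = (con, tuple(id(a) for a in args))
        node = self._table.get(key)
        if node is None:
            node = TyNode(con, args)
            self._table[key] = node
        return node

    def __len__(self) -> int:
        return len(self._table)


hc = HashCons()
int_ty = hc.mk("int")
pair1 = hc.mk("pair", int_ty, int_ty)
pair2 = hc.mk("pair", hc.mk("int"), hc.mk("int"))  # same node as pair1
```

With interning, syntactic equality of types becomes a pointer comparison, and a type whose tree form doubles at every level occupies only one node per level.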


Libraries and linking

• Basis: Standard ML Basis library
  – provides basic language and OS functionality
• SML/NJ Lib
  – provides extended data structures
• Arg
  – command line processing
• Link
  – Compiler generated link unit


Compilation (over-simplified)

fib.sml → TILT → fib/asm.tal → TALx86 → fib/obj.to, fib/obj.o


Example: Unknown Types.Implements


Example: Uniform Data Rep.

0.0x

0x =


LIL Type Analysis

• Neat trick: typecase implemented at the term level!
  – All runtime data exists as ordinary terms.
  – All control flow branches are on terms.
  – Types can be erased before running.
• Accomplished via type refinement
  – Kind structure tracks type/rep connection.
• Too involved for this talk
  – Detailed development in my dissertation
