62
YETI: Gradually Extensible T race Interpreter Mathew Zaleski, University of Toronto [email protected] (thesis proposal)

YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

  • Upload
    others

  • View
    14

  • Download
    0

Embed Size (px)

Citation preview

Page 1: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

YETI: Gradually Extensible Trace

InterpreterMathew Zaleski,

University of Toronto

[email protected]

(thesis proposal)

Page 2: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006 2

Overview

‣Introduction• Background• Efficient Interpretation• Our Approach to Mixed-Mode Execution• Results and Discussion

Page 3: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Why so few JIT compilers?

• Complex JIT infrastructure built in “big bang”, before any generated code can run.

• Rather than incrementally extend the interpreter, typical JITs is built alongside.• The code generator of current JIT compilers

makes little provision to reuse the interpreter.• The method-orientation of most JITs means

that cold code is compiled with hot.‣ Interpreters should be more gradually extensible

to become dynamic compilers.

3

Page 4: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Problems with current practice

• Packaging of virtual instruction bodies is:• Inefficient: Interpreters slowed by branch

misprediction• Non-reusable: JIT compilers must implement

all virtual instructions from scratch• Method orientation of a JIT compiler forces it to

compile cold code along with hot.• Code compiled cold requires complex runtime

to perform late binding if it runs.• Recompiling cold code that becomes hot

requires complex recompilation infrastructure.

4

Page 5: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Our Approach

• Branch prediction problems of interpretation can be addressed by calling the virtual bodies.• Can speed up interpretation significantly.• Enables generated code to call the bodies.• JIT need not support all virtual instructions.

• Complexity of compiling cold code can be side stepped by compiling dynamically selected regions that contain only hot code.• We describe how compiling traces allows us to

compile only hot code and link on newly hot regions as they emerge.‣ Enables gradual enhancement of interpreter

5

Page 6: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Overview of Contribution

• Callable bodies make for efficient interpretation.

• Reuse of callable bodies from generated code smooths “big bang”.

• A trace oriented JIT compiler is a simple and promising architecture.

6

interpretation

efficiency challenges

callable bodies

method based JIT

big bang development

reuse callable bodies/traces

YETI

Page 7: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006 7

Gradual extension of VM

Larger regions. More instructions compiled.Size and Complexity of Compiled Code Regions

Traces - sub dispatchx

Basic Blocks

x

SUBinterpreterx

Traces - compile all virtual instructions x

Virtu

al in

stru

ctio

ns

com

pile

d

Traces- just integer instructions x✔

Page 8: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Result preview - Efficient Interpretation

• Branch misprediction dealt with by calling the bodies from region of generated code.

• Relative to Direct Threaded VM

• Geo mean• Java SpecJVM98

benchmarks• Ocaml benchmarks

8

0

0.2

0.4

0.6

0.8

1.0

java/P

4

java/P

PC

ocam

l/P4

ocam

l/PPC

0.63

0.900.820.81

Subroutine Threading

Rela

tive

to D

irect

Thr

eadi

ng

Page 9: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Result preview - Trace based JIT

• Geom mean SPECJVM98 relative to Sun Hotspot JIT

• SABVM• Selective inlining

• Modified JamVM.• TR-LINK = traces,no JIT• JIT = trace, JIT• Only 50 integer

bytecodes• Promising start

9

0

1

2

3

4

5

6

SABV

M

TR-LIN

K JIT

4.3

5.75.9

Java/PPC970

Rela

tive

to S

un H

otsp

ot

Page 10: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Background & Related Work

10

Ertl & Gregg Branch misprediction

Piumarta & Riccardi Selective inlining

Parrot (perl6) Callable bodies

Vitale, Abdelrahman Catenation, Tcl

Bala, Duesterwald, Banerjia Dynamo

Bruening, Garnett ,Amarasinghe Dynamo Rio

Whaley Partial methods

Gal, Probst, Franz Hotpath, Trace-based JIT

Suganuma,Yasue,Nkatani Region based compilation

Hozle, Chambers, Ungar Self

Many Java, JVM and JIT authors Java

Page 11: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006 11

Overview

•Introduction‣ Background:‣ Dynamo & Traces‣ Interpretation

• Our Approach to Mixed-Mode Execution• Results and Discussion

Page 12: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006 12

• Trace-oriented dynamic optimization system.• HP PA-8000 computers.

• Counter-Intuitive approach:• Don’t execute optimized binary interpret it.• Count transits of reverse branches.• Trace-generate (next slide).• Dispatch traces when encountered.

• Soon, most execution from trace cache.• faster than binary on hardware of the day!

HP Dynamo

Page 13: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

• Trace is path followed by program

• Conditional branches become trace exits.

• Do not expect trace exits to be taken.

Trace with if-then-else//c => b2if (c) b1;else b2;b3;

c

b1 b2

b3

c

b2b3

texit b1

13

Page 14: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006 14

Overview

•Introduction‣ Background:

• Dynamo & Traces‣ Interpretation

• Efficient Interpretation • Our Approach• Selecting Regions• Results and Discussion

Page 15: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

int f(boolean); Code: 0: iload_1 1: ifeq 7 4: bipush 42 6: ireturn 7: iconst_0 8: ireturn

15

int f(boolean parm){ if (parm){ return 42; }else{ return 0; } }

Java Source

Java Bytecode

Javac compiler

Virtual Program

Page 16: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006 16

Interpreter

LoadedProgram

Bytecodebodies

Internal Representation

fetch

dispatch LoadParms

execute

Execution Cycle

Page 17: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

vPC = internalRep;

while(1){ switch(*vPC++){

//and many more..

}};

17

Switched Interpreter

case iload_1: ..

break;

case ifeq: ..

break;

slow. Burdened by switch and loop overhead.

Page 18: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

static int *vPC = internalRep;

interp(){ while(1){ (*vPC)(); } };

18

Call Threaded Interpreter

void iload_1(){ //push load 1 vPC++; }

slow. burdened by function pointer call

void ifeq(){ //change vPC vPC++; }

Page 19: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Direct Threaded Interpreter

‣ Good: one dispatch branch taken per body19

ifeq: if () vPC= goto *vPC++;

bipush: .. goto *vPC++;

iload_1: .. goto *vPC++;

iconst_0: .. goto *vPC++;

ireturn: .. goto *vPC++;

ireturn: .. goto *vPC++;

-Execution of virtual program “threads” through bodies

int f(boolean); Code: 0: iload_1 1: ifeq 7 4: bipush 42 6: ireturn 7: iconst_0 8: ireturn

Page 20: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Context Problem

‣ Bad: hardware has no context to predict dispatch20

ifeq: if () vPC= goto *vPC++;

bipush: .. goto *vPC++;

iload_1: .. goto *vPC++;

iconst_0: .. goto *vPC++;

ireturn: .. goto *vPC++;

ireturn: .. goto *vPC++;

int f(boolean); Code: 0: iload_1 1: ifeq 7 4: bipush 42 6: ireturn 7: iload_1 8: ireturn

Virtual PC predicts destination.

Hardware PC insufficient context

Page 21: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006 21

Overview

✓ Introduction✓ Background‣ Efficient Interpretation• Our Approach to Mixed-Mode Execution• Results and Discussion

Page 22: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

int f(boolean); Code: 0: iload_1 1: ifeq 7 4: bipush 42 6: ireturn 7: iconst_0 8: ireturn

22

int f(boolean parm){ if (parm){ return 42; }else{ return 0; } }

Java Source

Java Bytecode

Javac compiler

Virtual Program

Page 23: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

ifeq: if () vPC= goto *vPC++;

bipush: .. goto *vPC++;

23

Direct Threaded Interpreter

…iload_1ifeq 7bipush 42ireturniconst_0ireturn…

DTT - DirectThreading Table

VirtualProgram

vPC iload_1: .. goto *vPC++;

DTT maps vPC to implementation

C implementationof each body

DTT&&iload_1&&ifeq

+4&&bipush

42&&ireturn&&iconst_0&&ireturn

Page 24: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Context Problem

‣ Bad: hardware has no context to predict dispatch24

ifeq: if () vPC= goto *vPC++;

bipush: .. goto *vPC++;

iload_1: .. goto *vPC++;

iconst_0: .. goto *vPC++;

ireturn: .. goto *vPC++;

ireturn: .. goto *vPC++;

int f(boolean); Code: 0: iload_1 1: ifeq 7 4: bipush 42 6: ireturn 7: iload_1 8: ireturn

Virtual PC predicts destination.

Hardware PC insufficient context

Page 25: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006 25

Essence of Subroutine Threading

iload_1: .. asm(ret);

ifeq: vPC=.. goto *vPC;

Context Threading Table

Package bodies as subroutines and call them

CTT

call iload_1

call ifeq

call bipush

call ireturn

call iconst_0

call ireturn

DTT

4

42

ret terminated

virtual branches as directpoints to generated

code

Page 26: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006 26

Context Threading (CT) -- Generating specialized code in CTT

4

DTT

Specialized bodies can also be generated in CTT!

vPCcall iload_1

subl

movl ;pop stack

cmpl ;compare 0

jne

movl ; vPC =

jmp ; else

addl ; vPC +=

call bipush

call ireturn

call iconst_0

call ireturn

inlined code for if_eq

context for conditional branches

Page 27: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

CT Performance

‣ CT is an efficient interpretation technique27Context Threading 24

!"#$%&'()*+',#*-.#(,#/)0

0.00

0.25

0.50

0.75

1.00

java

/p4

java

/ppc

ocam

l/p4

ocam

l/ppc

Subroutine Branch Inlining Tiny Inlining

!"#$%&'()*+,"+

-'#).,+/0#)%*'12

! 1%2*&#$3)'4%#*'5*#66#$&'7#*/)8*.#)#2/9*

Page 28: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006 28

Overview

✓ Introduction✓ Background: Interpretation & traces✓ Efficient Interpretation‣ Our Approach to Mixed-Mode Execution

• Selecting Regions• Results and Discussion

Page 29: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Gradually Extensible Trace Interpreter

Three main enablers:1. Bodies organized as callable routines so

executable regions can efficiently mix compiled code and dispatched bodies.

2. The DTT can point to variously shaped execution units.

3. Efficient, flexible instrumentation.

29

Page 30: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

1. Bodies are callable

‣ Needn’t build all virtual instructions all in one shot.

30

iload_1: ..ret;

istore: ..ret;

call iload_1call iload_1specialized

code

for iadd

call istore_1

Packaging bytecode bodies as lightweight subroutines means that they can be efficiently called from generated code

Page 31: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

2. DTT always points to implementation

‣ DTT indirection enables any shape of execution unit to be dispatched.

31

iload_1: ..ret;

istore: ..ret;

call iload_1call iload_1JIT compiled

codefor iadd

call istore_1

DTT

..of corresponding execution unitvPC

Page 32: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

3. Flexible, Efficient Instrumentation

‣ Our runtime active before and after each dispatch

32

A dispatcher describes an execution unit

DTT

postworker(){}

preworker(){}

dispatcherbodypayloadpre post

while(1){ //dispatch loop d = vPC->dipatcher; pay = d->payload; (*d->pre)(vPC,pay,&tcs); (*d->body)(); (*d->post)(vPC,pay,&tcs); }

body or generated code for region

payload data specific to execution unit

Page 33: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006 33

Overview

✓ Introduction✓ Background: Interpretation & traces• Our Approach to Mixed-Mode Execution

• Selecting Regions‣ Basic Blocks• Traces

• Results and Discussion

Page 34: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

int f(boolean); Code: 0: iload_1 1: ifeq 7 4: bipush 42 6: ireturn 7: iconst_0 8: ireturn

34

int f(boolean parm){ if (parm){ return 42; }else{ return 0; } }

Java Source

Java Bytecode

Basic Block Detection

Page 35: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

t_dispatcherbody=&&iload_1payloadpre post

t_dispatcherbody=BB0 payload=bb0pre post

35

Basic Block Detection

0: iload_1 1: ifeq 7 4: bipush 42 6: ireturn 7: iconst_0 8: ireturn

t_thread_context historyList recordMode=0

iload_1 ifeq

DTT

t_dispatcherbody=&&ifeqpayloadpre post

1

while(1){ //dispatch loop d = vPC->dipatcher; pay = d->payload; (*d->pre)(vPC,pay,&tcs); (*d->body)(); (*d->post)(vPC,pay,&tcs); }

Page 36: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Generated code for a basic block

• Could JIT the bb, currently we generate “subroutine threading style” code for it.

36

DTTiload_1ifeq4

iconst_0ireturn

bb1call iconst_0

call ireturn

ret

bb0call iload_1

call if_eq

ret

Basic block is a run-time superinstruction

bb0

bb1

t_dispatcherbodypayloadprepost

t_dispatcherbodypayloadprepost

Page 37: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

• all in one C function• thread private are C

local variables• loader initializes

DTT to static dispatchers

• dispatch loop calls instrumentation specific to dispatcher

C Code for interp function

37

static t_dispatcher init[256]={};interp(t_dispatcher *dtt){ t_dispatcher *vPC = dtt; t_thread_context tcs; iload: //real work of iload here.. vPC++; //to next instruction asm volatile(“ret”); iadd: //and many more bodies //dispatch loop while(1){ d = vPC->dispatcher; pay = d->payload; (*d->pre)(vPC,pay,&tcs); (*d->body)(); (*d->post)(vPC,pay,&tcs); }

Page 38: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006 38

Overview

✓ Introduction✓ Background: Interpretation & traces✓ Our Approach to Mixed-Mode Execution

• Selecting Regions✓ Basic Blocks‣ Traces

• Results and Discussion

Page 39: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Detecting Traces

• Use Dynamo’s trace detection heuristic.

• Instrument reverse branches until they are hot.• Postworker of basic block dispatcher

• Trace generate starting from hot reverse branch:• Much like bb’s were recorded• Postworker of each basic block region adds

each bb to thread private history list.• Eventually creates new trace dispatcher• Hold off generating code until after trace has

“trained” a few times..39

Page 40: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

• Trace is path followed by program

• Conditional branches become trace exits.

• Do not expect trace exits to be taken.

Trace with if-then-else//c => b2if (c) b1;else b2;b3;

c

b1 b2

b3

c

b2b3

texit b1

40

Page 41: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Subroutine threading style code for a Trace

• Dispatch virtual instructions for trace

41

DTTiload_1ifeq4

iconst_0ireturn

trace-b0-b1

call iload_1

trace_exit_eq

call iconst_0

trace_exit_iret

..caller code..

...

Trace is super-super instruction

bb0

bb1

trace exit handlers

TEH..

ret

TEH..

ret

Page 42: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Trace Exits

(a) Two-way: from conditional branches

• one leg on trace• other leg off trace

(b) Multiway: from invokes and returns

• one leg on trace• potentially many legs off trace

42

trace-b0-b1

call iload_1

trace_exit_eq

call iconst_0

trace_exit_iret

...

(a)

(b)

Page 43: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Trace Exit Handlers

• Code runs when trace exit is taken before return to interpreter

• Record which trace exit has occurred in thread context

• Return to dispatch loop• Housekeeping roles:• flush state of JIT code• Trace linking

43

trace-b0-b1

call iload_1

trace_exit_eq

call iconst_0

trace_exit_iret

ret

TEHtcs=..

ret

TEHtcs=..

ret

Page 44: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Trace linking

• When trace exit is hot and destination is a trace

• Rewrite ret at end of trace exit handler as jmp to destination trace• Only use of code

rewriting in system

44

trace-1

TEH

trace-2

retjmp

Hot trace exit

Destination Trace

Page 45: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Trace JIT

• Generate native code for trace exits• A lot like branch inlining from CT system.

• Optimize invokevirtual when call and return occur in same trace.

• Naive register allocation scheme• Only handle 50 integer/object virtual instructions• Do virtual instructions one-by-one• Relatively easy debugging

• Floating point should be easy.

45

Page 46: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006 46

Overview

✓ Introduction✓ Background: Interpretation & traces✓ Our Approach to Mixed-Mode Execution‣ Results and Discussion

• Data• Discussion• Remaining Work in this dissertation

Page 47: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Implementation

• Modify JamVM 1.1.3 to be SUB threaded• Gradually extend it to:• Detect, execute subroutine style basic blocks• Detect, execute subroutine style traces• Link traces• Compile traces.

47

Page 48: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Region Shape

• As execution units become larger• Trips around dispatch loop become less frequent• Next show data to justify “step back” approach.• Very simple experiment:• Modify dispatch loops to count iterations.

48

Condition DescriptionDCT Direct Call Threading

BB CT-style Basic BlocksTR Traces (no link, no JIT)

TR-LINK Linked Traces (no JIT)

Page 49: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Region Shape Effect on Dispatch Count

49

compress

db jack

javac

jess

mpeg

mtrt ray

scitest

geomean

1e0

1e2

1e4

1e6

1e8

1e10

log

disp

atch

cou

nt

Spec JVM98 Benchmarks

LegendDCTBBTRTR-LINK

Page 50: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

How efficient is profiling system?

• Run instrumentation without the JIT.• Are the intermediate versions of Java viable?• Include SUB threading in comparison:• Since it is an efficient dispatch technique.

• Report elapsed time relative to distro of JamVM

50

Page 51: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Profiling/Instrumentation Overhead

51

0.73

1.87

0.95

0.78

0.66

compress

0.87

1.45

1.05

0.90

0.80

db0.88

1.41

1.22

0.88

0.79

jack

0.88

1.39

1.18

0.96

0.85

javac

0.87

1.38

1.15

0.87

0.77

jess

0.72

2.00

0.99

0.78

0.72

mpeg

0.84

1.11

1.42

0.82

0.81

mtrt

0.88

1.49

1.33

0.84

0.77

ray

0.59

1.69

1.18

0.68

0.60

scitest

0.80

1.51

1.15

0.83

0.75

geomean

0.0

0.5

1.0

1.5

2.0

Elap

sed

time

rela

tive

to ja

m-d

istro

Spec JVM98 Benchmarks

LegendSUBDCTBBTRTR-LINK

Page 52: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Performance of simple JIT

• Compare YETI performance WITHOUT JIT to selective inlining SableV

• Compare YETI with preliminary trace based JIT to Sun’s Hotspot optimizing compiler

• Not much basis for comparison to Hotpath

52

Condition DescriptionTR-LINK Linked TracesSABVM SableVM selective inlining

JIT Traces (JIT and Link)

Page 53: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

JIT Performance relative to Sun Hotspot

53

8.78

8.14

5.49

compress

2.31

1.93

1.50

db

3.06 3.23

2.59

jack

3.92

2.86

2.43

javac

5.73

5.17

4.25

jess

12.26 13.73

7.95

mpeg

11.65

11.96

11.87

mtrt

11.46

9.71

7.20

ray

4.075.21

3.45

scitest

5.94

5.69

4.31

geomean

0

2

4

6

8

10

12

14

Elap

sed

time

rela

tive

to h

otsp

ot

Spec JVM98 Benchmarks

LegendSABVMTR-LINKJIT

Page 54: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006 54

Overview

✓ Introduction✓ Background: Interpretation & traces✓ Our Approach✓ Selecting Regions• Results and Discussion✓ Data‣ Discussion (and future work)• Remaining Work in this dissertation

Page 55: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Gradual performance improvement

55

0

1

2

SABV

MSU

BDCT BB TR

TR-LIN

K JIT

0.57

0.750.83

1.15

1.51

0.800.78

Rela

tive

to J

amVM

1.3

.3 D

irect

Thr

eadi

ng

Geometric Mean SpecJVM98Elapsed Time PPC970

• Performance improves as effort invested.

• SUB very effective for lightweight bodies

• BB not viable by itself• TR-LINK about same

as CT/branch inlining.• JIT preliminary.

Page 56: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Discussion

• We have demonstrated how to build an interpreter that is simple and yet as efficient as SableVM and JamVM.

• We have shown that our interpreter can be gradually extended to identify, link and compile traces.

• We have shown how generated code can reuse callable virtual instruction bodies.

• Our JIT, although it has no optimizer, only supports 50 Java virtual instructions, improves performance by 24%.• More instructions, better performance?

56

Page 57: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006 57

2D vision of Incremental VM lifecycle..

SUBinterpreter

Basic Blocks Traces - sub dispatch

Have explored this spaceComplexity of Compiled Code Regions

Traces- just integer instructions

Traces - compile all virtual instructions x

x

Basic blocks - just integer virtual instructions

Num

ber o

f virt

ual in

stru

ctio

ns c

ompi

led

✔ ✔ ✔

Page 58: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Application

• If I had to build a new interpreter.• for “lightweight” bodies, so dispatch matters.

• Start with bodies that can be conditionally compiled to be either direct threaded or callable.

• Bring up the system using DCT because the dispatch loop makes it easier to debug.• e.g. Logging from dispatch loop is very helpful.

• Primary platforms would run SUB and secondary platforms would run direct threading.

• Gradually extend as described.

58

Page 59: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Future Work

• Work could go in many different directions.• Apply to another language system• JavaScript? Python? Fortress?• Deal with polymorphic bytecodes

• Extend JIT/Optimizer • Explore performance potential• Need a lot more infrastructure (e.g. IR)

• Package infrastructure for others to apply.

59

Page 60: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006 60

Overview

✓ Introduction✓ Background: Interpretation & traces✓ Our Approach✓ Selecting Regions• Results and Discussion✓ Data✓ Discussion‣ Remaining work in this dissertation

Page 61: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

Proposed Work for winter 2007.

• Infrastructure to measure:• Compile time;• Proportion of virtual instructions executed from

compiled code.• Add float register class.• scimark, ray, Linpack would likely benefit.

• Compile Basic Blocks• Long bb benchmarks will benefit.

• Write, write, write.

61

Page 62: YETI: Gradually Extensible Trace Interpretermatz/dissertation/tp.pdf · • Non-reusable: JIT compilers must implement all virtual instructions from scratch • Method orientation

Thesis Proposal Jan 2006

• Back 12• Efficient 23• Our 28• BB 33• TR 38• RES 46• CONC 54

62