
Javascript Optimization and Just in Time Compilation


Page 1: Javascript Optimization and Just in Time Compilation

1

Just in time compilation (JIT)

Virgil Palanciuc

Page 2: Javascript Optimization and Just in Time Compilation

2

Just-in-Time Compilation

Compiler runs at program execution time; popularized by Java virtual machine implementations

Preserves the interpretive character – portability

Challenge: minimize the sum of the program’s compile time and execution time

Continuous Compilation (idea: mix interpretation and compilation)
Instead of pausing to compile, simply fire up the interpreter
In the background, start compiling code, with emphasis on compilation units that are interpreted often
When code is compiled, jump to the native code instead

Smart Just-in-Time

Estimate whether compilation is actually worthwhile:
Estimate compile time as a function of code size
Observe the time spent interpreting the code
If compilation is “worthwhile” (a heuristic), stop interpreting and compile instead

No second processor needed (unlike continuous compilation) – see the sketch below
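
A minimal sketch of such a “worth compiling?” heuristic in JavaScript (the function and field names and the tuning constants are illustrative – they do not come from any particular engine):

// Decide whether a code unit is worth compiling instead of interpreting further.
var COMPILE_COST_PER_BYTE = 0.01;  // assumed: estimated compile time (ms) per byte of code
var HOTNESS_FACTOR = 2.0;          // assumed: compile once interpretation cost clearly dominates

function shouldCompile(unit) {
  var estimatedCompileTime = unit.codeSize * COMPILE_COST_PER_BYTE;
  return unit.timeSpentInterpreting > HOTNESS_FACTOR * estimatedCompileTime;
}

// In the interpreter loop (sketch):
// if (shouldCompile(unit)) { unit.native = compile(unit); jumpTo(unit.native); }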

Page 3: Javascript Optimization and Just in Time Compilation

3

Java’s Programming Environment

Java source code compiles to Java bytecode

Bytecode is verifiably secure, compact, and platform independent

Virtual machine for the target platform executes the program

(Diagram) Java source code → [compiler] → Java bytecode → network (?) → [JIT compiler] → native code → native execution → output

Page 4: Javascript Optimization and Just in Time Compilation

4

Java Summary and conclusions

JIT compilation is effective: it reduces overall running time compared to an interpreter
Handles dynamic class loading

Register allocation is extremely important
“Pretty good” is usually good enough – simple register allocation schemes are just as good as complex ones

Just-in-time compilation is usually better than interpretation
Simple versions of standard algorithms provide most of the benefit – especially register allocation!

Page 5: Javascript Optimization and Just in Time Compilation

5

Introducing: dynamic languages

Have been around for quite a while

Perl, Javascript, Python, Ruby, Tcl

Popular opinions:
Unfixably slow

Not possible to create good IDE tools

Maintenance traps as codebase grows larger

Observation: techniques for creating tools for dynamic languages are similar to those for improving performance

Page 6: Javascript Optimization and Just in Time Compilation

6

Why are dynamic languages slow?

Lack of effort – used for “scripting”, i.e. I/O-bound work

Hard to compile with traditional techniques:
Object and variable types can change

Methods can be added/removed

Target machine feature mismatches

Example:
C – method inlining is straightforward (call targets are known statically)
C++ – dynamic method dispatch makes it more difficult
Javascript – even more difficult (method lookup in a dictionary); see the example below
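
A short JavaScript illustration of why the call target cannot be bound statically (the object and method names are made up):

// The set of methods on an object can change at any time:
var obj = { width: 3 };
obj.area = function () { return this.width * this.width; };  // method added at runtime
console.log(obj.area());   // 9 – "area" is found by a dictionary-style lookup on obj
delete obj.area;           // ...and the method can disappear again just as easily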

Page 7: Javascript Optimization and Just in Time Compilation

7

Dynamic = lack of performance?

Warning: Controversial slide

Technology has evolved
Widespread belief that Java is just as fast as C++

Javascript – Google V8, SpiderMonkey/TraceMonkey

Cultural problem – programmers often prefer to micro-optimize
An actual design and system-level perspective requires more thought; micro-optimization is “accessible”

Global optimizations always trump benchmarks!
Java: slower than C++ in benchmarks, faster overall, especially on multicore/SMT
Ruby on Rails – 20% faster than Struts, although Ruby itself is much slower than Java

Page 8: Javascript Optimization and Just in Time Compilation

8

Case study: Javascript

At a glance: Java-like syntax, prototype-based OOP

Not class-based! OOP dispatch styles: prototype, class/single dispatch, class/multiple dispatch
Prototype: Javascript – no “class”; each object is effectively its own “class”
Single dispatch: C++, Java (virtual methods; dispatch based on ‘this’)
Multiple dispatch: dispatch based on all arguments

Lexical scoping, 1st class functions, closures

ECMAScript edition 4: optional types

Ajax caused a popularity surge
Sudden focus on improving performance

“Browser war” – javascript interpreters get faster and faster
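
A small JavaScript example of the language features listed above (prototype-based objects, first-class functions, closures); the names are purely illustrative:

// Prototype-based OOP: methods live on a shared prototype object, not on a class.
function Counter(start) { this.value = start; }
Counter.prototype.increment = function () { this.value++; };

// First-class functions and closures: makeAdder returns a function that
// captures 'n' from its enclosing lexical scope.
function makeAdder(n) { return function (x) { return x + n; }; }

var c = new Counter(10);
c.increment();               // c.value === 11
var add5 = makeAdder(5);
console.log(add5(c.value));  // 16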

Page 9: Javascript Optimization and Just in Time Compilation

9

Efficient JIT compilation

Must think differently

The trick is to use heuristics
Profile to get “probabilistic information”

Speculate on “what happens frequently”

Apply simple/fast optimization on the frequent dynamic types/values

Inlining -> “Polymorphic inline cache”

Heuristics can apply to the internal representation
TraceMonkey, Firefox’s response to Chrome’s V8

Page 10: Javascript Optimization and Just in Time Compilation

10

1st step – bytecode execution

Traditional interpreters – abstract syntax tree walkers:
parse the program into a tree of statements and expressions
visit the nodes in the tree, performing their operations and propagating execution state

Bytecode interpreters:
Eliminate nodes that represent just “syntax structure”

Bring the representation closer to the machine

Enable additional optimizations on bytecode (as done on the JVM)
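
A minimal sketch, in JavaScript, contrasting the two interpreter styles for the expression c := 2 * (a + b) (node shapes and opcode names are invented for illustration):

// AST walker: recurse over node objects, carrying results back up.
function evalNode(node, env) {
  switch (node.type) {
    case "num": return node.value;
    case "var": return env[node.name];
    case "add": return evalNode(node.left, env) + evalNode(node.right, env);
    case "mul": return evalNode(node.left, env) * evalNode(node.right, env);
  }
}

// Bytecode interpreter: a flat instruction array and an explicit operand stack.
function run(code, env) {
  var stack = [];
  for (var pc = 0; pc < code.length; pc++) {
    var ins = code[pc];
    if (ins.op === "push") stack.push(ins.value);
    else if (ins.op === "load") stack.push(env[ins.name]);
    else if (ins.op === "add") stack.push(stack.pop() + stack.pop());
    else if (ins.op === "mul") stack.push(stack.pop() * stack.pop());
  }
  return stack.pop();
}

// run([{op:"push", value:2}, {op:"load", name:"a"}, {op:"load", name:"b"},
//      {op:"add"}, {op:"mul"}], {a: 42, b: 7});   // 98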

Page 11: Javascript Optimization and Just in Time Compilation

11

Google V8

Google V8 is the JS engine from Google Chrome

3 key areas to V8’s performance:

Efficient garbage collection

“stop the world” GC – eliminates synchronization needs, reduces complexity

“generational” GC, 2 generations – rapidly reclaim short-lived objects

Fast property access

Computes “hidden classes” – more on next slides

Dynamic Machine Code Generation

V8 skips bytecode, generates machine code

On initial execution, determine hidden class

Patch inline cache code to use it

Example – inline-cached code generated for the property access point.x:

# ebx = the point object
cmp [ebx, <hidden class offset>], <cached hidden class>
jne <inline cache miss>
mov eax, [ebx, <cached x offset>]

Page 12: Javascript Optimization and Just in Time Compilation

12

V8 – fast property access

C++, Java – faster because an object’s class/memory layout is known: a fixed offset is used to access a property – typically a single memory load to read/write it

JavaScript – properties can be added to/deleted from objects on the fly
Layout is unknown a priori – typically a “hash lookup” is needed to find a property’s memory location

Idea: objects don’t really change that much, use “hidden classes” to ‘cache’ the memory layout

Not a new idea – first used in Self (at Sun; 1989!)

Example:

function Point(x, y) {
  this.x = x;
  this.y = y;
}

Page 13: Javascript Optimization and Just in Time Compilation

13

Hidden class creation

A new ‘Point’ is created:

this.x = x;

Page 14: Javascript Optimization and Just in Time Compilation

14

Hidden class creation (cont’d)

this.y = y;

Page 15: Javascript Optimization and Just in Time Compilation

15

Hidden class reuse

If another Point object is created:
Initially the new Point object has no properties, so it refers to the initial hidden class C0.

when property x is added, V8 follows the hidden class transition from C0 to C1 and writes the value of x at the offset specified by C1.

when property y is added, V8 follows the hidden class transition from C1 to C2 and writes the value of y at the offset specified by C2.

The runtime behavior of most JavaScript programs results in a high degree of structure-sharing using the above approach.

Two advantages of using hidden classes:
Property access does not require a dictionary lookup

Enables inline caching.
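
A hypothetical illustration of the transition chain being shared – and of what breaks the sharing (the exact behaviour is V8-internal; this only shows the general idea):

function Point(x, y) { this.x = x; this.y = y; }

var p1 = new Point(1, 2);   // C0 → C1 (adds x) → C2 (adds y)
var p2 = new Point(3, 4);   // follows the same transitions and reuses C2

// Adding a property later (or in a different order) forces a new transition,
// so p3 ends up with a hidden class of its own:
var p3 = new Point(5, 6);
p3.z = 7;                   // transition C2 → C3, taken only by p3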

Page 16: Javascript Optimization and Just in Time Compilation

16

Inline caching

Extremely efficient at optimizing virtual method calls
Obj.ToString() – must know the type of “Obj” in order to know which “ToString” to call

Initially, call site is “uninitialized”, method lookup is performed

On the first call, remember the object type (i.e. the “ToString” address) and change the site to “monomorphic”
Always perform the call directly, as long as the object type does not change
If the object type changes, switch back to “uninitialized”

What about a call site that sees several different types, like the one below?

Solution: keep a limited number of different types (e.g. 2)
Switch the state from “monomorphic” to “polymorphic”

If even more object types occur, change to “megamorphic” and disable inline caching

var values = [1, "a", 2, "b", 3, "c", 4, "d"];
for (var i in values) {
  document.write(values[i].toString());  // the toString call site alternates between Number and String
}

Page 17: Javascript Optimization and Just in Time Compilation

17

Firefox’s response: TraceMonkey

Incremental improvement over a ‘bytecode interpreter’
But does not preclude dynamic code generation

Uses “trace trees” instead of a “control flow graph” as the internal representation of the program structure
Basically, it “watches for” commonly repeated actions and optimizes the “hot paths”

Easy to perform function inlining

Easier to perform type inference (e.g. determine whether “a+b” is “string concatenation” or “number addition”)

Looping overhead greatly reduced

Firefox also added some “polymorphic property caching”
But only for prototype objects?

TraceMonkey is based on Adobe’s Tamarin-Tracing
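
To make the “hot path” idea concrete, here is a sketch (not TraceMonkey’s actual behaviour or output) of the kind of loop a tracing JIT targets – it records the instructions executed by the loop body, specializes them for the observed types, and guards against anything else:

function sumPositives(arr) {
  var total = 0;
  for (var i = 0; i < arr.length; i++) {   // hot loop: the trace anchor
    if (arr[i] > 0) total += arr[i];       // recorded path assumes arr[i] is a number;
  }                                        // a string element would take a side exit
  return total;                            // back to the interpreter
}
sumPositives([3, -1, 4, 1, -5, 9]);        // 17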

Page 18: Javascript Optimization and Just in Time Compilation

18

Traditional CFG

Page 19: Javascript Optimization and Just in Time Compilation

19

Trace tree

Page 20: Javascript Optimization and Just in Time Compilation

20

Trace trees (cont’d)

Constructed and extended lazily

Designed to represent loops (the performance-critical parts)

Anchor (loop header) discovered by dynamic profiling, not static analysis

Can include instructions from called methods (“inlining”)

Can have side exits:
Restore VM/interpreter state

Resume interpretation

Page 21: Javascript Optimization and Just in Time Compilation

21

Trace trees – loop nest

Page 22: Javascript Optimization and Just in Time Compilation

22

Trace trees – compilation

Optimization – greatly simplified; the trace is effectively in SSA form just by renaming
With an exception – can you see it?

Register allocation:
Traverse all traces in reverse recording order

Guarantee that all uses are “seen” before the definition

Use a simple scheme to allocate registers

Type specialization:
Speculate on a variable’s type based on historical information

Insert guards to go to “interpreted” mode if type changes
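
A sketch of what such a guard might look like if the specialized trace were written back out as JavaScript (the real result is native code; the sideExit helper is invented for illustration):

// Specialized body for   s = s + a[i]   recorded while both operands were numbers.
function sideExit(reason) {            // a real engine would restore interpreter state here
  throw new Error("side exit: " + reason);
}
function traceBody(s, ai) {
  if (typeof ai !== "number") return sideExit("type guard failed");  // guard
  return s + ai;                                                     // fast numeric add
}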

Page 23: Javascript Optimization and Just in Time Compilation

23

Results

Page 24: Javascript Optimization and Just in Time Compilation

24

How about tools?

Modern IDE expectations: autocomplete, jump-to-definition, browsing, refactoring

First hints: syntax

var x = 12.5;

var y = { a:1, b:2};

function foo(a, b) { return a+b;}

Next hints: inference

var x = 12.5; var y = x;   // y can be inferred to be a number

Can apply standard techniques used for optimization!

Common idioms/ coding style / jsdoc comments

Type inference?

Page 25: Javascript Optimization and Just in Time Compilation

25

Tool support for dynamic languages

What you can’t solve deterministically, solve probabilistically
Make assumptions based on what you know (e.g., variable types don’t change)

Monte Carlo methods?

It’s not the end of the world if you’re wrong

How about refactoring/rename? Java/.NET can’t do it perfectly, either! (what about dynamic class loading/reflection, Struts/XML configuration files, DB persistence layers?)

Conclusions:
Still plenty of low-hanging fruit in the area of dynamic language research

Can apply optimization theory to IDEs

Page 26: Javascript Optimization and Just in Time Compilation

26

Page 27: Javascript Optimization and Just in Time Compilation

27

Credits

Presentation assembled mainly from

http://code.google.com/apis/v8/design.html

http://andreasgal.com/

Steve Yegge’s talk at the Stanford EE Dept Computer Systems Colloquium, May 2008

http://steve-yegge.blogspot.com/2008/05/dynamic-languages-strike-back.html

Jeremy Condit’s presentation (CS 265 Expert Topic, 23 April 2003)

http://www.cs.berkeley.edu/~jcondit/cs265/expert.html

Page 28: Javascript Optimization and Just in Time Compilation

28

BACKUP SLIDES

Page 29: Javascript Optimization and Just in Time Compilation

29

The Java Virtual Machine

Each frame contains local variables and an operand stack

Instruction set:
Load/store between locals and operand stack

Arithmetic on operand stack

Object creation and method invocation

Array/field accesses

Control transfers and exceptions

The type of the operand stack at each program point is known at compile time

Page 30: Javascript Optimization and Just in Time Compilation

30

Java Virtual Machine (cont’d)

Example:

iconst 2
iload a
iload b
iadd
imul
istore c

Computes: c := 2 * (a + b)

Page 31: Javascript Optimization and Just in Time Compilation

31

Java Virtual Machine (cont’d)

Example: iconst 2; iload a; iload b; iadd; imul; istore c – computes c := 2 * (a + b)

State before execution: locals a = 42, b = 7, c = 0; operand stack: (empty)

Page 32: Javascript Optimization and Just in Time Compilation

32

Java Virtual Machine (cont’d)

After iconst 2: locals a = 42, b = 7, c = 0; operand stack: 2

Page 33: Javascript Optimization and Just in Time Compilation

33

Java Virtual Machine (cont’d)

After iload a: locals a = 42, b = 7, c = 0; operand stack: 2, 42

Page 34: Javascript Optimization and Just in Time Compilation

34

Java Virtual Machine (cont’d)

After iload b: locals a = 42, b = 7, c = 0; operand stack: 2, 42, 7

Page 35: Javascript Optimization and Just in Time Compilation

35

Java Virtual Machine (cont’d)

After iadd: locals a = 42, b = 7, c = 0; operand stack: 2, 49

Page 36: Javascript Optimization and Just in Time Compilation

36

Java Virtual Machine (cont’d)

After imul: locals a = 42, b = 7, c = 0; operand stack: 98

Page 37: Javascript Optimization and Just in Time Compilation

37

Java Virtual Machine (cont’d)

After istore c: locals a = 42, b = 7, c = 98; operand stack: (empty)

Page 38: Javascript Optimization and Just in Time Compilation

38

Lazy Code Selection

Introduced by the Intel VTune JIT compiler ([Adl-Tabatabai 98])

Idea: Use a mimic stack to simulate the execution of the operand stack

Instead of the actual values, the mimic stack holds the location of the values
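
A rough JavaScript sketch of the mimic-stack idea (the select/describe helpers and the locals table are invented; real code selection produces machine code, not strings):

// Mimic stack entries describe where a value lives, not the value itself.
function describe(o) { return o.kind === "imm" ? "$" + o.value : o.where; }

function select(bytecode, locals) {      // locals: e.g. { a: "eax", b: "4(esp)", c: "8(esp)" }
  var mimic = [], code = [];
  bytecode.forEach(function (ins) {
    if (ins.op === "iconst")      mimic.push({ kind: "imm", value: ins.value });
    else if (ins.op === "iload")  mimic.push({ kind: "loc", where: locals[ins.name] });
    else if (ins.op === "iadd") {                       // only now is code actually emitted
      var right = mimic.pop(), left = mimic.pop();
      code.push("movl ebx, " + describe(left));
      code.push("addl ebx, " + describe(right));
      mimic.push({ kind: "loc", where: "ebx" });
    }
    // imul, istore, etc. are handled along the same lines
  });
  return code;
}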

Page 39: Javascript Optimization and Just in Time Compilation

39

Lazy Code Selection (cont’d)

Each operand on the stack is an element from a class hierarchy

Operand class hierarchy (diagram):
Operand → Immediate, Memory, Register, FP Stack
Memory operands: Field, Array, Static, Stack, Constant

Page 40: Javascript Optimization and Just in Time Compilation

40

Lazy Code Selection (cont’d)

Example:

iconst 2

iload a

iload b

iadd

imul

istore c

Locals: a = Reg eax, b = Stack 4 (4(esp)), c = Stack 8 (8(esp)); mimic stack: (empty)

Page 41: Javascript Optimization and Just in Time Compilation

41

Lazy Code Selection (cont’d)

Example:

iconst 2

iload a

iload b

iadd

imul

istore c

Locals: a = Reg eax, b = Stack 4, c = Stack 8
After iconst 2 – mimic stack: Imm 2 (no code emitted yet)

Page 42: Javascript Optimization and Just in Time Compilation

42

Lazy Code Selection (cont’d)

Example:

iconst 2

iload a

iload b

iadd

imul

istore c

Locals: a = Reg eax, b = Stack 4, c = Stack 8
After iload a – mimic stack: Imm 2, Reg eax (still no code emitted)

Page 43: Javascript Optimization and Just in Time Compilation

43

Lazy Code Selection (cont’d)

Example:

iconst 2

iload a

iload b

iadd

imul

istore c

Locals: a = Reg eax, b = Stack 4, c = Stack 8
After iload b – mimic stack: Imm 2, Reg eax, Stack 4 (still no code emitted)

Page 44: Javascript Optimization and Just in Time Compilation

44

Lazy Code Selection (cont’d)

Example:

iconst 2

iload a

iload b

iadd

imul

istore c

Locals: a = Reg eax, b = Stack 4, c = Stack 8
After iadd – mimic stack: Imm 2, Reg ebx
Code emitted so far:
movl ebx, eax
addl ebx, 4(esp)

Page 45: Javascript Optimization and Just in Time Compilation

45

Lazy Code Selection (cont’d)

Example:

iconst 2

iload a

iload b

iadd

imul

istore c

Locals: a = Reg eax, b = Stack 4, c = Stack 8
After imul – mimic stack: Reg ebx (the multiplication by the constant 2 is strength-reduced to a shift)
Code emitted so far:
movl ebx, eax
addl ebx, 4(esp)
sall ebx, 1

Page 46: Javascript Optimization and Just in Time Compilation

46

Lazy Code Selection (cont’d)

Example:

iconst 2

iload a

iload b

iadd

imul

istore c

Locals: a = Reg eax, b = Stack 4, c = Stack 8
After istore c – mimic stack: (empty)
Code emitted:
movl ebx, eax
addl ebx, 4(esp)
sall ebx, 1
movl 8(esp), ebx

Page 47: Javascript Optimization and Just in Time Compilation

47

Lazy Code Selection (cont’d)

Achieves several results:
Converts the stack-based architecture to a register-based one

Folds computations into more complex x86 instructions

Allows additional optimizations:
Strength reduction and constant propagation

Redundant load-after-store

Disadvantages:
Extra operands are spilled to the stack at the end of basic blocks

Page 48: Javascript Optimization and Just in Time Compilation

48

Exception Handling

Problem: Exceptions aren’t thrown often, but they complicate control flow

Solution: on-demand exception translation
Maintain a mapping of native code addresses to original bytecode addresses

When an exception occurs, look up the original address and jump to the appropriate exception handler

Results in less compilation overhead and bigger basic blocks
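
A toy JavaScript sketch of the lookup (all addresses, offsets and table entries are invented for illustration):

// Map native code address ranges back to bytecode offsets.
var addressMap = [
  { nativeStart: 0x1000, nativeEnd: 0x1040, bytecodeOffset: 0 },
  { nativeStart: 0x1040, nativeEnd: 0x1090, bytecodeOffset: 17 }
];
// The usual bytecode-level exception handler table.
var handlerTable = [ { startBc: 10, endBc: 25, handlerBc: 40, type: "MyException" } ];

function findHandler(nativePc, exceptionType) {
  var region = addressMap.filter(function (e) {
    return nativePc >= e.nativeStart && nativePc < e.nativeEnd;
  })[0];
  var bc = region.bytecodeOffset;                   // translate native PC → bytecode offset
  var handler = handlerTable.filter(function (t) {  // then consult the bytecode handler table
    return bc >= t.startBc && bc < t.endBc && t.type === exceptionType;
  })[0];
  return handler ? handler.handlerBc : null;        // bytecode offset of the matching handler
}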

Page 49: Javascript Optimization and Just in Time Compilation

49

Exception Handling (cont’d)

Problem: Exceptions are not always rare

Solution: inlining
Eliminate exceptions when possible

try {
  throw new MyException();
} catch (MyException e) {
}

Use method inlining to create more opportunities

Page 50: Javascript Optimization and Just in Time Compilation

50

AIX JDK [Ishizaki 99]

JIT compiler for PowerPC (32-bit RISC)

Contributions:
Null check elimination

Array bounds check elimination

Global common subexpression elimination

Type inclusion test optimization

Static method call inlining

Dynamic method call resolution

…none of which matter very much

Each optimization is fairly effective
Over 50% of run-time null checks eliminated

But overall effects are relatively small
At most 10% improvement in overall execution time