1
Just in time compilation (JIT)
Virgil Palanciuc
2
Just-in-Time Compilation
Compiler runs at program execution time
Popularized by Java virtual machine implementations
Preserve interpretive character – portability
Challenge: minimize the sum of the program’s compile time and execution time
Continuous Compilation (idea: mix interpretation and compilation)
Instead of pausing to compile, simply fire up the interpreter
In the background, start compiling code, with emphasis on compilation units that we interpret often
When code is compiled, jump to the native code instead
Smart Just-in-Time
Estimate whether compilation is actually worthwhile:
Estimate compile time as a function of code size
Observe time spent interpreting the code
If compilation is ‘worthwhile’ (heuristic), stop interpreting and compile instead
No second processor required
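The break-even heuristic above can be sketched in a few lines. This is an illustrative model only: the function names and the cost model (compile time proportional to code size) are assumptions, not any particular VM’s policy.

```javascript
// Hypothetical "smart JIT" profile for one method.
function makeMethodProfile(codeSize, compileCostPerUnit) {
  return { codeSize, compileCostPerUnit, interpretedTime: 0, compiled: false };
}

// Called each time the interpreter finishes running the method.
function recordInterpretation(profile, elapsed) {
  profile.interpretedTime += elapsed;
}

// Compile once the time already spent interpreting exceeds the
// estimated one-off cost of compiling (a simple break-even heuristic).
function shouldCompile(profile) {
  const estimatedCompileTime = profile.codeSize * profile.compileCostPerUnit;
  return !profile.compiled && profile.interpretedTime > estimatedCompileTime;
}

const p = makeMethodProfile(200, 0.5); // 200 bytecodes, ~100ms compile estimate
recordInterpretation(p, 40);
console.log(shouldCompile(p)); // false: 40ms spent < 100ms estimate
recordInterpretation(p, 80);
console.log(shouldCompile(p)); // true: 120ms spent > 100ms estimate
```

Because the check runs between interpretations, no second processor is needed: the interpreter simply pauses once the heuristic says compilation has become worthwhile.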
3
Java’s Programming Environment
Java source code compiles to Java bytecode
Bytecode is verifiably secure, compact, and platform independent
Virtual machine for the target platform executes the program
Pipeline: Java source code → compiler → Java bytecode → network (?) → JIT compiler → native code → native exec → output
4
Java Summary and conclusions
JIT compilation is effective
Reduces overall running time when compared to an interpreter
Handles dynamic class loading
Register allocation extremely important
“Pretty good” is usually good enough
Simple register allocation schemes are just as good as complex ones
Just-in-time compilation is usually better than interpretation
Simple versions of standard algorithms provide most of the benefit
Especially register allocation!
5
Introducing: dynamic languages
Have been around for quite a while
Perl, Javascript, Python, Ruby, Tcl
Popular opinions:
Unfixably slow
Not possible to create good IDE tools
Maintenance traps as codebase grows larger
Observation: techniques for creating tools for dynamic languages are similar to those for improving performance
6
Why are dynamic languages slow?
Lack of effort
Used for “scripting” – i.e. I/O bound
Hard to compile with traditional techniques
Object & variable type can change
Methods can be added/removed
Target machine feature mismatches
Example: C – method inlining is straightforward
C++ – dynamic method dispatch makes it more difficult
Javascript – even more difficult (method lookup in a dictionary)
7
Dynamic = lack of performance?
Warning: Controversial slide
Technology has evolved
Widespread belief that Java is just as fast as C++
Javascript – Google V8, SpiderMonkey/TraceMonkey
Cultural problem – programmers often prefer to micro-optimize
An actual design and system perspective requires more thought; micro-optimization is “accessible”
Global optimizations always trump benchmarks!
Java: slower than C++ in benchmarks, faster overall, especially on multicore/SMT
Ruby on Rails – 20% faster than Struts, although Ruby is way slower than Java
8
Case study: Javascript
At a glance: Java-like syntax, prototype-based OOP
Not class-based!
OOP type systems: prototype-based, class-based with single dispatch, class-based with multiple dispatch
Prototype: Javascript – no “class”; each object effectively has its own “class”
Single dispatch: C++, Java (virtual methods; dispatch based on ‘this’)
Multiple dispatch: dispatch based on all arguments
Lexical scoping, 1st class functions, closures
ECMAScript edition 4: optional types
Ajax caused a popularity surge
Sudden focus on improving performance
“Browser war” – javascript interpreters get faster and faster
9
Efficient JIT compilation
Must think differently
Trick is to use heuristics
Profile to get “probabilistic information”
Speculate on “what happens frequently”
Apply simple/fast optimization on the frequent dynamic types/values
Inlining -> “Polymorphic inline cache”
Heuristics can apply to the internal representation
TraceMonkey, Firefox’s response to Chrome’s V8
10
1st step – bytecode execution
Traditional interpreters – abstract syntax tree walkers:
Parse the program into a tree of statements and expressions
Visit the nodes in the tree, performing their operations and propagating execution state
Bytecode interpreters:
Eliminate nodes that represent just “syntax structure”
Bring the representation closer to the machine
Enable additional optimizations on bytecode (as done on the JVM)
11
Google V8
Google V8 is the JS engine from Google Chrome
3 key areas to V8’s performance:
Efficient garbage collection
“stop the world” GC – eliminates synchronization needs, reduces complexity
“generational” GC, 2 generations – rapidly delete short-lived objects
Fast property access
Computes “hidden classes” – more on next slides
Dynamic Machine Code Generation
V8 skips bytecode, generates machine code
On initial execution, determine hidden class
Patch inline cache code to use it
point.x

# ebx = the point object
cmp [ebx, <hidden class offset>], <cached hidden class>
jne <inline cache miss>
mov eax, [ebx, <cached x offset>]
12
V8 – fast property access
C++, Java – faster because an object’s class/memory layout is known
Fixed offset to access a property – typically a single memory load to read/write a property
JavaScript – properties can be added to/deleted from objects on the fly
Layout unknown a priori – typically a “hash lookup” to find a property’s memory location
Idea: objects don’t really change that much, use “hidden classes” to ‘cache’ the memory layout
Not a new idea – first used in Self (at Sun; 1989!)
Example:
function Point(x, y) {
  this.x = x;
  this.y = y;
}
13
Hidden class creation
New ‘Point’ created:
this.x = x;
14
Hidden class creation (cont’d)
this.y = y;
15
Hidden class reuse
If another Point object is created:
Initially the Point object has no properties, so the newly created object refers to the initial hidden class C0
when property x is added, V8 follows the hidden class transition from C0 to C1 and writes the value of x at the offset specified by C1.
when property y is added, V8 follows the hidden class transition from C1 to C2 and writes the value of y at the offset specified by C2.
The runtime behavior of most JavaScript programs results in a high degree of structure-sharing using the above approach.
Two advantages of using hidden classes:
Property access does not require a dictionary lookup
Enables inline caching.
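The transition chain described above (C0 → C1 → C2, shared by every Point) can be modeled in a few lines of JavaScript. This is a toy model: `Shape`, `addProperty`, and the offset scheme are illustrative names, not V8’s actual internals.

```javascript
// Toy model of V8-style hidden classes: each shape stores property
// offsets plus a transition table to the next shape.
function Shape(offsets) {
  this.offsets = offsets;   // property name -> slot offset
  this.transitions = {};    // property name -> next Shape
}

const emptyShape = new Shape({}); // "C0": an object with no properties

// Adding a property either follows an existing transition (shape
// reuse, as when a second Point is built) or creates a new shape.
function addProperty(shape, name) {
  if (!shape.transitions[name]) {
    const offsets = Object.assign({}, shape.offsets);
    offsets[name] = Object.keys(offsets).length; // next free slot
    shape.transitions[name] = new Shape(offsets);
  }
  return shape.transitions[name];
}

// Two Points take the same C0 -> C1 -> C2 transition path...
const s1 = addProperty(addProperty(emptyShape, "x"), "y");
const s2 = addProperty(addProperty(emptyShape, "x"), "y");
console.log(s1 === s2);   // true: the hidden class is shared
console.log(s1.offsets);  // { x: 0, y: 1 }
```

Because both objects end up pointing at the same shape, a property read only needs the shape check plus a fixed-offset load, exactly as in the `point.x` machine-code snippet earlier.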
16
Inline caching
Extremely efficient at optimizing virtual method calls
Obj.ToString() – must know the type of “Obj” in order to be able to call “ToString”
Initially, call site is “uninitialized”, method lookup is performed
On the first call, remember the object’s type (i.e. the “ToString” address) and change to “monomorphic”
Always perform the call directly, as long as the object type does not change
If the object type changes, switch back to “uninitialized”
What about this case?
Solution: keep a limited number of different types (e.g. 2)
Switch state from “monomorphic” to “polymorphic”
If even more object types occur, change to “megamorphic” and disable inline caching
var values = [1, "a", 2, "b", 3, "c", 4, "d"];
for (var i = 0; i < values.length; i++) {
  document.write(values[i].toString());
}
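A toy state machine for such a call site might look like the following. The state names come from the slide; the structure, `POLY_LIMIT`, and the simplification of going straight from monomorphic to polymorphic (instead of resetting to uninitialized first) are assumptions of this sketch.

```javascript
// Sketch of an inline-cache state machine for one call site:
// uninitialized -> monomorphic -> polymorphic -> megamorphic.
const POLY_LIMIT = 2; // keep at most 2 cached types, as on the slide

function makeCallSite() {
  return { state: "uninitialized", cachedTypes: [] };
}

function recordType(site, type) {
  if (site.state === "megamorphic") return; // caching already disabled
  if (!site.cachedTypes.includes(type)) {
    site.cachedTypes.push(type);
    if (site.cachedTypes.length === 1) site.state = "monomorphic";
    else if (site.cachedTypes.length <= POLY_LIMIT) site.state = "polymorphic";
    else site.state = "megamorphic";
  }
}

const site = makeCallSite();
recordType(site, "number");
console.log(site.state); // "monomorphic"
recordType(site, "string");
console.log(site.state); // "polymorphic"
recordType(site, "boolean");
console.log(site.state); // "megamorphic"
```

The alternating number/string array above is exactly the pattern that pushes a call site from monomorphic to polymorphic; a third distinct type would tip it into megamorphic, disabling the cache.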
17
Firefox’s response: TraceMonkey
Incremental improvement over a ‘bytecode interpreter’
But does not preclude dynamic code generation
Uses “trace trees” instead of a “control flow graph” as the internal representation of the program structure
Basically, it “watches for” commonly-repeated actions and optimizes the “hot paths”
Easy to perform function inlining
Easier to perform type inference (e.g. determine whether “a+b” is “string concatenation” or “number addition”)
Looping overhead greatly reduced
Firefox also added some “polymorphic property caching”
But only for prototype objects?
TraceMonkey is based on Adobe’s Tamarin-Tracing
18
Traditional CFG
19
Trace tree
20
Trace trees (cont’d)
Constructed and extended lazily
Designed to represent loops (performance-critical parts)
Anchor (loop header) discovered by dynamic profiling, not static analysis
Can include instructions from called methods (“inlining”)
Can have side exits:
Restore VM/interpreter state
Resume interpretation
21
Trace trees – loop nest
22
Trace trees – compilation
Optimization – greatly simplified; the trace is effectively in SSA form just by renaming
With an exception – can you see it?
Register allocation
Traverse all traces in reverse recording order
Guarantee that all uses are “seen” before the definition
Use a simple scheme to allocate registers
Type specialization
Speculate on variable type based on historical info
Insert guards to go to “interpreted” mode if type changes
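A type-specialization guard can be sketched as a fast numeric path with a fallback. The function names are illustrative; a real trace compiler emits this as machine code with a side exit, not as a JavaScript closure.

```javascript
// What the interpreter would do: generic "+", which may be number
// addition or string concatenation depending on the operand types.
function genericAdd(a, b) {
  return a + b;
}

// Build a specialized version that speculates both operands are
// numbers; the guard falls back to the generic path (the "side exit")
// when the speculation fails.
function makeSpecializedAdd(onGuardFail) {
  return function (a, b) {
    if (typeof a !== "number" || typeof b !== "number") {
      return onGuardFail(a, b); // side exit: resume generic execution
    }
    return a + b;               // fast, type-specialized path
  };
}

const add = makeSpecializedAdd(genericAdd);
console.log(add(2, 3));     // 5   (stays on the compiled trace)
console.log(add("a", "b")); // "ab" (guard fails, generic path)
```

This is the mechanism that lets TraceMonkey decide whether “a+b” is string concatenation or number addition without re-checking types on every hot-loop iteration.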
23
Results
24
How about tools?
Modern IDE expectations: autocomplete, jump-to-definition, browsing, refactoring
First hints: syntax
var x = 12.5;
var y = { a: 1, b: 2 };
function foo(a, b) { return a + b; }
Next hints: inference
var x = 12.5; var y = x;
Can apply standard techniques used for optimization!
Common idioms/ coding style / jsdoc comments
Type inference?
25
Tool support for dynamic languages
What you can’t solve deterministically, solve probabilistically
Make assumptions based on what you know (e.g., variable types don’t change)
Monte Carlo methods?
It’s not the end of the world if you’re wrong
How about refactoring/rename?
Java/.NET can’t do it perfectly, either! (What about dynamic class loading/reflection? Struts/XML configuration files? DB persistence layers?)
Conclusions:
Still plenty of low-hanging fruit in the area of dynamic language research
Can apply optimization theory to IDEs
26
27
Credits
Presentation assembled mainly from
http://code.google.com/apis/v8/design.html
http://andreasgal.com/
Steve Yegge’s speech at the Stanford EE Dept Computer Systems Colloquium, May 2008
http://steve-yegge.blogspot.com/2008/05/dynamic-languages-strike-back.html
Jeremy Condit presentation (CS 265 Expert Topic, 23 April 2003)
http://www.cs.berkeley.edu/~jcondit/cs265/expert.html
28
BACKUP SLIDES
29
The Java Virtual Machine
Each frame contains local variables and an operand stack
Instruction set:
Load/store between locals and operand stack
Arithmetic on operand stack
Object creation and method invocation
Array/field accesses
Control transfers and exceptions
The type of the operand stack at each program point is known at compile time
30
Java Virtual Machine (cont’d)
Example:
iconst 2
iload a
iload b
iadd
imul
istore c
Computes: c := 2 * (a + b)
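The stack-machine walkthrough on the next slides can be reproduced with a minimal operand-stack interpreter. The opcode names match the JVM’s; the interpreter itself is a sketch written in JavaScript, not JVM semantics in full.

```javascript
// Minimal operand-stack interpreter for the slide's bytecode sequence.
function run(code, locals) {
  const stack = [];
  for (const [op, arg] of code) {
    if (op === "iconst") stack.push(arg);            // push constant
    else if (op === "iload") stack.push(locals[arg]); // push local
    else if (op === "iadd") stack.push(stack.pop() + stack.pop());
    else if (op === "imul") stack.push(stack.pop() * stack.pop());
    else if (op === "istore") locals[arg] = stack.pop(); // pop to local
  }
  return locals;
}

// c := 2 * (a + b) with a = 42, b = 7, as in the walkthrough.
const locals = run(
  [["iconst", 2], ["iload", "a"], ["iload", "b"],
   ["iadd"], ["imul"], ["istore", "c"]],
  { a: 42, b: 7, c: 0 }
);
console.log(locals.c); // 98
```

Each step mirrors one frame of the following slides: the stack grows to 2, then 42, 2, then 7, 42, 2, collapses to 49, 2 after iadd, to 98 after imul, and empties into c on istore.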
31
Java Virtual Machine (cont’d)
Before iconst 2: locals a = 42, b = 7, c = 0; operand stack: (empty)
32
Java Virtual Machine (cont’d)
After iconst 2: locals a = 42, b = 7, c = 0; operand stack (top first): 2
33
Java Virtual Machine (cont’d)
After iload a: locals a = 42, b = 7, c = 0; operand stack (top first): 42, 2
34
Java Virtual Machine (cont’d)
After iload b: locals a = 42, b = 7, c = 0; operand stack (top first): 7, 42, 2
35
Java Virtual Machine (cont’d)
After iadd: locals a = 42, b = 7, c = 0; operand stack (top first): 49, 2
36
Java Virtual Machine (cont’d)
After imul: locals a = 42, b = 7, c = 0; operand stack (top first): 98
37
Java Virtual Machine (cont’d)
After istore c: locals a = 42, b = 7, c = 98; operand stack: (empty)
38
Lazy Code Selection
Introduced by the Intel VTune JIT compiler ([Adl-Tabatabai 98])
Idea: Use a mimic stack to simulate the execution of the operand stack
Instead of the actual values, the mimic stack holds the location of the values
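A minimal sketch of the idea: the mimic stack holds location descriptors instead of values, and code is emitted only when an operation forces a value into a register. The descriptor format, the naive register supply, and the instruction text are assumptions of this sketch; unlike the real VTune JIT it performs no strength reduction or instruction folding.

```javascript
// Where a value lives: an immediate, a register, or a stack slot.
function loc(d) {
  if (d.kind === "imm") return "$" + d.value;
  if (d.kind === "reg") return d.name;
  return d.slot + "(esp)";
}

// Lazy code selection over the mimic stack: no code is emitted for
// iconst/iload; arithmetic pulls the left operand into a fresh
// register (so locals' home locations are not clobbered).
function compile(code) {
  const mimic = [];                    // descriptors, not values
  const out = [];                      // emitted x86-like instructions
  const freeRegs = ["ebx", "ecx", "edx"];
  for (const [op, arg] of code) {
    if (op === "iconst") mimic.push({ kind: "imm", value: arg });
    else if (op === "iload") mimic.push(arg); // arg is a descriptor
    else if (op === "iadd" || op === "imul") {
      const b = mimic.pop(), a = mimic.pop();
      const r = freeRegs.shift();
      out.push(`movl ${r}, ${loc(a)}`);
      out.push(`${op === "iadd" ? "addl" : "imull"} ${r}, ${loc(b)}`);
      mimic.push({ kind: "reg", name: r });
    } else if (op === "istore") {
      out.push(`movl ${arg.slot}(esp), ${loc(mimic.pop())}`);
    }
  }
  return out;
}

// c := 2 * (a + b); a in eax, b at esp+4, c at esp+8 (as on the slides).
const asm = compile([
  ["iconst", 2],
  ["iload", { kind: "reg", name: "eax" }],
  ["iload", { kind: "mem", slot: 4 }],
  ["iadd"],
  ["imul"],
  ["istore", { slot: 8 }],
]);
console.log(asm.join("\n"));
```

Note how the `iconst` and `iload` steps emit nothing at all; only `iadd`, `imul`, and `istore` produce instructions, which is the point of the technique.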
39
Lazy Code Selection (cont’d)
Each operand on the stack is an element from a class hierarchy
Operand
– Immediate (Constant)
– Memory (Field, Array, Static, Stack)
– Register
– FP Stack
40
Lazy Code Selection (cont’d)
Example:
iconst 2
iload a
iload b
iadd
imul
istore c
Locations: a = Reg eax, b = Stack 4, c = Stack 8; mimic stack: (empty)
41
Lazy Code Selection (cont’d)
After iconst 2 – mimic stack (top first): Imm 2
42
Lazy Code Selection (cont’d)
After iload a – mimic stack (top first): Reg eax, Imm 2
43
Lazy Code Selection (cont’d)
After iload b – mimic stack (top first): Stack 4, Reg eax, Imm 2
44
Lazy Code Selection (cont’d)
After iadd – mimic stack (top first): Reg ebx, Imm 2
Emitted code so far:
movl ebx, eax
addl ebx, 4(esp)
45
Lazy Code Selection (cont’d)
After imul – mimic stack (top first): Reg ebx
Emitted code so far:
movl ebx, eax
addl ebx, 4(esp)
sall ebx, 1
46
Lazy Code Selection (cont’d)
After istore c – mimic stack: (empty)
Emitted code so far:
movl ebx, eax
addl ebx, 4(esp)
sall ebx, 1
movl 8(esp), ebx
47
Lazy Code Selection (cont’d)
Achieves several results
Converts a stack-based architecture to a register-based architecture
Folds computations into more complex x86 instructions
Allows additional optimizations
Strength reduction and constant propagation
Redundant load-after-store
Disadvantages
Extra operands spilled to the stack after basic blocks
48
Exception Handling
Problem: Exceptions aren’t thrown often, but they complicate control flow
Solution: on-demand exception translation
Maintain a mapping of native code addresses to original bytecode addresses
When an exception occurs, look up the original address and jump to the appropriate exception handler
Results in less compilation overhead and bigger basic blocks
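The address translation step can be sketched as a simple range-table lookup. The table contents and function names below are made up for illustration; a real JIT would build this table as a side product of code generation.

```javascript
// Hypothetical table mapping ranges of native-code addresses back to
// the bytecode address each range was compiled from.
const addressMap = [
  { nativeStart: 0x1000, nativeEnd: 0x1040, bytecodePC: 0 },
  { nativeStart: 0x1040, nativeEnd: 0x10a0, bytecodePC: 12 },
  { nativeStart: 0x10a0, nativeEnd: 0x1100, bytecodePC: 30 },
];

// Consulted only when an exception is actually thrown, so the common
// (no-exception) path pays nothing for it.
function bytecodePCFor(nativePC) {
  const entry = addressMap.find(
    e => nativePC >= e.nativeStart && nativePC < e.nativeEnd);
  return entry ? entry.bytecodePC : -1;
}

console.log(bytecodePCFor(0x1050)); // 12
```

Once the bytecode address is recovered, the ordinary bytecode-level exception handler table can be searched, which is why the compiled code needs no explicit exception edges in its control flow.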
49
Exception Handling (cont’d)
Problem: Exceptions are not always rare
Solution: inlining
Eliminate exceptions when possible
try {
…
throw new MyException();
…
} catch (MyException e) {
…
}
Use method inlining to create more opportunities
50
AIX JDK [Ishizaki 99]
JIT compiler for PowerPC (32-bit RISC)
Contributions:
Null check elimination
Array bounds check elimination
Global common subexpression elimination
Type inclusion test optimization
Static method call inlining
Dynamic method call resolution
…none of which matter very much
Each optimization is fairly effective
Over 50% of run-time null checks eliminated
But overall effects are relatively small
At most 10% improvement in overall execution time