IBM Research - Tokyo
October 23, 2012 | SPLASH/OOPSLA 2012 at Tucson, Arizona © 2012 IBM Corporation
Adaptive Multi-Level Compilation in a Trace-based Java JIT Compiler
Hiroshi Inoue†, Hiroshige Hayashizaki†, Peng Wu‡ and Toshio Nakatani†
† IBM Research – Tokyo
‡ IBM Research – T.J. Watson Research Center
Goal
Balancing steady-state performance and startup performance has been a challenging task for JIT compilers.
Our goal: to improve the steady-state performance without hurting startup performance in a trace-based Java JIT compiler.
Our contributions:
1. We show that adaptive multi-level compilation is a practical way to balance the startup time and steady-state performance in a trace-JIT.
2. We develop a new technique to efficiently identify hot paths to recompile at a higher optimization level.
Background: Trace-based Compilation
Using a Trace, a hot path identified at runtime, as a basic unit of compilation
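[Figure: control-flow graph of method f, from method entry to return, containing "if (x != 0)" and "while (!end)" with a "do something" block; paths are labeled "rarely executed" and "frequently executed".]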
[Figure: control-flow graph of method f with trace A overlaid on the frequently executed path. Trace A includes only a hot path; labels mark the trace entry, a trace exit, and a trace exit (side exit) toward the rarely executed code.]
Trace selection: how to form good compilation scope
[Figure: a linear trace (blocks A and B, with exits) and a cyclic trace (blocks A and B, with an exit); stubs are shown at the exits.]
Background: Trace Selection and Performance
Steps of trace selection:
1. Identify a hot trace head: a target of a backward branch or an exit point of an existing trace
2. Record the next execution path starting from the trace head
3. Stop recording when the trace being recorded forms a cycle or reaches a pre-defined maximum length
Generating longer traces (compilation scopes):
☺ yields better steady-state performance
• more optimization opportunities for compilers
• smaller trace transitioning overhead
but it hurts startup performance
• longer compilation time
• more duplicated code among traces
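The three selection steps above can be sketched as a toy recorder over a stream of executed basic-block IDs. This is only an illustration under stated assumptions, not the JIT's actual selector: the function name select_traces, the parameters hot_threshold and max_len, and the representation of blocks as strings are all hypothetical, and exits of existing traces are not modeled as new trace heads.

```python
from collections import Counter

def select_traces(executed_blocks, trace_heads, hot_threshold=2, max_len=4):
    """Toy trace selection; all names and thresholds are illustrative assumptions."""
    counts = Counter()   # execution count per potential trace head
    traces = {}          # head -> (recorded blocks, "cyclic" or "linear")
    recording = None     # blocks recorded so far, or None when not recording
    for block in executed_blocks:
        if recording is not None:
            if block == recording[0]:
                # Step 3: the path came back to its head, so it forms a cycle.
                traces[recording[0]] = (tuple(recording), "cyclic")
                recording = None
            else:
                recording.append(block)
                if len(recording) >= max_len:
                    # Step 3: the pre-defined maximum length was reached.
                    traces[recording[0]] = (tuple(recording), "linear")
                    recording = None
            continue
        if block in trace_heads and block not in traces:
            # Step 1: count executions of each potential trace head.
            counts[block] += 1
            if counts[block] >= hot_threshold:
                # Step 2: start recording the next execution path.
                recording = [block]
    return traces
```

With the same block stream, lowering max_len turns what would have been one cyclic trace into a shorter linear trace, which is the scope-versus-startup trade-off the slide describes.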
Our Approach: Adaptive multi-level compilation for trace-JIT
1. Interpreted execution
2. Initial compilation for faster startup
– smaller compilation scope
– lower optimization level
3. Upgrade recompilation for higher steady-state performance
– larger compilation scope
– higher optimization level
From 1 to 2: identify hot paths by counting the execution count for each potential trace head (e.g. loop backedge)
From 2 to 3: identify really hot paths by using a timer-based sampling profiler
Our focus: Trace selection for upgrade recompilation
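The progression through the three levels can be pictured as a small state machine per trace head. The sketch below is a minimal illustration, assuming hypothetical names (Level, TraceHeadState) and made-up thresholds; the slides do not give the actual threshold values or data structures.

```python
from enum import Enum

class Level(Enum):
    INTERPRETED = "interpreted"
    COLD = "cold"   # initial compilation: smaller scope, lower optimization
    HOT = "hot"     # upgrade recompilation: larger scope, higher optimization

class TraceHeadState:
    """Per-trace-head bookkeeping; thresholds are illustrative assumptions."""

    def __init__(self, compile_threshold=3, upgrade_samples=2):
        self.level = Level.INTERPRETED
        self.exec_count = 0
        self.samples = 0
        self.compile_threshold = compile_threshold
        self.upgrade_samples = upgrade_samples

    def on_execution(self):
        # While interpreted, count executions of this potential trace head.
        if self.level is Level.INTERPRETED:
            self.exec_count += 1
            if self.exec_count >= self.compile_threshold:
                self.level = Level.COLD

    def on_timer_sample(self):
        # While running cold-compiled code, count timer-based samples.
        if self.level is Level.COLD:
            self.samples += 1
            if self.samples >= self.upgrade_samples:
                self.level = Level.HOT
```

Counting is cheap enough for the interpreter, while sampling keeps profiling overhead low once code is compiled; that split is the design choice the slide points at.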
Steps of Our Multi-level Compilation
[Figure: a source program with basic blocks BB1 to BB7 (hot path, cold path, and trace heads marked) goes through initial compilation into a cyclic trace, linear trace 1, and linear trace 2, and then through upgrade recompilation into a recompiled cyclic trace and a recompiled linear trace.]
– Source program: executed by interpreter with monitoring to identify hot path
– Initially compiled traces: executed as compiled code with timer-based sampling profiler
– We limit the max trace length for faster startup (here 2 BBs)
Our focus: Trace selection for upgrade recompilation
An Example of Inefficient Selection for Recompilation
[Figure: a straight-line hot path initially covered by linear traces 1, 2, and 3. After upgrade recompilation, the badly formed case shows linear trace 1 alongside a recompiled linear trace, and the desired case shows a recompiled linear trace.]
Problem: insufficient coverage by recompiled traces
We want to cover the entire hot path by a recompiled trace
Our Solution: Trace Transition Graph (TTgraph)
Trace-Transition Graph (TTgraph) is a weighted directed graph representing the control flow among traces
– Node: compiled trace
– Edge: transition between two traces (weight represents relative frequency)
We identify the hot paths to recompile using the TTgraph
– to capture the entire hot path by traversing the edges backward
[Figure: example TTgraph with nodes for linear traces 1, 1', 2, 3, and 3', connected by weighted edges.]
Steps to Build TTgraph
1. We add a node when a trace is compiled
2. We add an edge between two traces when we link them (trace linking)
3. We increment the weight of an edge using the timer-based sampling profiler
[Figure: TTgraph under construction, with nodes for the cyclic trace and linear traces 1, 1', 2, 3, and 3', and edges labeled with their current weights.]
See the paper for more detail (e.g. how we profile transitions and how we employ bursty tracing)
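The three build steps map naturally onto a small graph class. What follows is a minimal sketch, not the authors' implementation: the class name TTGraph, its method names, and the use of raw sample counts as weights are assumptions, and the profiling details (such as bursty tracing) that the paper covers are left out.

```python
class TTGraph:
    """Toy Trace Transition Graph; names and representation are assumptions."""

    def __init__(self):
        self.nodes = {}     # trace name -> kind ("linear" or "cyclic")
        self.weights = {}   # (source trace, destination trace) -> sample count

    def add_trace(self, name, kind="linear"):
        # Step 1: add a node when a trace is compiled.
        self.nodes[name] = kind

    def link(self, src, dst):
        # Step 2: add an edge when two compiled traces are linked.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both traces must be compiled before linking")
        self.weights.setdefault((src, dst), 0)

    def sample_transition(self, src, dst):
        # Step 3: a profiler sample observed this transition.
        self.weights[(src, dst)] += 1

    def relative_frequency(self, src, dst):
        # Share of all samples that fell on this edge.
        total = sum(self.weights.values())
        return self.weights[(src, dst)] / total if total else 0.0
```

Keeping counts on edges rather than on nodes is what later lets the recompiler tell which predecessor actually feeds a hot trace.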
Recompilation Based on TTgraph
We traverse the hot edges backward to identify the new trace head for recompilation
– we stop the backward traversal when hitting a cyclic trace (our system does not capture cyclic and linear paths in one trace)
– if there are multiple hot incoming edges for a node, we track both of them
[Figure: the example TTgraph (cyclic trace and linear traces 1, 1', 2, 3, and 3' with weighted edges); the hot path found by backward traversal is marked "recompile".]
See the paper for more detail
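The backward traversal can be sketched as a graph walk over incoming edges. This is an illustrative sketch under assumptions, not the system's code: the function name find_recompile_heads, the hot_threshold cutoff for what counts as a hot edge, and the dictionary layout are all hypothetical, and the example trace names and weights in the usage comment are made up.

```python
def find_recompile_heads(incoming, kinds, start, hot_threshold):
    """Walk hot edges backward from `start` and return candidate trace heads.

    incoming: trace -> {predecessor trace: edge weight}
    kinds:    trace -> "linear" or "cyclic"
    A trace becomes a head when it has no hot linear predecessor left to follow.
    If the hot linear traces form a loop among themselves, no head is returned.
    """
    heads, visited, stack = set(), set(), [start]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        hot_preds = [
            pred for pred, weight in incoming.get(node, {}).items()
            # Stop at cyclic traces: they are never merged into a linear path.
            if weight >= hot_threshold and kinds[pred] != "cyclic"
        ]
        if hot_preds:
            # Several hot incoming edges: follow every one of them.
            stack.extend(hot_preds)
        else:
            heads.add(node)
    return heads

# Hypothetical usage: loop -> t1 -> t2 -> t3, plus a cold edge t1x -> t3.
# find_recompile_heads(
#     {"t3": {"t2": 8, "t1x": 1}, "t2": {"t1": 9}, "t1": {"loop": 7}},
#     {"t1": "linear", "t2": "linear", "t3": "linear", "t1x": "linear", "loop": "cyclic"},
#     start="t3", hot_threshold=5)
```

Starting a new, longer recording at each returned head is what allows a single recompiled trace to cover the whole straight-line hot path instead of only its tail.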
Performance Evaluation
Hardware: IBM BladeCenter JS22
– 4 cores (8 SMT threads) of POWER6 4.0GHz
– 16 GB system memory
Our trace-JIT:
– Extended IBM JVM for Java 6 (32-bit) to support trace compilation
– Two optimization levels (both trace selection and code generation), cold and hot
– We compare cold only, hot only, and our adaptive multi-level compilation
Benchmark:
– DaCapo benchmark suite 9.12
Steady-state Performance
[Figure: bar chart of relative steady-state performance (higher is faster) for avrora, batik, eclipse, fop, h2, jython, luindex, lusearch, pmd, sunflow, tomcat, tradebeans, xalan, and geomean; series: cold only, hot only, our adaptive multi-level compilation.]
– Always using hot level gives 26% gain in steady state
– Our technique achieved 22% gain in steady state
Startup Time (Execution time of first iteration)
[Figure: bar chart of relative startup time (shorter is faster) for avrora, batik, eclipse, fop, h2, jython, luindex, lusearch, pmd, sunflow, tomcat, tradebeans, xalan, and geomean; series: cold only, hot only, our adaptive multi-level compilation.]
– Using hot level makes the startup 50% (up to 3x) slower!
– Our technique does not hurt the startup performance!
Improvement by TTgraph-Based Recompilation
[Figure: bar chart of relative peak performance (higher is faster; axis from 98.0% to 110.0%) for avrora, batik, eclipse, fop, h2, jython, luindex, lusearch, pmd, sunflow, tomcat, tradebeans, xalan, and geomean; series: basic recompilation, TTgraph-based recompilation.]
Summary
We showed that adaptive multi-level compilation is a practical way to balance the startup time and steady-state performance in a trace-JIT.
We developed an efficient technique based on TTgraph to identify hot paths.
We described how the trace selection engine, the timer-based sampling profiler, and the code generator work together for effective recompilation.
Thank you!
Backup
CPU Time Breakdown
[Figure: stacked bar charts of normalized CPU time (shorter is faster) for avrora, batik, eclipse, fop, h2, jython, luindex, lusearch, pmd, sunflow, tomcat, tradebeans, xalan, and GEOMEAN, comparing single-level compilation (cold level), single-level compilation (hot level), and adaptive multi-level compilation (TTgraph-based recompilation); recovered legend entries: OS, native libraries, JIT-compiled code (hot, cold).]
Most of the execution time is spent in recompiled code