Adaptive Multi-Level Compilation in a Trace-based Java JIT ... · recom piled cyclic trace recom...

IBM Research - Tokyo

Adaptive Multi-Level Compilation in a Trace-based Java JIT Compiler

Hiroshi Inoue†, Hiroshige Hayashizaki†, Peng Wu‡ and Toshio Nakatani†† IBM Research – Tokyo‡ IBM Research – T.J. Watson Research Center

Balancing steady-state performance and startup performancehas been a challenging task for JIT compilers

to improve the steady-state performance without hurting startup performance in a trace-based Java JIT compiler

Our Goal

Our contributions:1. We show that adaptive multi-level compilation is a practical way to

balance the startup time and steady-state performance in a trace-JIT2. We develop a new technique to efficiently identify hot paths to

recompile in a higher optimization level

Background: Trace-based Compilation

Using a Trace, a hot path identified at runtime, as a basic unit of compilation

method fmethod entry

return

if (x != 0)

rarelyexecuted

while (!end)

do something

frequentlyexecuted

Background: Trace-based Compilation

Using a Trace, a hot path identified at runtime, as a basic unit of compilation

method fmethod entry

a hot pathreturn

if (x != 0)

frequentlyexecuted

while (!end)

trace exit

trace exit(side exit)

trace entry trace A

include only a hot path

if (x != 0)

rarelyexecuted

while (!end)

do something

frequentlyexecuted

Trace selection: how to form good compilation scope

linear trace

cyclic trace

stub stub

Background: Trace Selection and Performance

Steps of trace selection1. Identify a hot trace head

a target of a backward branch, a exit point of an existing trace2. Record next execution path starting from the trace head3. Stop recording when the trace being recorded

forms a cycle, reaches pre-defined maximum length

Generating longer traces (compilation scopes):☺ yields better steady-state performance

• more optimization opportunities for compilers• smaller trace transitioning overhead

but it hurts startup performance• longer compilation time• more duplicated code among traces

Our Approach: Adaptive multi-level compilation for trace-JIT

1. Interpreted execution

2. Initial compilation for faster startup– smaller compilation scope– lower optimization level

3. Upgrade recompilation for higher steady-state performance– larger compilation scope– higher optimization level

identify hot paths by counting the execution count for each potential trace head (e.g. loop backedge)

identify really hot paths by using timer-based sampling profiler

Our focusTrace selection for

upgrade recompilation

Steps of Our Multi-level Compilation

BB1BB1

BB2BB2

BB3BB3

BB4BB4

BB5BB5

BB6BB6

source program

cyclic tracecyclic trace

linear trace 1linear

trace 1

trace 2

BB7BB7

initialcompilation

hot path

cold path

trace head

upgraderecompilation

recompiledcyclic trace

recompiledlinear trace

executed by interpreter with monitoring to

identify hot pathexecuted as

compiled codewith timer-

based sampling profiler

Our focusTrace selection

for upgrade recompilation

We limit the max trace length for

faster startup(here 2 BBs)

An Example of Inefficient Selection for Recompilation

trace 1

trace 2

trace 3

trace 1

upgraderecompilation

Probleminsufficient

coverage by recompiled traces

Badly formed case Desired case

We want to cover the entire hot pathby a recompiled

Our Solution: Trace Transition Graph (TTgraph)

Trace-Transition Graph (TTgraph) is a weighted directed graph representing the control flow among traces

Node: compiled traceEdge: transition between two traces– weight represents relative frequency

We identify the hot paths to recompile using the TTgraph– to capture the entire hot path by

traversing the edges backwardlinear trace

linear trace

Steps to Build TTgraph

1. We add an node when a trace is compiled

2. We add an edge between two traces when we link them (trace linking)

3. We increment weight of an edge by timer-based sampling profiler

linear trace

11See the paper for more detail

(e.g. how we profile transition andhow we employ bursty tracing)

See the paper for more detail (e.g. how we profile transition and

how we employ bursty tracing)

linear trace

11See the paper for more detail

(e.g. how we profile transition andhow we employ bursty tracing)

linear trace

3 See the paper for more detail (e.g. how we profile transition and

Recompilation Based on TTgraph

We traverse the hot edges backward to identify the new trace head for recompilation– we stop the backward traversal

when hitting a cyclic trace (our system does not capture cyclic and linear path in one trace)

– If there are multiple hot incoming edges for a node, we track both of themlinear

trace 3’

linear trace

See the paper for more detail See the paper for more detail

recompile

Performance Evaluation

Hardware: IBM BladeCenter JS22– 4 cores (8 SMT threads) of POWER6 4.0GHz – 16 GB system memory

Our trace-JIT:– Extended IBM JVM for Java 6 (32-bit) to support trace

compilation– Two optimization levels (both trace selection and code

generation), cold and hot– We compare cold only, hot only, and our adaptive multi-

level compilationBenchmark:

– DaCapo benchmark suite 9.12

se fop h2

rch pmd

cold onlyhot onlyOur adaptive multi-level compilation

Steady-state Performance

always using hot level gives26% gain in steady state

Our technique achieved 22% gain

in steady state

se fop h2

rch pmd

cold onlyhot onlyOur adaptive multi-level compilation

Startup Time (Execution time of first iteration)

using hot level makes the startup 50% (up to 3x) slower!

Our technique does not hurt the startup

performance!

100.0%

102.0%

104.0%

106.0%

108.0%

110.0%

se fop h2

rch pmd

basic recompilationTTgraph-based recompilation

Improvement by TTgraph-Based Recompilation

Summary

We showed that adaptive multi-level compilation is a practical way to balance the startup time and steady-state performance in a trace-JIT

We developed an efficient technique based on TTgraph to identify hot paths. We described the trace selection engine, the timer-based sampling profiler, and the code generator work together for effective recompilation.

Thank you!

Backup

CPU Time Breakdown

0.00.10.20.30.40.50.60.70.80.91.0

se fop h2

nluind

exluse

arch pmd

owtomca

GEOMEAN

se fop h2

nluind

exluse

arch pmd

owtomca

GEOMEAN

se fop h2

nluind

exluse

arch pmd

owtomca

GEOMEAN

se fop h2

nluind

exluse

arch pmd

owtomca

GEOMEAN

single-level compilation (cold level)single-level compilation (hot level)

adaptive multi-level compilation (TTgraph-based recompilation)

OSnative

librariesJIT-compiled code

hot cold

Most of the execution time is spent in

recompiled code

Adaptive Multi-Level Compilation in a Trace-based Java JIT ... · recom piled cyclic trace recom...

Documents

Recom 紹介スライド

Slide Dna Recom

ANESTHESIA RECOM DOCS

REVIEW & RECOM MENDATIONS

TRAI recom 30jan08

Lec4 Piled Raft

Initiative for RECOM Initiative for RECOM · 2014-05-14 · regional expert group to consider the RECOM Statute as proposed by the Coalition for RECOM and agree on the text that would

RECOM Odontologiab Web

Piled Foundation 2

Abebe Degree- Trans-recom

Piled Embankments Decide & Design

Piled Foundations - Fellenius Foundation... · 2014-02-02 · Piled Foundations Fellenius, B.H., 1991. Piled Foundations. Chapter 13 in Foundation Engineering Handbook, 2nd Edition,

3. Piled Foundation Report

7/2012 - CDTPcdtp.org/.../2014/05/RECOM-Initiative-Voice-7-2012-ENG.pdf · 2014. 5. 14. · Initiative for RECOM Initiative for RECOM 7 RECOM Initiative Advocacy - Progress Report

Seminar Piled Raft Foundation

Initiative for RECOM Initiative for RECOMcdtp.org/wp-content/uploads/2014/05/RECOM... · Initiative for RECOM Initiative for RECOM 2 nor just on the 155 renowned artists and intellectuals

Evaluation of the existing piled foundation based on piled ... · piled–raft design philosophy to evaluate the adequacy of existing piled foundations (piles and raft) for two identical

piled raft

Piled Raft Foundations

Piled Raft Foundation