Buffered dynamic run-time profiling of arbitrary data for Virtual Machines which employ interpreter and Just-In-Time (JIT) compiler

Compiler workshop ’08

Nikola Grcevski, IBM Canada Lab


Agenda

• The motivation and the importance of profiling

• Design and implementation of J9 VM interpreter profiler

• Performance results and start-up overhead


The static vs. dynamic compiler

• Static compilers can take their time to analyze the code and perform intra-procedural analysis

• Dynamic Just-In-Time compilers don’t have this luxury: compilation happens while the application is running

• Can dynamic compilers ever produce quality optimized code comparable to static compilers?


Why profile?

• The whole category of speculative optimizations relies on some type of profiling information

• Opens up opportunities for new code and memory optimizations

• Critical for high performance dynamic compiler systems


What could we profile?

• Pretty much anything that we expect will provide repeatable information that we can use to optimize

• The profiling can be at the Java level or CPU level if the OS supports it.


What kinds of profilers does J9 have?

• JIT profiler

  – Instruments methods with various profiling hooks

  – Targeted only to methods that are very hot

  – Temporal and slows down execution

• Interpreter profiler

  – The topic of this presentation


What kinds of data do we collect with the interpreter profiler?

• Branch direction

• Virtual/Interface call targets

• Switch statement index

• Instanceof and checkcast runtime types


Interpreter profiler design

• Buffered approach to data collection on the application threads

[Figure: Application Thread 1 … Application Thread N; bytecodes shown: if, vcall, icall, switch, mul, add, div]


Interpreter profiler design

• Buffer full event triggers processing of the data by the JIT

[Figure: Application Thread 1 buffer (if, vcall, if, switch, if) → Buffer full event → JIT runtime]
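The buffered collection described on these two slides can be sketched roughly as follows. This is a minimal illustration, not the J9 implementation: the class name `ThreadProfilingBuffer`, the `(pc, value)` record shape, the capacity, and the `on_full` callback (standing in for the buffer full event) are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# Assumed record shape: (bytecode PC, profiled value).
Record = Tuple[int, int]


class ThreadProfilingBuffer:
    """Hypothetical per-thread buffer: the interpreter appends, the JIT drains."""

    def __init__(self, capacity: int, on_full: Callable[[List[Record]], None]):
        self.capacity = capacity
        self.on_full = on_full  # stands in for the "buffer full event"
        self.records: List[Record] = []

    def record(self, pc: int, value: int) -> None:
        # Cheap append on the application thread; no shared state is touched.
        self.records.append((pc, value))
        if len(self.records) >= self.capacity:
            # Hand the full buffer to the consumer and start a fresh one.
            full, self.records = self.records, []
            self.on_full(full)


# Usage: a list stands in for the JIT runtime that processes full buffers.
processed: List[List[Record]] = []
buf = ThreadProfilingBuffer(capacity=3, on_full=processed.append)
for pc, value in [(0x10, 1), (0x14, 0), (0x10, 1), (0x20, 7)]:
    buf.record(pc, value)
```

The point of the design is that the application thread only pays for an append; the more expensive processing happens once per full buffer.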


Interpreter profiler design

• The JIT parses the application thread's profiling buffer and builds an internal profiling data structure

[Figure: JIT runtime; Profiling buffer (bytecode program counter, data) → hash function based on bytecode PC → JIT profiling hashtable]


What’s in the data we collect?

• Bytecode program counter

• Variable size data packet

  – 1 byte for branch direction

  – Word size for call targets and runtime types

  – 4 bytes for switch index
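One way to picture the variable-size records is the sketch below. The payload sizes follow the slide; everything else is an assumption for illustration: a 64-bit word, little-endian packing, and a hypothetical `kind_at_pc` mapping that stands in for the parser looking at the bytecode at each PC to learn which payload size follows.

```python
import struct

# Hypothetical record kinds (names are illustrative only).
BRANCH, CALL, TYPE, SWITCH = "branch", "call", "type", "switch"

WORD = "Q"  # assumption: 64-bit machine word
PAYLOAD_FORMAT = {
    BRANCH: "B",   # 1 byte for branch direction
    CALL: WORD,    # word size for call targets
    TYPE: WORD,    # word size for runtime types
    SWITCH: "I",   # 4 bytes for switch index
}


def pack_record(kind: str, pc: int, value: int) -> bytes:
    """Pack one record as: bytecode PC, then a kind-dependent payload."""
    return struct.pack("<" + WORD + PAYLOAD_FORMAT[kind], pc, value)


def parse_buffer(data: bytes, kind_at_pc: dict) -> list:
    """Walk the buffer; the payload size is derived from the kind at each PC."""
    records = []
    offset = 0
    pc_format = "<" + WORD
    while offset < len(data):
        (pc,) = struct.unpack_from(pc_format, data, offset)
        offset += struct.calcsize(pc_format)
        kind = kind_at_pc[pc]
        payload_format = "<" + PAYLOAD_FORMAT[kind]
        (value,) = struct.unpack_from(payload_format, data, offset)
        offset += struct.calcsize(payload_format)
        records.append((kind, pc, value))
    return records
```

Under these assumptions a branch record takes 9 bytes, a call or type record 16, and a switch record 12, so frequent branch records stay small.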


Processing the buffered branch information

• We create an object to hold the bytecode PC and branch counts. We are using 4 bytes to store the branch information.

pc; taken | not taken
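As an illustration of the "taken | not taken" layout, the sketch below packs both counters into one 4-byte value. The slide only states that 4 bytes are used; the even 16/16 bit split and the saturating increment are assumptions made for the example.

```python
MAX_COUNT = 0xFFFF  # assumption: each counter gets 16 of the 32 bits


def update_branch(packed: int, taken: bool) -> int:
    """Return the 32-bit value with the matching counter incremented."""
    taken_count = packed >> 16
    not_taken_count = packed & MAX_COUNT
    if taken:
        # Saturate instead of overflowing into the other counter.
        taken_count = min(taken_count + 1, MAX_COUNT)
    else:
        not_taken_count = min(not_taken_count + 1, MAX_COUNT)
    return (taken_count << 16) | not_taken_count


def unpack_branch(packed: int) -> tuple:
    """Split the packed value into (taken, not taken)."""
    return packed >> 16, packed & MAX_COUNT


# Usage: two taken branches followed by one not-taken branch.
state = 0
for direction in (True, True, False):
    state = update_branch(state, direction)
```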


What does the JIT do with the call information?

• We keep up to 3 call targets with their counts, as well as a residue count

pc; Class A, count; Class B, count; Class C, count; residue

We use the same approach for checkcast and instanceof
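A record with three target slots and a residue count might look like the sketch below. The slot policy (the first three distinct targets keep their slots and everything else lands in the residue) is an assumption; the slides do not say how targets are chosen or replaced. `CallSiteRecord` and `dominant` are hypothetical names.

```python
class CallSiteRecord:
    """Hypothetical per-PC record: up to 3 targets with counts, plus residue."""

    MAX_TARGETS = 3

    def __init__(self, pc: int):
        self.pc = pc
        self.targets = {}   # class name -> count, at most MAX_TARGETS entries
        self.residue = 0    # count of all targets that did not get a slot

    def add(self, target: str) -> None:
        if target in self.targets:
            self.targets[target] += 1
        elif len(self.targets) < self.MAX_TARGETS:
            self.targets[target] = 1
        else:
            self.residue += 1

    def dominant(self):
        """Return (target, share of all observations), or None if empty."""
        total = sum(self.targets.values()) + self.residue
        if total == 0:
            return None
        name = max(self.targets, key=self.targets.get)
        return name, self.targets[name] / total


# Usage: class D arrives after the three slots are taken.
site = CallSiteRecord(pc=0x40)
for receiver in ["A", "B", "A", "C", "D", "A"]:
    site.add(receiver)
```

The residue matters when the data is consumed: a large residue relative to the top count suggests the site is too polymorphic for a single-target guess.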


What does the JIT do with the switch information?

• We create a data structure to hold the bytecode PC and counts for switch index. The index data is 8 bytes wide, split into 4 records: the top 3 and the rest.

pc; record 1 | record 2 | record 3 | the rest

Each record is split into 2 portions: a 1-byte count and a 1-byte switch index (count | index)
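The 8-byte switch layout can be sketched with a `bytearray`. The sizes come from the slide (4 records, each a 1-byte count and a 1-byte index); the slot assignment policy, the saturating counts, and the handling of indexes that do not fit in one byte are assumptions for illustration.

```python
TOP_SLOTS = 3
REST_SLOT = 3  # the fourth record counts everything else


def new_switch_data() -> bytearray:
    """8 bytes: 4 records of (count byte, index byte)."""
    return bytearray(8)


def update_switch(data: bytearray, index: int) -> None:
    if index <= 0xFF:  # assumption: only indexes that fit in 1 byte get a slot
        for slot in range(TOP_SLOTS):
            count, slot_index = data[2 * slot], data[2 * slot + 1]
            if count == 0:
                # Free slot: claim it for this index.
                data[2 * slot] = 1
                data[2 * slot + 1] = index
                return
            if slot_index == index:
                data[2 * slot] = min(count + 1, 0xFF)  # saturate at 255
                return
    # No slot available: account for it in "the rest".
    data[2 * REST_SLOT] = min(data[2 * REST_SLOT] + 1, 0xFF)


# Usage: index 7 arrives after the three slots are taken by 2, 5 and 9.
switch_data = new_switch_data()
for observed in [2, 2, 5, 9, 7, 2]:
    update_switch(switch_data, observed)
```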


Storing the profiling data

• Each data record is stored in a global hashtable, using the PC for the hash function

• On subsequent encounters of the same PC with profiling data, the records are updated.

  – Branch and switch counts are incremented

  – Call targets and runtime types are added and counts incremented.
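The store-or-update behaviour can be sketched with an explicit bucket array keyed by PC. The bucket count, the modulo hash, and the dictionary-based entry are assumptions chosen to keep the example short; the slides only say that the hash function is based on the bytecode PC.

```python
NUM_BUCKETS = 8  # assumption: tiny table so collisions are easy to see


def bucket_for(pc: int) -> int:
    """Hypothetical hash function based only on the bytecode PC."""
    return pc % NUM_BUCKETS


class ProfilingTable:
    def __init__(self):
        self.buckets = [[] for _ in range(NUM_BUCKETS)]

    def lookup(self, pc: int):
        for entry in self.buckets[bucket_for(pc)]:
            if entry["pc"] == pc:
                return entry
        return None

    def update(self, pc: int, value: int) -> None:
        entry = self.lookup(pc)
        if entry is None:
            # First encounter of this PC: create the record.
            entry = {"pc": pc, "counts": {}}
            self.buckets[bucket_for(pc)].append(entry)
        # Subsequent encounters: add the value if new, then bump its count.
        counts = entry["counts"]
        counts[value] = counts.get(value, 0) + 1


# Usage: 0x10 and 0x18 hash to the same bucket but keep separate records.
table = ProfilingTable()
for pc, value in [(0x10, 1), (0x18, 0), (0x10, 1)]:
    table.update(pc, value)
```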


Using the profiling information

• The profiler database only knows about bytecode PCs

• At every point where the compiler is interested in profiling information, it generates the bytecode PC from the method information and the bytecode index

• The compiler has to make sense out of the information in the hashtable
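A compiler-side query could look roughly like the sketch below. It assumes that the bytecode PC is the start address of the method's bytecodes plus the bytecode index, and that the table maps a PC to a `(taken, not taken)` pair; both are assumptions for illustration, as are the function names.

```python
def bytecode_pc(method_bytecode_start: int, bytecode_index: int) -> int:
    """Assumed mapping from (method, bytecode index) to a bytecode PC."""
    return method_bytecode_start + bytecode_index


def branch_taken_ratio(table: dict, method_bytecode_start: int, bytecode_index: int):
    """Return the fraction of taken branches, or None if nothing was profiled."""
    entry = table.get(bytecode_pc(method_bytecode_start, bytecode_index))
    if entry is None:
        return None
    taken, not_taken = entry
    total = taken + not_taken
    return taken / total if total else None


# Usage: one profiled branch at bytecode index 12 of a method starting at 0x1000.
profile = {0x1000 + 12: (90, 10)}
ratio = branch_taken_ratio(profile, 0x1000, 12)
missing = branch_taken_ratio(profile, 0x1000, 40)
```

Returning `None` for an unprofiled PC keeps "no data" distinct from "never taken", which is part of what the compiler has to interpret.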


Interpreter profiler design

• The JIT compiler consults the profiling hashtable at various stages of method compilation

[Figure: on the Compilation Thread, the inliner, order code, and codegen stages consult the JIT profiling hashtable]


Performance results

• Up to 30% improvement on various applications

  – EJB and other middleware applications benefit mostly from code ordering and devirtualization for the purpose of inlining

  – Benchmarks typically benefit from other optimizations enabled by the ability to devirtualize virtual and interface calls

• With various tweaks we managed to drive the start-up overhead to below 10%


How do we manage the profiling overhead?

• We turn the profiler off in -Xquickstart mode

• No locking on the hashtable

• We detect the startup phase of the application and skip records to reduce the data collection overhead
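Skipping records during startup could be as simple as keeping one record in every N. The sketch below is only an illustration of that idea: the `RecordSampler` name, the fixed interval, and the `in_startup` flag are assumptions, and how the startup phase is actually detected is not covered here.

```python
class RecordSampler:
    """Hypothetical filter: during startup, keep only every Nth record."""

    def __init__(self, startup_interval: int):
        self.startup_interval = startup_interval
        self.seen = 0

    def should_record(self, in_startup: bool) -> bool:
        if not in_startup:
            return True  # outside startup every record is kept
        self.seen += 1
        return self.seen % self.startup_interval == 0


# Usage: with an interval of 4, two of eight startup records are kept.
sampler = RecordSampler(startup_interval=4)
kept_during_startup = [sampler.should_record(True) for _ in range(8)]
```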


Turning the profiler ON and OFF

• The profiler is ON by default

• The sampler thread turns the profiler OFF or back ON

  – A number of consecutive ticks in JIT-generated code turns the profiler OFF

  – A number of consecutive ticks in the interpreter turns the profiler back ON
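The ON/OFF rule can be sketched as a small state machine driven by sampler ticks. The threshold values and the `ProfilerSwitch` name are assumptions for the example; the slides do not give the actual tick counts.

```python
class ProfilerSwitch:
    """Hypothetical toggle driven by where each sampler tick lands."""

    def __init__(self, off_threshold: int, on_threshold: int):
        self.enabled = True  # the profiler is ON by default
        self.off_threshold = off_threshold
        self.on_threshold = on_threshold
        self.jit_ticks = 0
        self.interpreter_ticks = 0

    def tick(self, in_jit_code: bool) -> bool:
        if in_jit_code:
            # A tick in JIT-generated code breaks any interpreter streak.
            self.jit_ticks += 1
            self.interpreter_ticks = 0
            if self.enabled and self.jit_ticks >= self.off_threshold:
                self.enabled = False
        else:
            # A tick in the interpreter breaks any JIT streak.
            self.interpreter_ticks += 1
            self.jit_ticks = 0
            if not self.enabled and self.interpreter_ticks >= self.on_threshold:
                self.enabled = True
        return self.enabled


# Usage: OFF after 3 consecutive JIT ticks, back ON after 2 interpreter ticks.
switch = ProfilerSwitch(off_threshold=3, on_threshold=2)
history = [switch.tick(in_jit) for in_jit in
           [True, True, False, True, True, True, False, False]]
```

Requiring consecutive ticks gives some hysteresis, so a single stray sample does not flip the profiler.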


Some of the problems we encountered

• Tuning for optimal balance between startup overhead and throughput performance wasn’t easy

• Application phase change detection wasn’t easy

• Class unloading created lots of problems


Summary

• Profiling is critical for performance of run-time systems

• Using a buffered approach to data collection can help build efficient profilers

• Tuning for optimal balance of startup overhead and throughput performance is challenging