JVM Magic

Virtual machines don't have to be slow; they don't even have to be slower than native code. All you have to do is write your code, lie back, and let the JVM do its magic! Learn about various JVM runtime optimizations and why the JVM is considered one of the best VMs in the world.

The JVM Magic

Baruch Sadogursky, Consultant & Architect, AlphaCSP

Agenda

• Introduction
• GC Magic 101
• General Optimizations
• Compiler Optimizations
• What can I do?
  • Programming tips
  • JVM configuration flags

Introduction

• In the past, the JVM was considered by many to be Java's Achilles' heel
  • Interpreter?!
• The JVM team improved performance by a factor of 300 to 3000
  • JDK 1.6 compared to JDK 1.0
• Java is measured to run at 50% to 100+% of the speed of C and C++
  • Jake2 vs Quake2
• How can it be?

Java Virtual Machines Zoo

• CEE-J • Excelsior JET• Hewlett-Packard• J9 (IBM)• Jbed• Jblend• Jrockit• MRJ• MicroJvm• MS JVM• OJVM• PERC• Blackdown Java• CVM• Gemstone• Golden Code Development• Intent• Novell• NSIcom CrE-ME• ChaiVM• HotSpot• AegisVM• Apache Harmony• CACAO• Dalvik• IcedTea

• IKVM.NET• Jamiga• JamVM• Jaos• JC• Jelatine JVM• JESSICA• Jikes RVM• Jnode• JOP• Juice• Jupiter• JX• Kaffe• leJOS• Mika VM• Mysaifu• NanoVM• SableVM• Squawk virtual machine• SuperWaba• TinyVM• VMkit of Low Level Virtual Machine• Wonka VM• Xam


HotSpot Virtual Machine

• Originally developed by Longview Technologies (acquired by Sun in 1997); first released in 1999
• Contains:
  • Class loader
  • Bytecode interpreter
  • 2 virtual machines
  • 7 garbage collectors
  • 2 compilers
  • Runtime libraries
• Configured by hundreds of -XX flags
• Reminder:
  • -X options are non-standard
  • -XX options have specific system requirements for correct operation
  • Both are subject to change without notice

GC Magic 101

GC Is Slow?

• GC has a bad performance reputation
  • Reduces throughput
  • Introduces pauses
  • Unpredictable
  • Uncontrolled
  • Performance degradation is proportional to object count
• "Just give me the damn free() and malloc()! I'll be just fine!"
• Is it so?

Generational Collectors

• Weak generational hypothesis
  • Most objects die young (AKA infant mortality)
  • Few old-to-young references
• Generations: regions holding objects of different ages
  • GC is done separately once a generation fills
  • Different GC algorithms
• The young (nursery) generation
  • Collected by "minor garbage collection"
• The old (tenured) generation
  • Collected by "major garbage collection"
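To see which collectors the running JVM uses for its generations, the standard java.lang.management API can be queried. The following is only an illustrative sketch: the names GcGenerationsDemo, churn and collectorReport are invented for this example, and the collector names and counts printed depend entirely on the JVM and the flags in use.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcGenerationsDemo {

    // Allocates many short-lived arrays. Under the weak generational
    // hypothesis, most of them should die in the young generation.
    static long churn(int iterations) {
        long total = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] shortLived = new byte[1024]; // unreachable after this iteration
            total += shortLived.length;
        }
        return total;
    }

    // Builds one line per collector exposed by the running JVM.
    static List<String> collectorReport() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName() + ": count=" + gc.getCollectionCount()
                    + ", timeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        churn(1_000_000);
        collectorReport().forEach(System.out::println);
    }
}
```

On a generational collector, the young-generation collector's count would typically grow much faster than the old-generation one under this kind of churn, although the JIT may also remove some of these allocations altogether.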

GC Magic 101

• Young is better than tenured
  • Let your objects die in the young generation
  • When possible and when it makes sense
• Swapping is bad
  • The application's memory footprint should not exceed the available physical memory
• Choose:
  • Throughput (client)
  • Low-pause (server)

http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html

Garbage First (G1)

• New in JDK 1.6 u14 (May 29th)
• All memory is divided into 1 MB buckets
• Calculates object liveness in buckets
  • Drops "dead" buckets
  • If a bucket is not total garbage, it is not dropped
• Collects the buckets with the most garbage first
• Pauses only on "mark"
  • No sweep
• User can provide pause time goals
  • Actual seconds or percentage of runtime
  • G1 records bucket collection time and can estimate how many buckets to collect during a pause
• Targets multi-processor machines and large heaps
• G1 will be the long-term replacement for the CMS collector
  • Unlike CMS, compacts to battle fragmentation
  • A bucket's space is fully reclaimed
• Better throughput
• Predictable pauses (high probability)
• Garbage left in buckets with a high live ratio
  • May be collected later

GC Is Slow? – The Answers

• Reduces throughput
  • You choose
• Introduces pauses
  • You choose
• Unpredictable
  • Not any more
• Uncontrolled
  • Configurable
• Performance degradation is proportional to object count
  • Not true
• "Just give me the damn free() and malloc()! I'll be just fine!"
  • Bad idea (see more later)

General Optimizations

HotSpot Optimizations

• JIT Compilation
• Compiler Optimizations
  • Generates more performant code than you could write in native
• Adaptive Optimization
• Split Time Verification
• Class Data Sharing

Two Virtual Machines?

• Client VM
  • Reducing start-up time and memory footprint
  • -client command-line flag
• Server VM
  • Maximum program execution speed
  • -server command-line flag
• Auto-detection
  • Server: >1 CPUs and >=2 GB of physical memory
  • Win32: always detected as client
  • Many 64-bit OSes don't have client VMs

Just-In-Time Compilation

• Everyone knows about JIT!
• Hot code is compiled to native
• What is "hot"?
  • Server VM: 10000 invocations
  • Client VM: 1500 invocations
  • Use -XX:CompileThreshold=# to change
  • More invocations: better optimizations
  • Fewer invocations: shorter warm-up time
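A rough way to watch the threshold at work is to call a small method more often than the compile threshold and run the program with the PrintCompilation flag. This is a minimal sketch: WarmupDemo and its methods are hypothetical names, and the exact moment a method gets compiled varies between JVM versions and settings.

```java
class WarmupDemo {

    // A tiny method that becomes "hot" once it is invoked often enough.
    static int square(int x) {
        return x * x;
    }

    // Calls square() the requested number of times and sums the results.
    static long run(int invocations) {
        long sum = 0;
        for (int i = 0; i < invocations; i++) {
            sum += square(i % 100);
        }
        return sum;
    }

    public static void main(String[] args) {
        // 20000 invocations is above both thresholds quoted on the slide.
        System.out.println(run(20_000));
    }
}
```

Running the compiled class with -XX:+PrintCompilation should eventually show these methods among the compiled ones; lowering -XX:CompileThreshold would be expected to make that happen sooner.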

• The code is optimized by the compiler
  • Coming soon…

Adaptive Optimization

• Allows HotSpot to uncompile (deoptimize) previously compiled code
• Much more aggressive, even speculative, optimizations may be performed
• And rolled back if something goes wrong or new data is gathered
  • E.g. class loading might invalidate inlining

Split Time Verification

• Java suffers from long boot time
• One of the reasons is bytecode verification
  • Valid flow control
  • Type safety
  • Visibility
• To ease the load on the weak KVM, J2ME started performing part of the verification at compile time
• It's good, so now it's in Java SE 6 too

Class Data Sharing

• Helps improve startup time
• During JDK installation, part of rt.jar is preloaded into a shared memory file, which is attached at runtime
• No need to reload and reverify those classes every time

Compiler Optimizations

Two Types of Optimizations

• Java has two compilers:
  • javac bytecode compiler
  • HotSpot VM JIT compiler
• Both implement similar optimizations
• Bytecode compiler is limited
  • Dynamic linking
  • Can apply only static optimizations

Warning

• Caution! Don't try this at home!
• The source code you are about to see is not real!
  • It's pseudo assembly code
• Don't write such code!
  • Source code should be readable and object-oriented
  • Bytecode will become performant automagically

Optimization Rules

• Make the common case fast
  • Don't worry about the uncommon/infrequent case
• Defer optimization decisions
  • Until you have data
  • Revisit decisions if data warrants

Null check Elimination

• Java is a null-safe language
  • A pointer can't point to a meaningless portion of memory
  • Null checks are added by the compiler; NullPointerException is thrown
• The JVM's profiler can eliminate those checks

Example – Original Source

public class Game {

    private Logger logger;
    private int totalScore;

    public void score(String player, int points) {
        logger.info(player + " scores " + points);
        totalScore += points;
    }
}

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        game.score("Bob", 12);
    }
}

Example – Null Check Elimination

Before: the implicit null check is written out ahead of the call.

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        if (game == null) throw new NullPointerException("null");
        game.score("Bob", 12);
    }
}

After: game was just assigned by new, so it can never be null and the check is removed.

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        game.score("Bob", 12);
    }
}

Inlining

• Love encapsulation?
  • Getters and setters
• Love clean and simple code?
  • Small methods
• Use static code analysis?
  • Small methods
• No penalty for using those!
  • The JIT brings the implementation of these methods into the containing method
  • This optimization is known as "inlining"

• Not just about eliminating call overhead
• Provides the optimizer with bigger blocks
• Enables other optimizations
  • Hoisting, dead code elimination, code motion, strength reduction

• But wait, all non-private, non-static, non-final methods in Java are virtual!
  • HotSpot examines the exact case in place
  • In most cases there is only one implementation, which can be inlined
• But wait, more implementations may be loaded later!
  • In such a case HotSpot undoes the inlining
  • Speculative inlining
• By default limited to 35 bytes of bytecode
  • Use -XX:MaxInlineSize=# to change
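The kind of code this targets can be sketched as a small getter called from a loop. The names below (InliningDemo, Score, total) are hypothetical and chosen only for illustration; whether the JIT really inlines getPoints() depends on the JVM, its flags, and how hot the loop is.

```java
class InliningDemo {

    // A small immutable holder with a trivial getter, well under the
    // default inlining size limit mentioned above.
    static final class Score {
        private final int points;

        Score(int points) {
            this.points = points;
        }

        int getPoints() {
            return points;
        }
    }

    // The call to getPoints() is a natural inlining candidate: once it is
    // inlined, the loop body is effectively "total += s.points".
    static int total(Score[] scores) {
        int total = 0;
        for (Score s : scores) {
            total += s.getPoints();
        }
        return total;
    }

    public static void main(String[] args) {
        Score[] scores = {new Score(2), new Score(3)};
        System.out.println(total(scores));
    }
}
```

The source keeps its encapsulation; the cost of the getter call is something the JIT is expected to remove in hot code.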

Example - Inlining

The call to game.score("Bob", 12) is replaced by the body of score():

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        game.logger.info("Bob" + " scores " + 12);
        game.totalScore += 12;
    }
}

Example – Source Code Revision

public class GameMove {

    private final String player;
    private final int points;

    public GameMove(String player, int points) {
        this.player = player;
        this.points = points;
    }

    public String getPlayer() {
        return player;
    }

    public int getPoints() {
        return points;
    }
}

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
        for (GameMove gameMove : gameMoves) {
            game.score(gameMove.getPlayer(), gameMove.getPoints());
        }
    }
}

Code Hoisting

• Hoist = to raise or lift
• Size optimization
• Eliminates duplicate code in method bodies by hoisting expressions or statements
  • Duplicate bytecode, not necessarily source code

Example – Code Hoisting

Before:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
        for (int i = 0; i < gameMoves.length; i++) {
            if (i < 0 || gameMoves.length <= i) throw new ArrayIndexOutOfBoundsException();
            game.score(gameMoves[i].getPlayer(), gameMoves[i].getPoints());
        }
    }
}

After:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
        int length = gameMoves.length;
        for (int i = 0; i < length; i++) {
            if (i < 0 || length <= i) throw new ArrayIndexOutOfBoundsException();
            game.score(gameMoves[i].getPlayer(), gameMoves[i].getPoints());
        }
    }
}

Bounds Check Elimination

• Java promises automatic bounds checks for arrays
  • An exception is thrown on violation
• If the code already proves that the index stays within the array bounds, the automatic check can be dropped
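The difference between an index the JIT can reason about and one it cannot may be sketched as follows. BoundsCheckDemo and its methods are hypothetical names; the comments describe what a JIT is typically able to prove, not a guarantee for any particular JVM.

```java
class BoundsCheckDemo {

    // The loop condition keeps i within [0, values.length), so the
    // per-access bounds check is provably redundant and may be removed.
    static int sum(int[] values) {
        int sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
        }
        return sum;
    }

    // Here the index into values comes from data, so nothing proves it is
    // in range; the check generally has to stay and may throw at runtime.
    static int sumIndirect(int[] values, int[] indexes) {
        int sum = 0;
        for (int i = 0; i < indexes.length; i++) {
            sum += values[indexes[i]];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] values = {10, 20, 30};
        System.out.println(sum(values));
        System.out.println(sumIndirect(values, new int[] {2, 0}));
    }
}
```

In both methods the language semantics are unchanged: an out-of-range index still results in ArrayIndexOutOfBoundsException.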

Example – Bounds Check Elimination

The loop condition already guarantees 0 <= i < length, so the explicit check is removed:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
        int length = gameMoves.length;
        for (int i = 0; i < length; i++) {
            game.score(gameMoves[i].getPlayer(), gameMoves[i].getPoints());
        }
    }
}

Sub-Expression Elimination

• Avoids redundant memory access

Before:

@Test
public void testScore() {
    Game game = new Game();
    GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
    for (int i = 0; i < gameMoves.length; i++) {
        game.score(gameMoves[i].getPlayer(), gameMoves[i].getPoints());
    }
}

After:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
        for (int i = 0; i < gameMoves.length; i++) {
            GameMove move = gameMoves[i];
            game.score(move.getPlayer(), move.getPoints());
        }
    }
}

Loop Unrolling

• Some loops shouldn't be loops
  • In terms of performance, not code readability
• Those can be unrolled into a set of statements
• If the boundaries are dynamic, a partial unroll will occur
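A partial unroll for a dynamic bound can be pictured as a main loop that handles several elements per iteration plus a short remainder loop. The sketch below writes that shape out by hand purely for illustration; UnrollDemo and the unroll factor of 4 are assumptions, and this is not something to do in real source code, since the JIT picks its own factor.

```java
class UnrollDemo {

    // The loop as a programmer would normally write it.
    static int sumRolled(int[] v) {
        int sum = 0;
        for (int i = 0; i < v.length; i++) {
            sum += v[i];
        }
        return sum;
    }

    // The same loop, partially unrolled by a factor of 4 (an arbitrary
    // choice for this sketch), followed by a loop for the leftovers.
    static int sumUnrolled(int[] v) {
        int sum = 0;
        int i = 0;
        int limit = v.length - (v.length % 4);
        for (; i < limit; i += 4) {
            sum += v[i] + v[i + 1] + v[i + 2] + v[i + 3];
        }
        for (; i < v.length; i++) {
            sum += v[i]; // at most 3 remaining elements
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5};
        System.out.println(sumRolled(data) == sumUnrolled(data)); // true
    }
}
```

Fewer loop-condition tests and jumps are executed per element, which is the gain the optimization is after.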

Example – Loop Unrolling

Before:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
        for (int i = 0; i < gameMoves.length; i++) {
            GameMove move = gameMoves[i];
            game.score(move.getPlayer(), move.getPoints());
        }
    }
}

After:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
        GameMove move = gameMoves[0];
        game.score(move.getPlayer(), move.getPoints());
        move = gameMoves[1];
        game.score(move.getPlayer(), move.getPoints());
    }
}

Example – Inlining

The getters are inlined:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
        GameMove move = gameMoves[0];
        game.score(move.player, move.points);
        move = gameMoves[1];
        game.score(move.player, move.points);
    }
}

Then score() itself is inlined:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
        GameMove move = gameMoves[0];
        game.logger.info(move.player + " scores " + move.points);
        game.totalScore += move.points;
        move = gameMoves[1];
        game.logger.info(move.player + " scores " + move.points);
        game.totalScore += move.points;
    }
}

Escape Analysis

• Escape analysis is not an optimization
• It is a check that an object does not escape the local scope
  • E.g. created in a private method, assigned to a local variable and not returned
• Escape analysis opens up possibilities for lots of optimizations

Scalar Replacement

• Remember the rule "new == always a new object"?
  • False!
• The JVM can optimize away allocations
  • Fields are hoisted into registers
  • The object becomes unneeded
• But object creation is cheap!
  • Yep, but GC is not so cheap…
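What a non-escaping object looks like, and what the code is conceptually reduced to, can be sketched as below. ScalarReplacementDemo, Move and both methods are hypothetical names; whether the allocation is actually removed depends on the JVM performing escape analysis on the compiled method.

```java
class ScalarReplacementDemo {

    static final class Move {
        final String player;
        final int points;

        Move(String player, int points) {
            this.player = player;
            this.points = points;
        }
    }

    // The Move instance is created, read and dropped inside this method.
    // It is never stored in a field, passed on, or returned, so it does
    // not escape and is a candidate for scalar replacement.
    static int weight(String player, int points) {
        Move move = new Move(player, points);
        return move.player.length() + move.points;
    }

    // Roughly what remains after scalar replacement: the fields live on
    // as plain locals and no object is allocated.
    static int weightScalars(String player, int points) {
        return player.length() + points;
    }

    public static void main(String[] args) {
        System.out.println(weight("Bob", 12) == weightScalars("Bob", 12)); // true
    }
}
```

Both methods compute the same value; the second simply has nothing left for the garbage collector to clean up.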


Example – Source Code Revision

public class Moves {

    GameMove nextMove() {
        return new GameMove(nextPlayer(), calcPoints());
    }

    private int calcPoints() {…}

    private String nextPlayer() {…}
}

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        Moves moves = new Moves();
        GameMove move = moves.nextMove();
        game.score(move.getPlayer(), move.getPoints());
    }
}

Example – Scalar Replacement

nextMove() is inlined:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        Moves moves = new Moves();
        GameMove move = new GameMove(moves.nextPlayer(), moves.calcPoints());
        game.score(move.getPlayer(), move.getPoints());
    }
}

The GameMove object never escapes, so its fields are replaced by local scalars:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        Moves moves = new Moves();
        String player = moves.nextPlayer();
        int points = moves.calcPoints();
        game.score(player, points);
    }
}

The temporary locals are eliminated:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        Moves moves = new Moves();
        game.score(moves.nextPlayer(), moves.calcPoints());
    }
}

score() is inlined:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        Moves moves = new Moves();
        int points = moves.calcPoints();
        game.logger.info(moves.nextPlayer() + " scores " + points);
        game.totalScore += points;
    }
}

Lock Coarsening

• HotSpot merges adjacent synchronized blocks that use the same lock
• The compiler is allowed to move statements into the merged coarse blocks
• Trade-off between performance and responsiveness
  • Reduces instruction count
  • But locks are held longer

private void addPresentor(StringBuffer lecture, String presentor) {
    lecture
        .append(" by ")
        .append(presentor);
}

Example – Source Code Revision

public class Game {

    Logger logger;
    int totalScore;

    public void score(String player, int points) {
        logger.info(player + " scores " + points);
        totalScore += points;
    }

    public synchronized void multithreadedScore(String player, int points) {
        score(player, points);
    }
}

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        game.multithreadedScore("Bob", 5);
        game.multithreadedScore("Jane", 7);
        game.multithreadedScore("Dwane", 1);
    }
}

Example – Lock Coarsening

Each synchronized call acquires and releases the same lock:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        lock(game);
        game.multithreadedScore("Bob", 5);
        unlock(game);
        lock(game);
        game.multithreadedScore("Jane", 7);
        unlock(game);
        lock(game);
        game.multithreadedScore("Dwane", 1);
        unlock(game);
    }
}

After coarsening (and inlining), the lock is taken once around all three updates:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        lock(game);
        game.logger.info("Bob" + " scores " + 5);
        game.totalScore += 5;
        game.logger.info("Jane" + " scores " + 7);
        game.totalScore += 7;
        game.logger.info("Dwane" + " scores " + 1);
        game.totalScore += 1;
        unlock(game);
    }
}

Lock Elision

• A thread enters a lock that no other thread will synchronize on
  • Synchronization has no effect
  • Can be deduced using escape analysis
• Such locks can be elided
• Elides 4 StringBuffer synchronized calls:

private String getLecture(String topic, String presentor) {
    return new StringBuffer()
        .append(topic)
        .append(" by ")
        .append(presentor)
        .toString();
}

Example - Lock Elision

game never escapes testScore(), so no other thread can lock it and the lock is elided:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        game.logger.info("Bob" + " scores " + 5);
        game.totalScore += 5;
        game.logger.info("Jane" + " scores " + 7);
        game.totalScore += 7;
        game.logger.info("Dwane" + " scores " + 1);
        game.totalScore += 1;
    }
}

Constants Folding

• Trivial optimization
• How many constants are there?
  • More than you think!
  • Inlining generates constants
  • Unrolling generates constants
  • Escape analysis generates constants
• The JIT determines what is constant at runtime
  • Whatever doesn't change

• Literals folding
  • Before: int foo = 9*10;
  • After: int foo = 90;
• String folding or StringBuilder-ing
  • Before: String foo = "hi Joe " + (9*10);
  • After: String foo = new StringBuilder().append("hi Joe ").append(9 * 10).toString();
  • After: String foo = "hi Joe 90";
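The static half of this (the part javac does) can be observed directly, because compile-time constant strings are interned. The sketch below uses hypothetical names (ConstantFoldingDemo, folded, notFolded) and only demonstrates the javac side; the JIT's runtime folding is not visible this way.

```java
class ConstantFoldingDemo {

    // Both operands are compile-time constants, so javac folds the whole
    // expression into the single string constant "hi Joe 90".
    static String folded() {
        return "hi Joe " + (9 * 10);
    }

    // n is only known at runtime, so javac cannot fold this; the string
    // is built when the method runs.
    static String notFolded(int n) {
        return "hi Joe " + n;
    }

    public static void main(String[] args) {
        // Reference comparison on purpose: identical interned constants
        // are the same object.
        System.out.println(folded() == "hi Joe 90");          // true
        System.out.println(notFolded(90).equals("hi Joe 90")); // true
    }
}
```

The reference comparison is used here only to expose the folding; regular code should still compare strings with equals().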

Example – Constants Folding

The concatenations of constants are folded into single string constants:

public class GameTest {
    @Test
    public void testScore() {
        Game game = new Game();
        game.logger.info("Bob scores 5");
        game.totalScore += 5;
        game.logger.info("Jane scores 7");
        game.totalScore += 7;
        game.logger.info("Dwane scores 1");
        game.totalScore += 1;
    }
}

Dead Code Elimination

• Dead code: code that has no effect on the outcome of the program execution

public static void main(String[] args) {
    long start = System.nanoTime();
    int result = 0;
    // result is never read afterwards, so the JIT may treat this loop as dead code
    for (int i = 0; i < 10 * 1000 * 1000; i++) {
        result += Math.sqrt(i);
    }
    long duration = (System.nanoTime() - start) / 1000000;
    System.out.format("Test duration: %d (ms) %n", duration);
}
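One common way to keep such a micro-benchmark from measuring nothing is to make the computed value observable, for instance by returning and printing it. This is a minimal sketch with hypothetical names (DeadCodeDemo, sumOfRoots); it illustrates the idea only and is not a substitute for a proper benchmarking harness.

```java
class DeadCodeDemo {

    // The result is returned to the caller, so the loop contributes to
    // the program's outcome and cannot simply be discarded.
    static int sumOfRoots(int n) {
        int result = 0;
        for (int i = 0; i < n; i++) {
            result += Math.sqrt(i); // compound assignment narrows back to int
        }
        return result;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int result = sumOfRoots(10 * 1000 * 1000);
        long duration = (System.nanoTime() - start) / 1000000;
        // Printing the result keeps it live.
        System.out.format("Result: %d, test duration: %d (ms) %n", result, duration);
    }
}
```

Timings from a single run like this still include interpretation and compilation time, so they should be read with care.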

OSR - On Stack Replacement

• Normally code is switched from interpretation to native in heap context
  • Before entering a method
• OSR: switch from interpretation to compiled code in local context
  • In the middle of a method call
  • The JVM tracks code block execution count
• Fewer optimizations
  • May prevent bounds check elimination and loop unrolling

Out-Of-Order Execution

public class OutOfOrder {

    private int a;

    public void foo() {
        a++;
        a += 8;
    }
}

[Figure: heap/stack trace of foo() as written: read a to stack, increment, store a to heap, read a to stack, add 8, store a to heap; a goes from 0 to 1 to 9]

[Figure: heap/stack trace of foo() after reordering: read a to stack, increment, add 8, store a to heap; a goes from 0 to 9 with a single read and a single store]

Programming & Tuning Tips


How Can I Help?

• Just write good quality Java code
  • Object orientation
  • Polymorphism
  • Abstraction
  • Encapsulation
  • DRY
  • KISS
• Let HotSpot optimize

• final keyword
  • For fields:
    • Allows caching
    • Allows lock coarsening
  • For methods:
    • Simplifies inlining decisions
• Immutable objects die younger

JVM tuning tips

• Reminder: -XX options are non-standard
  • Added for HotSpot development purposes
  • Mostly tested on Solaris 10
  • Platform dependent
• Some options may contradict each other
• Know and experiment with these options

Monitoring & Troubleshooting

Option                         Comment
HeapDumpOnOutOfMemoryError     Since 5.0u7, HPROF
HeapDumpOnCtrlBreak            Since 5.0u14, HPROF
OnError                        Runs a list of user-defined commands on fatal error
OnOutOfMemoryError             Runs a list of user-defined commands on the first out-of-memory error
PrintClassHistogram            Prints a histogram of class instances on Ctrl-Break (jmap -histo)
PrintConcurrentLocks           Prints java.util.concurrent locks on Ctrl-Break (jstack -l)
PrintCompilation               Traces compiled methods
PrintInlining                  Traces inlining
PrintOptoAssembly              Traces the generated assembly
TraceClassLoading/Unloading    Traces class loading/unloading
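When experimenting with these options it helps to confirm which ones the running JVM was really started with. The sketch below reads them through the standard RuntimeMXBean; JvmFlagsDemo and isEnabled are hypothetical names, and the check only recognises the boolean -XX:+Name form.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

class JvmFlagsDemo {

    // Returns the arguments passed to the JVM itself (not to main()).
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    // True if a boolean -XX option was switched on explicitly,
    // e.g. isEnabled(args, "PrintCompilation") for -XX:+PrintCompilation.
    static boolean isEnabled(List<String> jvmArgs, String option) {
        return jvmArgs.contains("-XX:+" + option);
    }

    public static void main(String[] args) {
        List<String> jvmArgs = jvmArguments();
        System.out.println("JVM arguments: " + jvmArgs);
        System.out.println("PrintCompilation on: " + isEnabled(jvmArgs, "PrintCompilation"));
    }
}
```

Started with -XX:+PrintCompilation -XX:+HeapDumpOnOutOfMemoryError, the program should list both options in its JVM arguments.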



Thank you for your attention

Thanks to Ori Dar!
