The JVM Magic Baruch Sadogursky Consultant & Architect, AlphaCSP

JVM Magic


DESCRIPTION

Virtual machines don't have to be slow; they don't even have to be slower than native code. All you have to do is write your code, lie back and let the JVM do its magic! Learn about various JVM runtime optimizations and why the JVM is considered one of the best VMs in the world.


Page 1: JVM Magic

The JVM Magic

Baruch Sadogursky, Consultant & Architect, AlphaCSP

Page 2: JVM Magic

Agenda

• Introduction
• GC Magic 101
• General Optimizations
• Compiler Optimizations
• What can I do?
  • Programming tips
  • JVM configuration flags

Page 3: JVM Magic

Introduction

Page 4: JVM Magic

Introduction

• In the past, many considered the JVM to be Java's Achilles' heel
  • Interpreter?!

• The JVM team improved performance by 300 to 3000 times
  • JDK 1.6 compared to JDK 1.0

• Java has been measured at 50% to 100+% of the speed of C and C++
  • Jake2 vs. Quake2

• How can it be?

Page 5: JVM Magic

Java Virtual Machines Zoo

CEE-J, Excelsior JET, Hewlett-Packard, J9 (IBM), Jbed, Jblend, JRockit, MRJ, MicroJvm, MS JVM, OJVM, PERC, Blackdown Java, CVM, Gemstone, Golden Code Development, Intent, Novell, NSIcom CrE-ME, ChaiVM, HotSpot, AegisVM, Apache Harmony, CACAO, Dalvik, IcedTea, IKVM.NET, Jamiga, JamVM, Jaos, JC, Jelatine JVM, JESSICA, Jikes RVM, JNode, JOP, Juice, Jupiter, JX, Kaffe, leJOS, Mika VM, Mysaifu, NanoVM, SableVM, Squawk virtual machine, SuperWaba, TinyVM, VMkit of Low Level Virtual Machine, Wonka VM, Xam

Page 6: JVM Magic

HotSpot Virtual Machine

• Originally developed by Longview Technologies; released by Sun in 1999

• Contains:
  • Class loader
  • Bytecode interpreter
  • 2 virtual machines
  • 7 garbage collectors
  • 2 compilers
  • Runtime libraries

Page 7: JVM Magic

HotSpot Virtual Machine

• Configured by hundreds of -XX flags

• Reminder
  • -X options are non-standard
  • -XX options have specific system requirements for correct operation
  • Both are subject to change without notice

Page 8: JVM Magic

GC Magic 101

Page 9: JVM Magic

GC Is Slow?

• GC has a bad performance reputation
  • Reduces throughput
  • Introduces pauses
  • Unpredictable
  • Uncontrolled
  • Performance degradation is proportional to the object count
• Just give me the damn free() and malloc()! I'll be just fine!
• Is it so?

Page 10: JVM Magic

Generational Collectors

• Weak generational hypothesis
  • Most objects die young (AKA infant mortality)
  • Few old-to-young references
• Generations: regions holding objects of different ages
  • GC is done separately once a generation fills
  • Different GC algorithms per generation
  • The young (nursery) generation
    • Collected by "minor garbage collection"
  • The old (tenured) generation
    • Collected by "major garbage collection"
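The hypothesis is easiest to see in a toy model. The Java sketch below is not how HotSpot's collectors are implemented; the Obj class, its live flag (standing in for reachability) and the tenuring threshold of 2 are hypothetical simplifications, chosen only to show allocation into the young generation, minor collections, and promotion to the old generation.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a two-generation heap, for illustration only (HotSpot's real
// collectors copy between survivor spaces and size generations adaptively).
class ToyGenerationalHeap {
    // Hypothetical tenuring threshold: survive this many minor GCs to be promoted.
    static final int TENURING_THRESHOLD = 2;

    static final class Obj {
        boolean live = true; // stands in for "still reachable"
        int age = 0;         // number of minor GCs survived
    }

    final List<Obj> young = new ArrayList<>();
    final List<Obj> old = new ArrayList<>();

    // New objects always start in the young (nursery) generation.
    Obj allocate() {
        Obj o = new Obj();
        young.add(o);
        return o;
    }

    // Minor GC: only the young generation is examined. Dead objects are simply
    // not carried over (a real copying collector never even visits them).
    void minorGc() {
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : young) {
            if (!o.live) continue;
            o.age++;
            if (o.age >= TENURING_THRESHOLD) {
                old.add(o);        // promote to the old (tenured) generation
            } else {
                survivors.add(o);  // stays in the young generation
            }
        }
        young.clear();
        young.addAll(survivors);
    }

    public static void main(String[] args) {
        ToyGenerationalHeap heap = new ToyGenerationalHeap();
        List<Obj> objs = new ArrayList<>();
        for (int i = 0; i < 100; i++) objs.add(heap.allocate());
        for (int i = 5; i < 100; i++) objs.get(i).live = false; // infant mortality: 95% die young
        heap.minorGc();
        System.out.println(heap.young.size() + " young, " + heap.old.size() + " old"); // 5 young, 0 old
        heap.minorGc();
        System.out.println(heap.young.size() + " young, " + heap.old.size() + " old"); // 0 young, 5 old
    }
}
```

With 95% infant mortality, each minor GC only has to keep a handful of survivors, and only those few ever reach the old generation.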

Page 11: JVM Magic

GC Magic 101

• Young is better than tenured
  • Let your objects die in the young generation
  • When possible and when it makes sense

Page 12: JVM Magic

GC Magic 101

• Swapping is bad
  • The application's memory footprint should not exceed the available physical memory

Page 13: JVM Magic

GC Magic 101

• Choose:
  • Throughput (client)
  • Low-pause (server)

Page 14: JVM Magic

GC Magic 101

http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html


Page 15: JVM Magic

Garbage First (G1)

• New in JDK 1.6u14 (May 29th)
• All memory is divided into 1MB buckets
• Calculates object liveness in each bucket
  • Drops "dead" buckets
  • If a bucket is not total garbage, it's not dropped
• Collects the buckets with the most garbage first
• Pauses only on "mark"
  • No sweep
• User can provide pause time goals
  • Actual seconds or percentage of runtime
  • G1 records bucket collection time and can estimate how many buckets to collect during a pause

Page 16: JVM Magic

Garbage First (G1)

• Targets multi-processor machines and large heaps
• G1 will be the long-term replacement for the CMS collector
  • Unlike CMS, compacts to battle fragmentation
    • A bucket's space is fully reclaimed
  • Better throughput
  • Predictable pauses (with high probability)
  • Garbage left in buckets with a high live ratio
    • May be collected later
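The "garbage first" idea can be sketched as a greedy selection loop. This is a toy under stated assumptions, not G1's real heuristic: the Region class, its garbageKb and predictedMs fields, and the greedy choice are illustrative. It takes the buckets with the most garbage first and skips any bucket whose recorded collection time would push the predicted pause past the user's goal.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy "garbage first" collection-set picker, for illustration only.
class GarbageFirstPicker {
    static final class Region {
        final int id;
        final int garbageKb;       // reclaimable space in this bucket
        final double predictedMs;  // recorded/estimated time to collect it

        Region(int id, int garbageKb, double predictedMs) {
            this.id = id;
            this.garbageKb = garbageKb;
            this.predictedMs = predictedMs;
        }
    }

    static List<Region> chooseCollectionSet(List<Region> regions, double pauseGoalMs) {
        List<Region> sorted = new ArrayList<>(regions);
        // Most garbage first.
        sorted.sort((a, b) -> Integer.compare(b.garbageKb, a.garbageKb));
        List<Region> chosen = new ArrayList<>();
        double remainingMs = pauseGoalMs;
        for (Region r : sorted) {
            if (r.garbageKb == 0) break;               // fully live: nothing to reclaim
            if (r.predictedMs > remainingMs) continue; // would exceed the pause goal
            chosen.add(r);
            remainingMs -= r.predictedMs;
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<Region> regions = Arrays.asList(
                new Region(0, 1024, 1.0),  // entirely garbage
                new Region(1, 100, 2.0),   // high live ratio: low priority
                new Region(2, 900, 5.0),
                new Region(3, 0, 0.5));    // fully live
        for (Region r : chooseCollectionSet(regions, 6.5)) {
            System.out.println("collect region " + r.id);
        }
    }
}
```

With a 6.5 ms goal, buckets 0 and 2 fit; bucket 1 stays behind, matching the "garbage left in buckets with a high live ratio may be collected later" point.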

Page 17: JVM Magic

GC Is Slow? – The Answers

• Reduces throughput
  • You choose
• Introduces pauses
  • You choose
• Unpredictable
  • Not anymore
• Uncontrolled
  • Configurable
• Performance degradation is proportional to the object count
  • Not true
• Just give me the damn free() and malloc()! I'll be just fine!
  • Bad idea (see more later)

Page 18: JVM Magic

General Optimizations

Page 19: JVM Magic

HotSpot Optimizations

• JIT compilation
  • Compiler optimizations
    • Generate more performant code than you could write in native code
• Adaptive optimization
• Split time verification
• Class data sharing

Page 20: JVM Magic

Two Virtual Machines?

• Client VM
  • Reduces start-up time and memory footprint
  • -client command-line flag
• Server VM
  • Maximum program execution speed
  • -server command-line flag
• Auto-detection
  • Server: 2 or more CPUs and 2GB or more of physical memory
  • Win32: always detected as client
  • Many 64-bit OSes don't have a client VM

Page 21: JVM Magic

Just-In-Time Compilation

• Everyone knows about JIT!
  • Hot code is compiled to native code
  • What is "hot"?
    • Server VM: 10000 invocations
    • Client VM: 1500 invocations
    • Use -XX:CompileThreshold=# to change
• More invocations: better optimizations
• Fewer invocations: shorter warmup time
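A toy model of how invocation counting decides what is "hot". The HotMethodDetector class is hypothetical: real HotSpot also counts loop back-edges and compiles on background threads, whereas here "compilation" is just a flag flipped the moment a method's counter reaches the threshold (1500, the client VM value from the slide).

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of counter-based JIT triggering (a simplification, not HotSpot code).
class HotMethodDetector {
    private final int compileThreshold;   // plays the role of -XX:CompileThreshold=#
    private final Map<String, Integer> invocations = new HashMap<>();
    private final Set<String> compiled = new HashSet<>();

    HotMethodDetector(int compileThreshold) {
        this.compileThreshold = compileThreshold;
    }

    // Called on every invocation; returns true if this call runs "compiled" code.
    // In this toy, compilation is instantaneous once the threshold is reached.
    boolean invoke(String method) {
        if (compiled.contains(method)) {
            return true;                  // already native: no more counting
        }
        int count = invocations.merge(method, 1, Integer::sum);
        if (count >= compileThreshold) {
            compiled.add(method);         // method became "hot"
        }
        return compiled.contains(method);
    }

    public static void main(String[] args) {
        HotMethodDetector clientVm = new HotMethodDetector(1500);
        int firstCompiledCall = -1;
        for (int call = 1; call <= 2000 && firstCompiledCall < 0; call++) {
            if (clientVm.invoke("Game.score")) firstCompiledCall = call;
        }
        System.out.println("compiled at call " + firstCompiledCall); // compiled at call 1500
    }
}
```

A lower threshold makes methods compile sooner (shorter warmup) but with less profiling data collected beforehand, which is the tradeoff listed above.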

Page 22: JVM Magic

Just-In-Time Compilation

• The code is optimized by the compiler
  • Coming soon…

Page 23: JVM Magic

Adaptive Optimization

• Allows HotSpot to uncompile (deoptimize) previously compiled code

• Much more aggressive, even speculative, optimizations may be performed

• And rolled back if something goes wrong or new data is gathered
  • E.g. class loading might invalidate inlining
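A small program where speculative optimization can be observed. It is a sketch, not a guaranteed reproduction: whether and when HotSpot inlines and later deoptimizes depends on the VM version and flags. While only Circle objects reach the s.area() call site, the JIT may inline Circle.area() behind a guard; when a Square shows up, that assumption no longer holds and the compiled code may be discarded. Running with -XX:+PrintCompilation and looking for "made not entrant" lines is one way to watch it. Shape, Circle, Square and AdaptiveDemo are hypothetical names.

```java
// Illustrative program for speculative inlining and deoptimization.
interface Shape {
    double area();
}

final class Circle implements Shape {
    private final double r;
    Circle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

final class Square implements Shape {
    private final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

class AdaptiveDemo {
    static double total(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) {
            sum += s.area();   // the call site HotSpot profiles
        }
        return sum;
    }

    public static void main(String[] args) {
        Shape[] circles = new Shape[1000];
        for (int i = 0; i < circles.length; i++) circles[i] = new Circle(1);
        // Warm up while the call site only ever sees Circle (monomorphic).
        for (int i = 0; i < 20_000; i++) total(circles);
        // A second implementation arrives late and breaks the speculation.
        Shape[] mixed = { new Circle(1), new Square(2) };
        System.out.println(total(mixed));
    }
}
```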

Page 24: JVM Magic

Split Time Verification

• Java suffers from long boot time
  • One of the reasons is bytecode verification
    • Valid flow control
    • Type safety
    • Visibility

• To ease the load on the weak KVM, J2ME started performing part of the verification at compile time

• It's good, so now it's in Java SE 6 too
  • The compiler emits StackMapTable attributes, so the runtime only checks the type information instead of inferring it

Page 25: JVM Magic

Class Data Sharing

• Helps improve startup time
  • During JDK installation, part of rt.jar is preloaded into a shared memory file, which is attached at runtime
  • No need to reload and reverify those classes every time

Page 26: JVM Magic

Compiler Optimizations

Page 27: JVM Magic

Two Types of Optimizations

• Java has two compilers:
  • javac bytecode compiler
  • HotSpot VM JIT compiler

• Both implement similar optimizations

• The bytecode compiler is limited
  • Dynamic linking
  • Can apply only static optimizations

Page 28: JVM Magic

Warning

• Caution! Don't try this at home!

• The source code you are about to see is not real!
  • It's pseudo assembly code

• Don't write such code!
  • Source code should be readable and object-oriented
  • Bytecode will become performant automagically

Page 29: JVM Magic

Optimization Rules

• Make the common case fast
  • Don't worry about the uncommon/infrequent case
• Defer optimization decisions
  • Until you have data
  • Revisit decisions if data warrants

Page 30: JVM Magic

Null check Elimination

• Java is a null-safe language
  • A pointer can't point to a meaningless portion of memory
  • Null checks are added by the compiler; a NullPointerException is thrown

• The JVM (guided by its profiler) can eliminate those checks

Page 31: JVM Magic

Example – Original Source

    public class Game {

        private Logger logger;
        private int totalScore;

        public void score(String player, int points) {
            logger.info(player + " scores " + points);
            totalScore += points;
        }
    }

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            game.score("Bob", 12);
        }
    }

Page 32: JVM Magic

Example – Null Check Elimination

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            // Null check the compiler conceptually adds; game was just allocated,
            // so it can never be null and the check can be eliminated
            if (game == null) throw new NullPointerException("null");
            game.score("Bob", 12);
        }
    }

Page 33: JVM Magic

Inlining

• Love encapsulation?
  • Getters and setters

• Love clean and simple code?
  • Small methods

• Use static code analysis?
  • Small methods

• No penalty for using those!
  • The JIT brings the implementation of these methods into the containing method
  • This optimization is known as "inlining"

Page 34: JVM Magic

Inlining

• Not just about eliminating call overhead
  • Provides the optimizer with bigger blocks
  • Enables other optimizations
    • Hoisting, dead code elimination, code motion, strength reduction

Page 35: JVM Magic

Inlining

• But wait, all public non-final methods in Java are virtual!
  • HotSpot examines the exact case in place
  • In most cases there is only one implementation, which can be inlined
• But wait, more implementations may be loaded later!
  • In such a case HotSpot undoes the inlining
  • Speculative inlining

• By default limited to 35 bytes of bytecode
  • Use -XX:MaxInlineSize=# to change
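The getters from the GameMove example are exactly the kind of method that fits this limit: a trivial getter compiles to aload_0, getfield and a return instruction, 5 bytes of bytecode in total. Move and InliningDemo below are hypothetical stand-ins. Running the program with -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining should print the JIT's inlining decisions, although the output format varies between JVM versions.

```java
// Small getters: aload_0, getfield, ireturn/areturn = 5 bytes of bytecode each,
// well under the default MaxInlineSize of 35, so HotSpot is free to inline them.
final class Move {
    private final String player;
    private final int points;

    Move(String player, int points) {
        this.player = player;
        this.points = points;
    }

    String getPlayer() { return player; }

    int getPoints() { return points; }
}

class InliningDemo {
    static long totalPoints(Move[] moves) {
        long total = 0;
        for (Move m : moves) {
            total += m.getPoints();   // hot call site, candidate for inlining
        }
        return total;
    }

    public static void main(String[] args) {
        Move[] moves = { new Move("bob", 2), new Move("robert", 3) };
        long sum = 0;
        // Enough iterations for totalPoints() and getPoints() to get compiled.
        for (int i = 0; i < 1_000_000; i++) {
            sum += totalPoints(moves);
        }
        System.out.println(sum); // 5000000
    }
}
```

Keeping the getters costs nothing at runtime once inlined, so encapsulation does not have to be traded for speed.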

Page 36: JVM Magic

Example - Inlining

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            // game.score("Bob", 12); is replaced by the body of score():
            game.logger.info("Bob" + " scores " + 12);
            game.totalScore += 12;
        }
    }

Page 37: JVM Magic

Example – Source Code Revision

    public class GameMove {

        private final String player;
        private final int points;

        public GameMove(String player, int points) {
            this.player = player;
            this.points = points;
        }

        public String getPlayer() {
            return player;
        }

        public int getPoints() {
            return points;
        }
    }

Page 38: JVM Magic

Example – Source Code Revision

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
            for (GameMove gameMove : gameMoves) {
                game.score(gameMove.getPlayer(), gameMove.getPoints());
            }
        }
    }

Page 39: JVM Magic

Code Hoisting

• Hoist = to raise or lift
• Size optimization
  • Eliminates duplicate code in method bodies by hoisting expressions or statements
  • Duplicate bytecode, not necessarily source code

Page 40: JVM Magic

Example – Code Hoisting

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
            for (int i = 0; i < gameMoves.length; i++) {
                // implicit array bounds check, made explicit
                if (i < 0 || gameMoves.length <= i) throw new ArrayIndexOutOfBoundsException();
                game.score(gameMoves[i].getPlayer(), gameMoves[i].getPoints());
            }
        }
    }

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
            int length = gameMoves.length;   // hoisted: the array length is read once
            for (int i = 0; i < length; i++) {
                if (i < 0 || length <= i) throw new ArrayIndexOutOfBoundsException();
                game.score(gameMoves[i].getPlayer(), gameMoves[i].getPoints());
            }
        }
    }

Page 41: JVM Magic

Bounds Check Elimination

• Java promises automatic bounds checks for arrays
  • An exception is thrown

• If the boundaries are already checked (by the programmer, or by the loop condition itself), the automatic check can be eliminated

Page 42: JVM Magic

Example – Bounds Check Elimination

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
            int length = gameMoves.length;
            for (int i = 0; i < length; i++) {
                // redundant: the loop condition already guarantees 0 <= i < length,
                // so this check can be eliminated
                if (i < 0 || length <= i) throw new ArrayIndexOutOfBoundsException();
                game.score(gameMoves[i].getPlayer(), gameMoves[i].getPoints());
            }
        }
    }

Page 43: JVM Magic

Sub-Expression Elimination

• Avoids redundant memory access

    @Test
    public void testScore() {
        Game game = new Game();
        GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
        for (int i = 0; i < gameMoves.length; i++) {
            game.score(gameMoves[i].getPlayer(), gameMoves[i].getPoints());
        }
    }

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
            for (int i = 0; i < gameMoves.length; i++) {
                GameMove move = gameMoves[i];   // gameMoves[i] is loaded once and reused
                game.score(move.getPlayer(), move.getPoints());
            }
        }
    }

Page 44: JVM Magic

Loop Unrolling

• Some loops shouldn't be loops
  • In the performance sense, not in terms of code readability
• Those can be unrolled into a set of statements
• If the boundaries are dynamic, a partial unroll will occur
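To show what a partial unroll looks like, here is the transformation written out by hand in Java. It only illustrates the shape of the code the JIT may produce (and, per the warning above, is not something to write yourself); the unroll factor of 4 and the UnrollDemo name are arbitrary choices for the sketch.

```java
// Hand-written illustration of partial loop unrolling by a factor of 4.
class UnrollDemo {
    static int sumSimple(int[] a) {
        int sum = 0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i];
        }
        return sum;
    }

    // The trip count is only known at runtime, so the main loop handles
    // 4 elements per iteration and a short post-loop handles the remainder.
    static int sumUnrolledBy4(int[] a) {
        int sum = 0;
        int i = 0;
        int limit = a.length - (a.length % 4);  // largest multiple of 4 <= length
        for (; i < limit; i += 4) {             // main loop: 1 compare per 4 elements
            sum += a[i];
            sum += a[i + 1];
            sum += a[i + 2];
            sum += a[i + 3];
        }
        for (; i < a.length; i++) {             // post-loop: 0 to 3 leftover elements
            sum += a[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(sumSimple(data) + " == " + sumUnrolledBy4(data)); // 55 == 55
    }
}
```

The unrolled form executes fewer loop-condition checks and jumps, and gives the optimizer longer straight-line blocks to work with; when the trip count is a small constant, as in the two-move example, the loop disappears completely.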

Page 45: JVM Magic

Example – Loop Unrolling

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
            for (int i = 0; i < gameMoves.length; i++) {
                GameMove move = gameMoves[i];
                game.score(move.getPlayer(), move.getPoints());
            }
        }
    }

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
            GameMove move = gameMoves[0];
            game.score(move.getPlayer(), move.getPoints());
            move = gameMoves[1];
            game.score(move.getPlayer(), move.getPoints());
        }
    }

Page 46: JVM Magic

Example – Inlining

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
            GameMove move = gameMoves[0];
            // game.score(move.getPlayer(), move.getPoints()); with the getters inlined:
            game.score(move.player, move.points);
            move = gameMoves[1];
            // game.score(move.getPlayer(), move.getPoints()); with the getters inlined:
            game.score(move.player, move.points);
        }
    }

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            GameMove[] gameMoves = {new GameMove("bob", 2), new GameMove("robert", 3)};
            GameMove move = gameMoves[0];
            // game.score(move.player, move.points); with score() inlined:
            game.logger.info(move.player + " scores " + move.points);
            game.totalScore += move.points;
            move = gameMoves[1];
            // game.score(move.player, move.points); with score() inlined:
            game.logger.info(move.player + " scores " + move.points);
            game.totalScore += move.points;
        }
    }

Page 47: JVM Magic

Escape Analysis

• Escape analysis is not an optimization
  • It is a check that an object does not escape the local scope
  • E.g. created in a private method, assigned to a local variable and not returned

• Escape analysis opens up possibilities for lots of optimizations

Page 48: JVM Magic

Scalar Replacement

• Remember the rule "new == always new object"?
  • False!

• The JVM can optimize away allocations
  • Fields are hoisted into registers
  • The object becomes unneeded

• But object creation is cheap!
  • Yep, but GC is not so cheap…
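A minimal candidate for scalar replacement, assuming a C2-compiled method with escape analysis enabled (-XX:+DoEscapeAnalysis, on by default in modern HotSpot). The hypothetical Point object is created, read and dropped inside distance(), so it never escapes, and the JIT may keep d.x and d.y in registers instead of allocating. Comparing -verbose:gc output against a run with -XX:-DoEscapeAnalysis is one informal way to look for the effect; it is not guaranteed to show up on every JVM.

```java
// Hypothetical example: the Point allocated in distance() is never stored in a
// field or returned, so escape analysis may let the JIT replace it with two
// local doubles and skip the heap allocation. Behavior is identical either way.
final class Point {
    final double x;
    final double y;

    Point(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

class ScalarReplacementDemo {
    static double distance(double x1, double y1, double x2, double y2) {
        Point d = new Point(x2 - x1, y2 - y1);    // candidate for scalar replacement
        return Math.sqrt(d.x * d.x + d.y * d.y);  // only the fields are used
    }

    public static void main(String[] args) {
        double total = 0;
        // Call distance() often enough for the JIT to compile it.
        for (int i = 0; i < 10_000_000; i++) {
            total += distance(0, 0, i % 3, i % 4);
        }
        System.out.println(total);
    }
}
```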

Page 49: JVM Magic

Example – Source Code Revision

    public class Moves {

        GameMove nextMove() {
            return new GameMove(nextPlayer(), calcPoints());
        }

        private int calcPoints() {…}

        private String nextPlayer() {…}
    }

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            Moves moves = new Moves();
            GameMove move = moves.nextMove();
            game.score(move.getPlayer(), move.getPoints());
        }
    }

Page 50: JVM Magic

Example – Scalar Replacement

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            Moves moves = new Moves();
            // GameMove move = moves.nextMove(); with nextMove() inlined:
            GameMove move = new GameMove(moves.nextPlayer(), moves.calcPoints());
            game.score(move.getPlayer(), move.getPoints());
        }
    }

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            Moves moves = new Moves();
            // GameMove move = new GameMove(moves.nextPlayer(), moves.calcPoints());
            // is scalar-replaced by its fields:
            String player = moves.nextPlayer();
            int points = moves.calcPoints();
            // game.score(move.getPlayer(), move.getPoints()); becomes:
            game.score(player, points);
        }
    }

Page 51: JVM Magic

Example – Scalar Replacement

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            Moves moves = new Moves();
            // String player = moves.nextPlayer();
            // int points = moves.calcPoints();
            // game.score(player, points); collapses into:
            game.score(moves.nextPlayer(), moves.calcPoints());
        }
    }

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            Moves moves = new Moves();
            // game.score(moves.nextPlayer(), moves.calcPoints()); with score() inlined:
            int points = moves.calcPoints();
            game.logger.info(moves.nextPlayer() + " scores " + points);
            game.totalScore += points;
        }
    }

Page 52: JVM Magic

Lock Coarsening

• HotSpot merges adjacent synchronized blocks that use the same lock

• The compiler is allowed to move statements into the merged coarse blocks

• Trade-off between performance and responsiveness
  • Reduces instruction count
  • But locks are held longer

    private void addPresentor(StringBuffer lecture, String presentor) {
        // two synchronized append() calls on the same StringBuffer: one lock can cover both
        lecture
            .append(" by ")
            .append(presentor);
    }
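The transformation can be written out by hand to make it visible. CoarseningDemo is a hypothetical example: addFine() acquires the same monitor three times in a row, the pattern HotSpot may merge, while addCoarse() shows what the coarsened code is equivalent to, holding the lock once across all three updates. The JIT does this on its own; the "after" form is only for illustration.

```java
// Before/after illustration of lock coarsening on a single monitor.
class CoarseningDemo {
    private final Object lock = new Object();
    private int total;

    // Three adjacent synchronized blocks on the same lock: three lock/unlock pairs.
    void addFine(int a, int b, int c) {
        synchronized (lock) { total += a; }
        synchronized (lock) { total += b; }
        synchronized (lock) { total += c; }
    }

    // The coarsened equivalent: one lock/unlock pair (fewer instructions),
    // but other threads wait longer because the lock is held across all updates.
    void addCoarse(int a, int b, int c) {
        synchronized (lock) {
            total += a;
            total += b;
            total += c;
        }
    }

    int getTotal() {
        synchronized (lock) {
            return total;
        }
    }

    public static void main(String[] args) {
        CoarseningDemo demo = new CoarseningDemo();
        demo.addFine(1, 2, 3);
        demo.addCoarse(4, 5, 6);
        System.out.println(demo.getTotal()); // 21
    }
}
```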

Page 53: JVM Magic

Example – Source Code Revision

    public class Game {

        Logger logger;
        int totalScore;

        public void score(String player, int points) {
            logger.info(player + " scores " + points);
            totalScore += points;
        }

        public synchronized void multithreadedScore(String player, int points) {
            score(player, points);
        }
    }

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            game.multithreadedScore("Bob", 5);
            game.multithreadedScore("Jane", 7);
            game.multithreadedScore("Dwane", 1);
        }
    }

Page 54: JVM Magic

Example – Lock Coarsening

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            lock(game);
            game.multithreadedScore("Bob", 5);
            unlock(game);
            lock(game);
            game.multithreadedScore("Jane", 7);
            unlock(game);
            lock(game);
            game.multithreadedScore("Dwane", 1);
            unlock(game);
        }
    }

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            lock(game);
            // game.multithreadedScore("Bob", 5); inlined:
            game.logger.info("Bob" + " scores " + 5);
            game.totalScore += 5;
            // game.multithreadedScore("Jane", 7); inlined:
            game.logger.info("Jane" + " scores " + 7);
            game.totalScore += 7;
            // game.multithreadedScore("Dwane", 1); inlined:
            game.logger.info("Dwane" + " scores " + 1);
            game.totalScore += 1;
            unlock(game);
        }
    }

Page 55: JVM Magic

Lock Elision

• A thread enters a lock that no other thread will synchronize on
  • Synchronization has no effect
  • Can be deduced using escape analysis

• Such locks can be elided
  • Elides 4 StringBuffer synchronized calls:

    private String getLecture(String topic, String presentor) {
        return new StringBuffer()
            .append(topic)
            .append(" by ")
            .append(presentor)
            .toString();
    }

Page 56: JVM Magic

Example - Lock Elision

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            // lock(game); elided: game never escapes testScore(), so no other thread can lock it
            game.logger.info("Bob" + " scores " + 5);
            game.totalScore += 5;
            game.logger.info("Jane" + " scores " + 7);
            game.totalScore += 7;
            game.logger.info("Dwane" + " scores " + 1);
            game.totalScore += 1;
            // unlock(game); elided
        }
    }

Page 57: JVM Magic

Constant Folding

• Trivial optimization
  • How many constants are there?

• More than you think!
  • Inlining generates constants
  • Unrolling generates constants
  • Escape analysis generates constants

• The JIT determines what is constant at runtime
  • Whatever doesn't change

Page 58: JVM Magic

Constant Folding

• Literal folding
  • Before: int foo = 9*10;
  • After: int foo = 90;

• String folding or StringBuilder-ing
  • Before: String foo = "hi Joe " + (9*10);
  • After, when not all operands are compile-time constants: String foo = new StringBuilder().append("hi Joe ").append(9 * 10).toString();
  • After, when the whole expression is constant (as here): String foo = "hi Joe 90";
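Part of this folding already happens in javac: the Java Language Specification treats "hi Joe " + (9 * 10) as a constant expression, so it is computed at compile time, and compile-time constant strings are interned. The sketch below checks both facts. notConstant() is a hypothetical helper whose argument is not a compile-time constant, so its string is built at runtime as a new object.

```java
// Compile-time folding of a constant String expression vs. runtime concatenation.
class ConstantFoldingDemo {
    // A constant variable: javac folds the initializer to "hi Joe 90".
    static final String FOLDED = "hi Joe " + (9 * 10);

    // n is not a compile-time constant, so this concatenation happens at runtime
    // and produces a newly created String.
    static String notConstant(int n) {
        return "hi Joe " + n;
    }

    public static void main(String[] args) {
        System.out.println(FOLDED == "hi Joe 90");           // true: same interned literal
        System.out.println(notConstant(90).equals(FOLDED));  // true: equal contents
        System.out.println(notConstant(90) == FOLDED);       // false: a different object
    }
}
```

The JIT extends the same idea at runtime: once inlining or unrolling turns a value into a constant, expressions built from it can be folded too.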

Page 59: JVM Magic

Example – Constant Folding

    public class GameTest {
        @Test
        public void testScore() {
            Game game = new Game();
            // game.logger.info("Bob" + " scores " + 5); folds to:
            game.logger.info("Bob scores 5");
            game.totalScore += 5;
            // game.logger.info("Jane" + " scores " + 7); folds to:
            game.logger.info("Jane scores 7");
            game.totalScore += 7;
            // game.logger.info("Dwane" + " scores " + 1); folds to:
            game.logger.info("Dwane scores 1");
            game.totalScore += 1;
        }
    }

Page 60: JVM Magic

Dead Code Elimination

• Dead code: code that has no effect on the outcome of the program execution

    public static void main(String[] args) {
        long start = System.nanoTime();
        int result = 0;
        for (int i = 0; i < 10 * 1000 * 1000; i++) {
            result += Math.sqrt(i);
        }
        // result is never read, so the JIT may eliminate the whole loop as dead code
        long duration = (System.nanoTime() - start) / 1000000;
        System.out.format("Test duration: %d (ms) %n", duration);
    }
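Because result is never used, the loop above may be removed entirely once the JIT compiles it, which makes the measured duration meaningless. A variant that keeps the work alive returns the result and prints it; SqrtBenchmark and sumOfSqrts are hypothetical names. For serious measurements, a harness such as JMH deals with dead code and warm-up far more thoroughly than this sketch.

```java
// Benchmark variant whose result is consumed, so the loop is not dead code.
class SqrtBenchmark {
    static int sumOfSqrts(int n) {
        int result = 0;
        for (int i = 0; i < n; i++) {
            // Compound assignment converts (result + sqrt(i)) back to int.
            result += Math.sqrt(i);
        }
        return result;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int result = sumOfSqrts(10 * 1000 * 1000);
        long durationMs = (System.nanoTime() - start) / 1_000_000;
        // Printing the result makes it part of the program's observable outcome.
        System.out.format("Result: %d, test duration: %d (ms)%n", result, durationMs);
    }
}
```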

Page 61: JVM Magic

OSR - On Stack Replacement

• Normally, code is switched from interpretation to native code in heap context
  • Before entering a method

• OSR: switch from interpretation to compiled code in local context
  • In the middle of a method call
  • The JVM tracks code block execution count

• Fewer optimizations
  • May prevent bounds check elimination and loop unrolling
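A typical case where only OSR helps: a method that is called once but contains a long-running loop, so its invocation counter never makes it hot. OsrDemo is a hypothetical sketch; running it with -XX:+PrintCompilation should show an OSR compilation, which that flag marks with a % character, though the exact output depends on the JVM version.

```java
// A method invoked once whose loop runs long enough to trigger OSR.
class OsrDemo {
    static long sumTo(long n) {
        long sum = 0;
        for (long i = 1; i <= n; i++) {   // loop back-edges are counted by the JVM
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        // sumTo() is entered only once, so it is switched to compiled code
        // in the middle of the call, while the loop is still running.
        System.out.println(sumTo(100_000_000L)); // 5000000050000000
    }
}
```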

Page 62: JVM Magic

Out-Of-Order Execution

    public class OutOfOrder {

        private int a;

        public void foo() {
            a++;
            a += 8;
        }

    }

[Diagram: stack and heap while foo() runs as written. Read a (0) to the stack, increment (1), store a to the heap (1), read a to the stack again (1), add 8 (9), store a to the heap (9).]

Page 63: JVM Magic

Out-Of-Order Execution

[Diagram: foo() after reordering. Read a (0) to the stack once, increment (1), add 8 (9), store a to the heap once (9); the heap keeps 0 until the single final store.]

Page 64: JVM Magic

Programming & Tuning Tips


Page 65: JVM Magic

How Can I Help?

• Just write good quality Java code
  • Object orientation
  • Polymorphism
  • Abstraction
  • Encapsulation
  • DRY
  • KISS

• Let HotSpot optimize

Page 66: JVM Magic

How Can I Help?

• final keyword
  • For fields:
    • Allows caching
    • Allows lock coarsening
  • For methods:
    • Simplifies inlining decisions

• Immutable objects die younger

Page 67: JVM Magic

JVM tuning tips

• Reminder: -XX options are non-standard
  • Added for HotSpot development purposes
  • Mostly tested on Solaris 10
  • Platform dependent

• Some options may contradict each other

• Know and experiment with these options

Page 68: JVM Magic

Monitoring & Troubleshooting

• HeapDumpOnOutOfMemoryError: since 5.0u7; dumps the heap in HPROF format
• HeapDumpOnCtrlBreak: since 5.0u14; HPROF format
• OnError: runs a list of user-defined commands on a fatal error
• OnOutOfMemoryError: runs a list of user-defined commands on the first OutOfMemoryError
• PrintClassHistogram: prints a histogram of class instances on Ctrl-Break (like jmap -histo)
• PrintConcurrentLocks: prints java.util.concurrent locks on Ctrl-Break (like jstack -l)
• PrintCompilation: traces compiled methods
• PrintInlining: traces inlining
• PrintOptoAssembly: traces the generated assembly
• TraceClassLoading/Unloading: traces class loading/unloading


Page 71: JVM Magic

Thank you for your attention

Thanks to Ori Dar!