IBM Software Group
© 2011 IBM Corporation
Java Compilation
From Top to Bottom
Mike Kucera – IBM Rational
March 11, 2011
Innovation for a smarter planet
Compiling Java - 10,000 Foot View
Write and debug Java code in an IDE (Eclipse)
Compile Java source into bytecode (class files)
Run the bytecode on any JVM on any platform
At runtime, JIT-compile the bytecode into native code for performance
IBM and Java
IBM has over 3000 products based on Java.
IBM sells hardware (PowerPC and System z) and these platforms must support Java applications.
IBM Java is optimized to run IBM software, especially WebSphere Application Server.
IBM and Java
IBM Java supports 12 different platforms and many other embedded space platforms
  Re-use is the only way to scale
Challenging to do things right across all platforms
If there is a bug… it will be found
Java is developed across multiple development sites
Code straddles the boundary of research and production
IBM develops tools for Java developers (based on Eclipse)
  RAD – Rational Application Developer
Worldwide Java Development Team
Toronto
  Dynamic/Static compilation
  XML parsing
Ottawa
  J9 JVM
  Eclipse IDE
  J2ME libraries
Hursley
  J2SE libraries and CORBA
  J2SE integration and delivery
  Customer service
Bangalore
  Integration testing
  Customer service
  Field release development
Shanghai
  Globalization
  Specialized testing
Phoenix
  J2ME development
  J2ME delivery
Austin
  Java and XML security
  AIX system test
  PowerPC specialists
Poughkeepsie
  z/OS system test
  S/390 specialists
Rochester
  iSeries development
What is an IDE?
IDE - Integrated Development Environment
  Powerful editor for writing your programs
  Makes writing software faster and easier
Increased developer productivity
Understands your code
  Not just a text editor
  Parses and analyzes the code
Provides an integrated environment for all your tools
  Version Control (SVN, CVS, Jazz, etc.)
  Debuggers
  Performance Engineering
  Documentation Tools
  Databases
  Etc.
Writing Code using an IDE
Modern Java IDEs have many advanced code editing features
Instant Feedback
  Detect syntax errors as you type.
Code Navigation
  Instantly jump from a method call to the method definition
Refactoring
  Rename a method and the IDE will find everywhere the method is called and rename all the calls.
Code Completion
  Start typing and the IDE finishes it for you.
Visualizations
  View a type hierarchy
  View the structure outline of a class.
Quick Assist
  Automatically fix coding errors for you.
And many more....
ECJ – Eclipse Compiler for Java
At the core of Eclipse there is a Java compiler.
  Designed with the needs of an IDE in mind.
The compiler produces three outputs:
  ASTs
  Bytecode (class files)
  An on-disk index file
ASTs can be used directly by some features
  e.g. Refactoring
Index is used for fast lookup of program elements.
  e.g. Code navigation, Search, Generate Type Hierarchy
Compiler is designed to support recompilation while debugging
  Incremental compilation
Incremental Compilation
An incremental compiler will only recompile the parts of the code that have changed.
Avoid wasteful recompilation of unchanged parts.
  Reduces the granularity of a language's translation units.
ECJ will only recompile files that have changed.
  A standard C compiler will compile all the header files included by a source file.
  The standard javac compiler is not an incremental compiler.
Very important for productivity.
  Long compilation pauses are unacceptable.
  The developer needs to be able to recompile code changes very quickly.
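The idea of recompiling only what changed can be sketched as follows. This is a minimal illustration, not how ECJ is implemented: the class IncrementalBuilder and its build() method are hypothetical, change detection uses a simple content hash, and the sketch ignores the dependency tracking a real incremental builder needs (recompiling files that depend on a changed type).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of ECJ: remembers what was built last time.
class IncrementalBuilder {
    // Content hash of each source file as of the previous build.
    private final Map<String, Integer> lastBuilt = new HashMap<>();

    // Returns the files that need recompiling and records their new state.
    List<String> build(Map<String, String> sources) {
        List<String> recompiled = new ArrayList<>();
        for (Map.Entry<String, String> entry : sources.entrySet()) {
            int hash = entry.getValue().hashCode();
            Integer previous = lastBuilt.put(entry.getKey(), hash);
            // New file, or contents differ from the last build.
            if (previous == null || previous != hash) {
                recompiled.add(entry.getKey());
            }
        }
        return recompiled;
    }

    public static void main(String[] args) {
        Map<String, String> sources = new LinkedHashMap<>();
        sources.put("A.java", "class A {}");
        sources.put("B.java", "class B {}");
        IncrementalBuilder builder = new IncrementalBuilder();
        System.out.println(builder.build(sources)); // first build: [A.java, B.java]
        sources.put("B.java", "class B { int x; }");
        System.out.println(builder.build(sources)); // only B changed: [B.java]
    }
}
```

A second build with no edits returns an empty list, which is the "no long pause" behaviour the slide asks for.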
Parsing
Parse the code in the editor.
  Supports different versions of Java.
  Parser runs whenever the user stops typing for a few seconds.
  Instantly reports syntax errors and warnings.
Parser generated from an LALR parser generator.
  Grammar file contains grammar rules in BNF form.
  Most rules have actions associated with them.
  Actions build the AST in a bottom up fashion
    – Leaf nodes created first.
    – Last node to be created is the root.
Unique challenges
  Syntax error recovery needs to be really good.
  Parse unsaved code in the editor.
  Content assist.
  Can't desugar.
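Bottom-up AST construction by rule actions can be sketched with a node stack. This is an illustration only: ParseNode and AstBuilder are hypothetical names, and the calls in main() stand in for the actions an LALR parser would fire while reducing "a + b * c"; no real parser generator is involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical AST node: a label plus child nodes.
class ParseNode {
    final String label;
    final List<ParseNode> children;

    ParseNode(String label, List<ParseNode> children) {
        this.label = label;
        this.children = children;
    }

    @Override
    public String toString() {
        if (children.isEmpty()) return label;
        StringBuilder sb = new StringBuilder("(" + label);
        for (ParseNode child : children) sb.append(' ').append(child);
        return sb.append(')').toString();
    }
}

// Hypothetical stand-in for the actions attached to grammar rules.
class AstBuilder {
    private final Deque<ParseNode> stack = new ArrayDeque<>();
    final List<String> creationOrder = new ArrayList<>();

    // Action for a rule that matches a single token.
    void leaf(String name) {
        push(new ParseNode(name, new ArrayList<>()));
    }

    // Action for a rule with `arity` already-built children on the stack.
    void reduce(String label, int arity) {
        List<ParseNode> children = new ArrayList<>();
        for (int i = 0; i < arity; i++) children.add(0, stack.pop());
        push(new ParseNode(label, children));
    }

    private void push(ParseNode node) {
        creationOrder.add(node.label);
        stack.push(node);
    }

    ParseNode root() {
        return stack.peek();
    }

    public static void main(String[] args) {
        AstBuilder builder = new AstBuilder();
        builder.leaf("a");
        builder.leaf("b");
        builder.leaf("c");
        builder.reduce("*", 2);
        builder.reduce("+", 2);
        System.out.println(builder.root());        // (+ a (* b c))
        System.out.println(builder.creationOrder); // [a, b, c, *, +]
    }
}
```

The creation order shows the point made on the slide: leaves come first and the root is the last node built.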
Content Assist
The IDE will complete the code for you.
Problem: user hasn't finished typing a full statement yet, therefore there is a syntax error at the insertion point.
Must recover from the error and compute a list of possible completions.
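A much simplified sketch of the completion step: instead of real parser error recovery, it just scans backwards from the cursor to find the partial identifier and filters a list of candidate names. The class ContentAssist, its methods, and the list of visible names are all assumptions made for illustration; a real IDE would derive the candidates from the recovered AST and type information.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: proposes completions for the identifier being typed.
class ContentAssist {
    // Returns the partial identifier that ends at the cursor position.
    static String prefixAt(String source, int cursor) {
        int start = cursor;
        while (start > 0 && Character.isJavaIdentifierPart(source.charAt(start - 1))) {
            start--;
        }
        return source.substring(start, cursor);
    }

    // Keeps only the visible names that start with the typed prefix.
    static List<String> complete(String source, int cursor, List<String> visibleNames) {
        String prefix = prefixAt(source, cursor);
        List<String> proposals = new ArrayList<>();
        for (String name : visibleNames) {
            if (name.startsWith(prefix)) proposals.add(name);
        }
        return proposals;
    }

    public static void main(String[] args) {
        // The statement is incomplete: no parentheses, no semicolon.
        String source = "list.ad";
        List<String> names = Arrays.asList("add", "addAll", "clear");
        System.out.println(complete(source, source.length(), names)); // [add, addAll]
    }
}
```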
Refactoring
Transforming code into a new form that behaves the same as before but is structured better.
  Rename
  Extract local variable
  Inline expression
  Inline method
  Extract superclass
  Extract interface
  Change method signature
  Etc...
Refactorings are performed on the AST with the help of the index.
Rewrite rules
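A rename refactoring driven by an index can be sketched like this. MethodNameNode and RenameRefactoring are hypothetical names, and the index here matches on plain text; a real implementation would resolve each name to its binding so that unrelated methods with the same name are left alone.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical AST node for a method name, either a declaration or a call.
class MethodNameNode {
    String name;        // identifier text
    final int offset;   // position in the source file

    MethodNameNode(String name, int offset) {
        this.name = name;
        this.offset = offset;
    }
}

// Hypothetical index: maps a method name to every AST node that uses it.
class RenameRefactoring {
    private final Map<String, List<MethodNameNode>> index = new HashMap<>();

    void indexNode(MethodNameNode node) {
        index.computeIfAbsent(node.name, k -> new ArrayList<>()).add(node);
    }

    // Renames the declaration and all calls; returns how many nodes changed.
    int rename(String oldName, String newName) {
        List<MethodNameNode> hits = index.remove(oldName);
        if (hits == null) {
            return 0;
        }
        for (MethodNameNode node : hits) {
            node.name = newName;
        }
        index.computeIfAbsent(newName, k -> new ArrayList<>()).addAll(hits);
        return hits.size();
    }

    public static void main(String[] args) {
        RenameRefactoring refactoring = new RenameRefactoring();
        MethodNameNode declaration = new MethodNameNode("foo", 10);
        MethodNameNode call = new MethodNameNode("foo", 42);
        refactoring.indexNode(declaration);
        refactoring.indexNode(call);
        refactoring.indexNode(new MethodNameNode("bar", 60));
        System.out.println(refactoring.rename("foo", "baz")); // 2
        System.out.println(call.name);                        // baz
    }
}
```

The stored offsets are what let the IDE write the new name back into the source text, which is why the AST must preserve them.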
Desugaring
Syntactic Sugar
  Syntax that is equivalent to some other syntax in the language but is more convenient or compact.
  i++; i += 1; i = i + 1;
Desugaring
  The parser produces the same AST fragment for different syntax.
  Convenient for code generation.
AST produced by IDE cannot be desugared.
  The AST needs to represent exactly what is in the user's source.
  All source offsets must be preserved.
  Comments must be preserved.
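The three spellings above can be shown collapsing to one AST fragment. This is a toy sketch, not a real parser: the Desugar class is hypothetical, it handles only these three statement shapes on a simple variable, and it prints the AST as an s-expression. It treats each form as a standalone statement, where the value of the expression itself is not used.

```java
// Hypothetical toy "parser" that desugars three equivalent statements.
class Desugar {
    // Returns a canonical AST, written as an s-expression.
    static String toAst(String statement) {
        String s = statement.replace(" ", "").replace(";", "");
        if (s.endsWith("++")) {
            // i++  becomes  i = i + 1
            String variable = s.substring(0, s.length() - 2);
            return assign(variable, variable, "1");
        }
        int plusEquals = s.indexOf("+=");
        if (plusEquals > 0) {
            // i += n  becomes  i = i + n
            String variable = s.substring(0, plusEquals);
            return assign(variable, variable, s.substring(plusEquals + 2));
        }
        // Already in the plain form: target = left + right
        int equals = s.indexOf('=');
        int plus = s.indexOf('+', equals);
        return assign(s.substring(0, equals), s.substring(equals + 1, plus), s.substring(plus + 1));
    }

    private static String assign(String target, String left, String right) {
        return "(assign " + target + " (add " + left + " " + right + "))";
    }

    public static void main(String[] args) {
        for (String statement : new String[] {"i++;", "i += 1;", "i = i + 1;"}) {
            System.out.println(statement + "  ->  " + toAst(statement));
        }
    }
}
```

All three inputs yield (assign i (add i 1)). That is convenient for a code generator, but it is exactly the information loss an IDE cannot afford, since the original spelling can no longer be recovered from the tree.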
AST
Eclipse actually has two separate ASTs for Java.
“Internal” AST
  May be desugared and extended by the parser.
  Used to resolve compilation problems, perform type checking and generate bytecode.
  Example:
    – In Java if you do not provide a constructor the compiler will provide a default constructor for you.
    – This is implemented by adding a constructor node under a class node.
“DOM” AST
  Exactly represents the user's source code, no desugaring.
  Generated from the internal AST.
    – “Cleaned up”
  Used for code completion, refactoring, and generating the index.
  Example:
    – The default constructor node is filtered out because it does not actually exist in the source.
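The default-constructor example can be sketched as follows. The class names MemberNode and InternalClassNode and the compilerAdded flag are assumptions for illustration and do not reflect the actual Eclipse data structures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical member node; compilerAdded marks nodes that are not in the source.
class MemberNode {
    final String kind;            // "constructor" or "method"
    final String name;
    final boolean compilerAdded;

    MemberNode(String kind, String name, boolean compilerAdded) {
        this.kind = kind;
        this.name = name;
        this.compilerAdded = compilerAdded;
    }
}

// Hypothetical class node of the "internal" AST.
class InternalClassNode {
    final String name;
    final List<MemberNode> members = new ArrayList<>();

    InternalClassNode(String name) {
        this.name = name;
    }

    // Internal AST: extend the tree with a default constructor if none was written.
    void addDefaultConstructorIfMissing() {
        for (MemberNode member : members) {
            if (member.kind.equals("constructor")) return;
        }
        members.add(new MemberNode("constructor", name, true));
    }

    // "DOM" view: keep only what actually appears in the user's source.
    List<MemberNode> toDomMembers() {
        List<MemberNode> dom = new ArrayList<>();
        for (MemberNode member : members) {
            if (!member.compilerAdded) dom.add(member);
        }
        return dom;
    }

    public static void main(String[] args) {
        InternalClassNode point = new InternalClassNode("Point");
        point.members.add(new MemberNode("method", "getX", false));
        point.addDefaultConstructorIfMissing();
        System.out.println(point.members.size());        // 2 (method + added constructor)
        System.out.println(point.toDomMembers().size()); // 1 (method only)
    }
}
```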
Bytecode Generation
Each AST node has a generateCode() method.
Code generation is done by a depth-first traversal of the AST.
Each generateCode() method first calls generateCode() on its children then generates code for itself.
This works because the JVM is a stack machine.
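A small sketch of children-first code generation for a stack machine. The node classes here are hypothetical and only mirror the generateCode() idea from the slide; the emitted strings use real JVM mnemonics (iload, bipush, imul, iadd) but are plain text, not an actual class file.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical expression node with a generateCode() method.
abstract class ExprNode {
    abstract void generateCode(List<String> out);
}

class IntConstant extends ExprNode {
    final int value;

    IntConstant(int value) {
        this.value = value;
    }

    @Override
    void generateCode(List<String> out) {
        out.add("bipush " + value); // push a small constant on the operand stack
    }
}

class LocalLoad extends ExprNode {
    final int slot;

    LocalLoad(int slot) {
        this.slot = slot;
    }

    @Override
    void generateCode(List<String> out) {
        out.add("iload " + slot); // push an int local variable
    }
}

class BinaryExpr extends ExprNode {
    final String opcode;
    final ExprNode left;
    final ExprNode right;

    BinaryExpr(String opcode, ExprNode left, ExprNode right) {
        this.opcode = opcode;
        this.left = left;
        this.right = right;
    }

    @Override
    void generateCode(List<String> out) {
        left.generateCode(out);   // children first: operands end up on the stack
        right.generateCode(out);
        out.add(opcode);          // then this node: pops two values, pushes the result
    }
}

class CodeGenDemo {
    public static void main(String[] args) {
        // x + 2 * y, assuming x is in local slot 1 and y in local slot 2
        ExprNode tree = new BinaryExpr("iadd", new LocalLoad(1),
                new BinaryExpr("imul", new IntConstant(2), new LocalLoad(2)));
        List<String> code = new ArrayList<>();
        tree.generateCode(code);
        System.out.println(code); // [iload 1, bipush 2, iload 2, imul, iadd]
    }
}
```

Because each operator finds its operands already on the stack, no register allocation or temporary naming is needed at this stage.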
Dynamic Class Loading
Static languages have a linking step after compilation.
Java uses Dynamic Class Loading
  All classes are resolved at runtime.
  The first time a class name is encountered it is loaded by the JVM.
  Searches the “classpath” for the class file to load.
Advantages:
  Reflection
    – Load and use classes at runtime that were not known to the compiler.
  Hotswap :)
    – Make code changes as you are debugging.
    – Incremental compiler recompiles the class file, unloads the old version of the class and loads the new one.
    – Change the behaviour of the program while it is running without needing to restart.
Creates many challenges for the JIT compiler
  Some optimizations are performed based on assumptions.
  A class may be loaded at any time that invalidates these assumptions and requires the optimization to be backed out.
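Loading and using a class that is only named at runtime can be shown with the standard reflection API. The helper class DynamicLoadDemo and its callNoArg() method are made up for this sketch; Class.forName, getDeclaredConstructor, getMethod and invoke are the real java.lang and java.lang.reflect calls.

```java
import java.lang.reflect.Method;

// Hypothetical helper: the class and method names are plain strings,
// so the compiler knows nothing about the class being used.
class DynamicLoadDemo {
    // Loads a class by name, creates an instance, and calls a no-argument method.
    static Object callNoArg(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);              // triggers class loading
            Object instance = cls.getDeclaredConstructor().newInstance();
            Method method = cls.getMethod(methodName);
            return method.invoke(instance);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Could not load or call " + className, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(callNoArg("java.util.ArrayList", "size"));       // 0
        System.out.println(callNoArg("java.lang.StringBuilder", "length")); // 0
    }
}
```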
JIT Compilation
Also known as Dynamic Compilation
Java bytecode is compiled into native machine code while the application is running.
Results in ~10x speed improvement over pure interpretation.
Compilation overhead is a runtime cost
  There must be a payoff
  The resulting speedup must outweigh the cost of compiling the method.
  Only compile the “hottest” methods.
Granularity:
  Method based JIT – compilation unit is a method
  Tracing JIT – compilation unit is a trace (a frequently executed path, such as a hot loop)
  IBM Java JIT compiler is method based
JIT Compilation Control
A sampling thread wakes up every X milliseconds and records all the methods that are currently executing.
When a method reaches some threshold it is queued for native code compilation.
The method is initially compiled at a low optimization level.
  More optimization increases compilation overhead.
The jitted version of the method is used on subsequent calls
  Note, the interpreted version of the method may still be executing somewhere.
If the jitted method is still hot it may get queued up again for compilation at higher optimization levels.
JIT compilation happens in separate threads.
  Good when you have underutilized cores available.
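The sampling and threshold logic can be sketched as a small simulation. Everything here is an assumption for illustration: the class CompilationControl, the threshold of 3 samples, and the two optimization levels are invented, and bumping the level stands in for queuing the method to a compilation thread. It is not the IBM JIT's actual policy.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical simulation of sample-driven compilation control.
class CompilationControl {
    static final int SAMPLE_THRESHOLD = 3; // invented value
    static final int MAX_OPT_LEVEL = 2;    // invented value; 0 means interpreted

    private final Map<String, Integer> sampleCounts = new HashMap<>();
    private final Map<String, Integer> optLevels = new HashMap<>();

    // Called by the sampling thread for each method it finds executing.
    // Returns the optimization level of the method after this sample.
    int recordSample(String method) {
        int count = sampleCounts.merge(method, 1, Integer::sum);
        int level = optLevels.getOrDefault(method, 0);
        if (count >= SAMPLE_THRESHOLD && level < MAX_OPT_LEVEL) {
            sampleCounts.put(method, 0);   // start counting again at the new level
            level++;
            optLevels.put(method, level);  // stands in for queuing a (re)compilation
        }
        return level;
    }

    public static void main(String[] args) {
        CompilationControl control = new CompilationControl();
        for (int i = 1; i <= 7; i++) {
            System.out.println("sample " + i + " -> level " + control.recordSample("hotLoop"));
        }
    }
}
```

With these invented numbers the method moves to level 1 on the third sample and to level 2 on the sixth, then stays there. A method sampled only once or twice never leaves the interpreter, so no compilation cost is paid for it.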
JIT Characteristics
The JIT compiler can optimize for the target CPU and OS where the application is running.
  The JIT can detect if certain instruction sets are supported.
  Knows the size of the data and instruction caches.
  Knows how many registers are available.
In contrast a static compiler must generate code for the lowest common denominator, or generate code separately for each possible target.
JIT compiler has access to profiling data which it can use when performing optimizations.
Can perform aggressive optimizations based on runtime assumptions.
  Can back out optimizations if an assumption is invalidated.
JIT Limitations
Compilation overhead is a runtime cost
  Certain analyses are impractical to do because they are too slow
    – Escape analysis is only done at the highest optimization levels
    – Whole program analysis is not done at all.
Jitted code must often branch back into the interpreter.
  Throwing an exception.
  Garbage collection points.
  Resolving references (i.e. triggering class loading).
  Calling an interpreted method from a jitted method.
Optimization: Devirtualization
Java programs contain many virtual methods.
If a virtual method has no overrides then it may be devirtualized.
  Observation based on the current state of loaded classes.
  Removes the overhead of looking up the method implementation.
  Enables inlining.
Problem: dynamic class loading
  It's possible that at any time a class may be loaded that contains a method that overrides a method that was devirtualized.
  A table of assumptions is maintained.
  Each assumption has a list of instructions that must be patched if the assumption is invalidated.
  Patched method may get queued up for recompilation.
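The assumption table can be sketched as a map from an assumption to the call sites that depend on it. DevirtualizedCallSite and AssumptionTable are hypothetical names, the "Class.method" string key is an invented convention, and flipping a boolean stands in for patching a machine instruction.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical call site in jitted code.
class DevirtualizedCallSite {
    boolean direct = true;   // fast path: call the method directly

    void patch() {
        direct = false;      // slow path: go back to virtual dispatch
    }
}

// Hypothetical table: assumption "method has no overrides" -> sites to patch.
class AssumptionTable {
    private final Map<String, List<DevirtualizedCallSite>> assumptions = new HashMap<>();

    // Records that a new call site relies on `method` having no overrides.
    DevirtualizedCallSite devirtualize(String method) {
        DevirtualizedCallSite site = new DevirtualizedCallSite();
        assumptions.computeIfAbsent(method, k -> new ArrayList<>()).add(site);
        return site;
    }

    // Called when a newly loaded class overrides `method`.
    // Patches every dependent call site and returns how many were patched.
    int onOverrideLoaded(String method) {
        List<DevirtualizedCallSite> sites = assumptions.remove(method);
        if (sites == null) {
            return 0;
        }
        for (DevirtualizedCallSite site : sites) {
            site.patch();
        }
        return sites.size();
    }

    public static void main(String[] args) {
        AssumptionTable table = new AssumptionTable();
        DevirtualizedCallSite site = table.devirtualize("Shape.area");
        System.out.println(site.direct);                         // true
        System.out.println(table.onOverrideLoaded("Shape.area")); // 1
        System.out.println(site.direct);                         // false
    }
}
```

In a real JIT the patched method would then typically be queued for recompilation, as the slide notes; the sketch stops at the patch.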
Optimization: Patching when assumption invalidated
Original jitted code:
  0: no-op
  1: fast path - call method directly
  2: more code
  3: return
  4: slow path - call virtual method
  5: branch 2
After Patch:
  0: branch 4
  1: fast path - call method directly
  2: more code
  3: return
  4: slow path - call virtual method
  5: branch 2
After Recompile:
  0: slow path - call virtual method
  1: more code
  2: return