Ahead-Of-Time Compilation of Java Applications
Nikita Lipsky, Excelsior LLC


JIT vs. AOT

Ahead-Of-Time Compilation of Java Applications

1

Nikita Lipsky, Excelsior LLC

Once upon a time, when C++ was new and cool, all compilers were static

Once upon a time, all compilers were static: C++ was compiled statically, FORTRAN was compiled statically, and even Pascal was compiled statically.

4

Until into this world came

Java

Until into this world came... Of course, Java did not invent dynamic compilation, but thanks to Java, dynamic compilation became mainstream.

5


Two Ways To Execute Java Bytecode

Interpret: slow but portable

Compile to native code: runs on the actual hardware

7

Today, when we talk about Java, we implicitly mean a JVM and Java bytecode that runs on that JVM. A JVM can execute bytecode in two ways: interpret it, which is slow, or compile it into native code, which runs directly on the underlying hardware. However, there are also two moments in time when that compilation into native code can take place.

7

When to compile to native code?
At application run time: dynamic (Just-In-Time, JIT) compilation
Before the application is launched/deployed: static (Ahead-Of-Time, AOT) compilation

Dynamic, or Just-In-Time, compilation occurs at application run time, whereas static compilation, also known as Ahead-Of-Time, takes place before application execution. Let's refer to the next slide.

8

9

Above the dotted line is what happens before execution, and underneath it is what happens during execution. The left diagram depicts a conventional JVM such as Oracle HotSpot. Your source files are translated into class files and those class files are turned into jar files, which are deployed to the target systems. During execution, the JVM loads the class files from the jars and initially interprets the methods that receive control. Then, with the help of a profiler, it detects often executed code and feeds that "hot" code into a dynamic compiler, which generates native code that runs directly on the underlying hardware. In the middle, you can see a JVM with an Ahead-Of-Time compiler. All class files get converted into machine code before deployment. As a result, your program runs directly on hardware right from the start. In the more general case, however, your application may load classes that are unknown at the moment of Ahead-Of-Time compilation - dynamic proxies, third-party plugins and so on. Therefore even an AOT-enabled JVM must be equipped with an interpreter or a dynamic compiler that would handle classes that may appear only at run time.

9

Static compilation of Java
Is it at all possible?
If yes, what are the conditions, and would it still be Java?
What are the benefits?
Obfuscation vs. static compilation
Application startup
Smaller installers with no external dependencies
Performance: comparison with JIT compilation
Why AOT is better for Desktop, Embedded, Mobile
Are there any benefits for server-side apps?

10

With this presentation, I will try to achieve two goals. First, I want to prove to you that static compilation of Java applications is indeed technically possible, and, second, I want to show when it can be useful. 10

Nikita Lipsky
20+ years in software development
Excelsior JET project initiator
16+ years of contributions: compiler engineer, team lead, product lead, etc.
Open source projects WebFX and Java ReStart in spare time
Twitter: @pjBooms

First, let me introduce myself. For nearly two decades, I have been working on the Excelsior JET project, which is a complete Java SE implementation with an AOT compiler. I worked on the project from day one and contributed to all components. 11

12

On the slide, you can see the Java Compatible logo. We actually sweated blood to get it: neither Sun Microsystems before, nor Oracle now, gives this logo to just anybody. It means, in particular, that our implementation is a compatible Java SE implementation, and thus that static Java compilation is possible in conformance with the Java SE specification.

12

Who needs Java AOT


This slide proves that Java AOT compilation is indeed needed: these are our customers, just a few of them.

Java AOT Compilation Myths

14

Since the inception of Java, numerous myths have developed around Java static compilation. I would like to start by dispelling those myths.

14

Myth 1. Java is too dynamic

Reflection

Dynamic class loading

15

The first myth: Java is too dynamic. Supposedly, two Java features, reflection and dynamic class loading, make static compilation of Java impossible by nature. But let's take a closer look at these features. What, essentially, is reflection? It is metadata about your program: class names, method signatures, and so on. A static compiler can simply place that information into the data section of the resulting executable file, and the runtime routines can use that data to enable reflection. In other words, reflection just works for statically compiled code. As for dynamic class loading, yes, a completely new class can appear at run time: it may be downloaded from the Internet, generated by a framework, and so on. However, I would like to point out that 99.9% of your application classes are normally known before execution. So nothing prevents you from compiling those 99.9% to native code statically and letting a dynamic compiler handle the remaining 0.1% at run time. Therefore, dynamic class loading does not prevent you from compiling the absolute majority of your application's classes statically.
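To illustrate (a minimal hedged sketch; the queried class is just an example), reflection code like the following keeps working on statically compiled code, because the compiler emits the class and member metadata it needs into the executable:

import java.lang.reflect.Method;

public class ReflectionDemo {
    public static void main(String[] args) throws Exception {
        // The metadata needed to answer these queries is placed into the executable
        // at AOT compile time, so reflection works without any bytecode at run time.
        Class<?> clazz = Class.forName("java.util.ArrayList");
        for (Method m : clazz.getDeclaredMethods()) {
            System.out.println(m.getName());
        }
    }
}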

15

Myth 2. AOT kills WORA

WORA: Write Once, Run Anywhere
!=
BORA: Build Once, Run Anywhere

The next myth sounds as follows: static compilation kills the main Java principle, WORA: Write Once, Run Anywhere. However, the principle says Write Once, not Build Once. That is to say, you write your application once, build it for every platform you need, and run it everywhere. Moreover, many Java desktop application vendors already distribute their applications as platform-dependent installation packages. For those vendors, switching over to static native compilation is just an additional step in their build process. And if you are on the server side and use Docker, for instance, note that a Docker image is platform-specific as well.

16

Myth 3. AOT = Small EXE

I would get a small executable that works without a JVM (as if I wrote my app in C)

17

The next myth is: if Java had a static compiler, it would effectively turn Java into C. If that were true, I could, for example, turn my 100 KB jar into a 100 KB executable that I could then send to a friend, who could run it without having to install Java.

Myth 3. AOT = Small EXE

What about thousands of standard classes?
Who will collect the garbage?
What about reflection?

OK, but Java comes with thousands of standard library classes that a Java application may use. How would they appear on my friend's PC? And who will collect the garbage? What about reflection? The point is that both C and Java have runtimes, but C has a small runtime and Java has a big one. The C runtime essentially only supports manual memory allocation and deallocation, and its standard library is very small (stdlib, stdio, and the like). Java has far more features, and its runtime is much bigger.

18

AOT = Smaller Package

Yes, AOT compilation can help you create a JRE-independent installer smaller than the JRE installer alone

Java Runtime Slim-Down: www.excelsiorjet.com/solutions/java-download-size

Nevertheless, you can compile the standard Java classes along with your application classes, linked with the runtime support such as the GC, into one executable file. The executable won't be small, but it will be smaller than the whole Java runtime, because the static compiler can include not all platform classes in the executable, but only those your application actually uses. That helps reduce the size of the resulting installer to a great extent.

19

Java IS too dynamic

20

Previously I have been trying to convince you that Java's dynamic class loading doesn't constitute a problem for static compilers. If I succeeded, congratulations: I have deceived you. There is one feature in Java that causes a big headache for static compiler writers. I am talking about custom class loaders.

20

Non-standard Class Loaders
Override the default reference resolution logic
Unique namespaces
Dependency management frameworks: OSGi, Java Module System, etc.; solve the JAR hell problem
Java EE servers, Eclipse RCP, plugin architectures

21

You can create your own class loader in your Java application at any moment, overriding the standard class loading logic. Then class reference resolution between your application's classes will work the way you define. Why are custom class loaders used today? They are used in Java EE, where multiple web applications can be deployed on one server: the server loads each web application via a separate class loader to avoid conflicts between the apps. Class loaders are also used for managing dependencies in frameworks such as OSGi, which is used by the Eclipse Platform. With all those modern frameworks, there are entire clusters of applications that have the majority of their classes known before execution, but all those classes are loaded by custom class loaders, not by the standard class loaders included in the Java platform. But how can we compile such classes statically? A compiler has no way to know how to resolve class references in such applications.
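For reference, a custom class loader can be as simple as the following hedged sketch, which overrides the default resolution logic by loading class bytes from a plug-in directory of its own (the directory layout and naming are hypothetical):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// A minimal custom class loader: each plug-in gets its own loader instance,
// so two plug-ins may even define classes with the same fully qualified name.
public class PluginClassLoader extends ClassLoader {
    private final Path pluginDir;

    public PluginClassLoader(Path pluginDir, ClassLoader parent) {
        super(parent);
        this.pluginDir = pluginDir;
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        Path classFile = pluginDir.resolve(name.replace('.', '/') + ".class");
        try {
            byte[] bytes = Files.readAllBytes(classFile);
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}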

21

Non-standard Class Loaders
How to compile such classes statically?
Compile each class in isolation: bad for performance
Reproduce the reference resolution logic of popular class loaders: does not work for arbitrary class loaders

First, the compiler may give up on resolving references between classes and compile each class independently of the others. That would work, but it wouldn't be good for performance: to make a Java application fast, we need to resolve class references in order to enable inline substitution across class boundaries. Fortunately, there is another solution, albeit a partial one. Most developers don't write their own class loaders; they use ready-made frameworks that employ custom class loaders internally. Thus we can learn how class loaders work in the most popular frameworks and support those class loaders at the JVM level as if they were the standard class loaders. I am going to demonstrate how to do that.

22

Non-standard Class Loaders


Let's say we have a Java application, and the majority of its classes are known before execution, but all of them are loaded by custom class loaders.

23

Non-standard Class Loaders (CLi: class loader i)

First, the compiler should determine how many class loaders will be created at run time, and which ones. How can that be done? By inspecting the structure of the application! For example, if we are compiling an Eclipse RCP application, where every OSGi bundle is loaded by its own class loader, we can scan the plugins/ folder to learn how many class loaders will be created at run time. Then we should give a unique name to each class loader, because the JVM can load several classes with the same name, provided each of them is loaded by its own class loader. We add, say, a prefix with the class loader's unique identifier to every class name to differentiate the classes. For an Eclipse RCP application, we can use plug-in names as class loader names because they are unique.
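A minimal sketch of that compile-time step might look like this (the directory layout and names are assumptions for illustration, not the actual Excelsior JET implementation): the resolver scans the plugins/ folder and uses each plug-in name as the static class loader ID.

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;

public class LoaderEnumerator {
    // Scan the application layout ahead of time to predict which class loaders
    // will exist at run time: one per plug-in, keyed by its unique plug-in name.
    public static Map<String, Path> enumerateLoaders(Path appRoot) throws IOException {
        Map<String, Path> loaders = new TreeMap<>();
        try (DirectoryStream<Path> plugins = Files.newDirectoryStream(appRoot.resolve("plugins"))) {
            for (Path plugin : plugins) {
                String loaderId = plugin.getFileName().toString(); // e.g. "org.example.ui_1.0.0"
                loaders.put(loaderId, plugin);
            }
        }
        return loaders;
    }
}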

24

Non-standard Class Loaders25

Then we need to distribute application classes among the class loaders

25

Non-standard Class Loaders26

And after that we may resolve references between classes, both within each class loader

26

Non-standard Class Loaders27

and across class loader boundaries.

27

Non-standard Class Loaders28

Non-standard Class Loaders29

Non-standard Class Loaders
AOT class loader support scheme:
Reference resolver in the AOT compiler:
Determines which class loaders will be created at run time and assigns a static ID to each
Breaks down the set of classes by class loader IDs
Resolves references between classes

30

However, as you can imagine, compilation is just half the battle. We also need to load the classes at run time.

30

Non-standard Class Loaders
AOT class loader support scheme contd.:
At application run time:
For each class loader instance, compute its ID
Knowing the ID, load the precompiled classes:
Create a java.lang.Class instance
Fill it with reflection information
Let the O/S load code from the executable

31

Let's imagine that at some moment a certain class loader wants to load a class with a certain name. Sooner or later it will inform the JVM about that, since a class loader cannot itself load a class into the JVM; it can only look up classes, delegate class loading requests to other class loaders, and ask the JVM to load a class it has found. At that moment, drawing on our knowledge of how the particular class loaders work, we need to match the class loaders that we identified at compile time against those that appeared at run time. In other words, we need to derive the unique static identifier/name of the class loader from its run-time instance. For instance, in Eclipse RCP, where we name class loaders by plug-in names, we can find the plug-in name in a field of the class loader instance at run time. Then, given the name of a class and its class loader identifier, we can find that class in the executable and load it. Note that loading of statically compiled classes differs from class loading in a classic JVM: all we have to do to load a precompiled class is create an instance of java.lang.Class and fill it with reflection information. Linking with other classes was performed ahead of time, and the code itself is loaded by the operating system and runs directly on the hardware.
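A hedged sketch of the run-time half (all names here are invented for illustration; the real hooks are internal to the JVM): map the live loader instance to its static ID, then look up the precompiled class by (loader ID, class name).

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative toy only: how an AOT runtime might hand out precompiled classes.
// The table would be populated from metadata the AOT compiler emitted; "loading"
// a precompiled class just means returning its java.lang.Class with reflection
// data, since linking was done ahead of time and the OS maps the machine code.
public final class PrecompiledClassRegistry {
    private static final Map<String, Class<?>> PRECOMPILED = new ConcurrentHashMap<>();

    public static Class<?> loadPrecompiled(String loaderId, String className) {
        // loaderId is derived from the live loader instance, e.g. by reading the
        // plug-in name from a known field of an Eclipse RCP class loader.
        return PRECOMPILED.get(loaderId + "/" + className);
    }
}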

31

Non-standard Class Loaders
AOT class loader support scheme contd.:
At application run time:
For each class loader instance, compute its ID
Knowing the ID, load the precompiled classes

Known to work for Eclipse RCP and Tomcat Web apps

32

Speaking about the practical applications of this theory, there is support for Apache Tomcat class loaders in Excelsior JET. This means it can statically compile Tomcat, together with all deployed Web applications and all the frameworks such as Spring, Hibernate, etc. There is also support for Eclipse RCP class loaders, so yes, you can compile the Eclipse IDE.

I would like to emphasize again that we are not trying to compile 100% of the classes. If proxy classes or accessors are generated at application run time, or if Spring generates something, all those dynamically generated classes can be handled by the dynamic compiler.

Hence, we can claim that for every Java application that has most or all of its classes and their respective class loaders known before execution, we can build a static compiler that will efficiently compile all those classes down to an optimized native binary. So is static Java compilation possible? Yes, for sure.

32

Why AOT for Java?

33

OK, so AOT for Java is possible, but why would we need it? How can it be useful? Let's start with the obvious things.

33

Protect Code from Decompilers

34

It is well known that Java bytecode can be easily reverse-engineered with the help of decompilers. Java decompilers are free, easy to use, and produce source code that is not just readable, but very close to the original. (Bytecodeviewer.com lets you run no fewer than five decompilers at once and pick the best result.) So if you don't want to distribute your applications in what is practically source code form, you would want a tool that hinders decompilation of your code.

34

Application Code Protection
Bytecode emitted by javac is extremely easy to decompile almost to the original source
Proof: http://bytecodeviewer.com/
Reflection makes name obfuscation labor-consuming and error prone, hinders code maintenance
Possible to guess what obfuscated code does
References to JDK classes remain intact

The go-to solution for this problem is obfuscation, and the simplest form of it is name obfuscation. A name obfuscator replaces all class, field, and method names with nonsense, so the resulting decompiler output becomes harder to understand. But name obfuscation has one big disadvantage: if your code uses the Reflection API, or a library/framework you employ relies on reflection, you have to write exclude lists for the obfuscator, telling it not to change the names of classes, fields, and methods that may be accessed via reflection. That is a difficult, time-consuming, and error-prone process. Moreover, what are you going to do with your code in the future? Sooner or later you will refactor it, possibly renaming packages, classes, fields, and/or methods, and then you will have to update the exclude lists as well. If you forget to do that, everything may work well on the developer's PC, but the deployed application can fail on the end-user side at the most inopportune moment, and you may not even recognize the problem at first sight. Moreover, it is often easy to guess what obfuscated code does just by glancing at the decompiler output, because all references to JDK classes stay intact.
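A tiny hedged example of why reflection and name obfuscation clash (the class name is hypothetical): if the obfuscator renames com.example.ReportPlugin, the string-based lookup below fails at run time unless that class is put on an exclude list.

public class PluginFactory {
    // The class name lives in a string, so a name obfuscator cannot see that
    // renaming com.example.ReportPlugin will break this call at run time.
    public static Object createPlugin() throws Exception {
        Class<?> pluginClass = Class.forName("com.example.ReportPlugin");
        return pluginClass.getDeclaredConstructor().newInstance();
    }
}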

Application Code Protection
Native machine code can only be effectively disassembled, but not decompiled
Application structure not deducible from disassembler output
Aggressively optimized native code only remotely resembles the original

36

Why is static compilation better? First of all, there is no tool that can decompile optimized native code back into Java code. All you can do with native code is disassemble it, and reading and understanding disassembler output takes far more skill than reading Java source emitted by a decompiler. It is much more difficult to understand what code does when no symbolic information is attached to it. And after aggressive optimizations, after all the inline substitutions and specializations, the resulting native code may only remotely resemble the original. Simply put, a static Java compiler protects your application to the same degree as if you had written it in C++ in the first place.

Application Startup Time

37

Next comes the opinion that a statically compiled application should start faster than if run on a conventional JVM, because there is no warm-up cycle involving interpretation, profiling and dynamic compilation. Is that true?

37

Cold Start vs Warm Start
Why does a re-started application start much faster?
No need to load app code and data from disk
No need to load system/third-party dynamic libraries (.DLL, .so) either

38

First, you surely have noticed that a large application re-starts much faster than when you run it for the first time. It is true for all applications, whether written in Java, C#, or C++. It so happens because when you start an application for the first time, the operating system has to load a lot of code and data from the media containing the application. On a restart, most of that code and data is still in the disk cache and the application starts faster as a result.

38

AOT Is Faster?

Native code is thicker than Java bytecode

Reading code from disk often takes more time than its execution, esp. on cold starts

So, can static compilation give us any advantage in startup time? First, notice that native code is "thicker" than Java bytecode due to its lower-level nature. For example, fetching an array element is a single instruction in Java bytecode, while expressing the same semantics in native code takes multiple instructions. So a native binary produced by a static Java compiler can be significantly bigger than the original jar file. Now recall that most of the time during application startup is spent loading code and data from disk. So everything we win by eliminating the warm-up cycle, we lose because we need to read more bytes from disk. When we built our AOT compiler for the first time, it was a real surprise for us that the startup time improvements were not that significant.

39

AOT IS Faster!
Identify code that gets executed on startup
Place it at the beginning of the executable
Preload the entire startup segment in one sequential read
(Makes most difference on rotating media, slow USB sticks, etc., not so much on SSDs)

40

In our implementation we've solved that problem. We do application startup profiling, which is easy to automate. Startup profiling detects the code that runs at application start. Then we place that code at the beginning of the executable and pre-load it by reading that "startup segment" sequentially. On rotating media, sequential reading is much faster than scattered reading. As a result, Java applications start really fast.

40

41

On this slide you can see a chart comparing the startup times of three applications with similar functionality (RSS readers). The first two are native Windows apps written in Delphi and C++ respectively, and the third one is written in Java. The third group of bars shows the startup times of the Java app launched on a conventional JVM, and the fourth one of the same application compiled statically. As you can see, the statically compiled Java application starts at almost the same speed as its native counterparts, while on a conventional JVM the same Java application takes two to three times longer to start. So static compilation has clearly helped this app start native-fast.

41

Performance

42

Next, let us consider application performance. Can AOT deliver better performance than JIT in the long run? There are two opposite myths regarding performance in the JIT vs. AOT battle.

42

Myth 4. AOT Is Faster

By compiling Java statically, we are making it equivalent to C, C is faster than Java, therefore statically compiled Java is faster

43

The first myth sounds as follows: when we compile Java statically, we get code similar to what C/C++ compilers produce; C is faster than Java; hence AOT compilation gives us better performance. Now, what is incorrect in this statement? Everything! Java, unlike C or C++, is a managed language. What that means in practice is that if you dereference a null pointer or access an array out of bounds, you get a NullPointerException or an IndexOutOfBoundsException that you can catch; in C you get an immediate segfault without any opportunity for recovery. These safety features of Java do not come for free, regardless of whether static or dynamic compilation is used. In other words, Java has inherent overheads that impact performance and that are absent in C/C++. But the claim that C is faster than Java is not quite correct either. If you write your programs in C, you use libraries that are distributed as already-compiled native code, and commonly used C compilers cannot use that native code when optimizing your application. In Java, on the other hand, all library code is available to both static and dynamic compilers for optimization! Even the Java platform core code can be inlined into your code during optimization, giving your app a significant performance boost.
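For instance (a hedged illustration of those inherent overheads), every array access below carries implicit null and bounds checks that the generated native code must either execute or prove redundant, whether the compiler runs ahead of time or just in time:

public class Checks {
    // Conceptually, the JVM treats a[i] += 1 as:
    //   if (a == null) throw new NullPointerException();
    //   if (i < 0 || i >= a.length) throw new ArrayIndexOutOfBoundsException(i);
    //   a[i] = a[i] + 1;
    // Optimizers (AOT or JIT) try to hoist or eliminate these checks, e.g. in loops.
    public static void incrementAll(int[] a) {
        for (int i = 0; i < a.length; i++) {
            a[i] += 1;   // the bounds check is provably redundant here and can be removed
        }
    }
}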

43

Myth 5. JIT Is Faster

Effective optimization of a Java application isonly possible in the presenceof its dynamic execution profile

44

But there is also the opposite myth: effective optimization of a Java application is possible only on the basis of its dynamic execution profile. That myth is the most difficult to bust, because the best JVM engineers have been hammering it into Java developers' heads for years. But I will try to bust it anyway. If we are talking about performance, we are talking about code optimizations. It is not enough to simply translate Java bytecode into native code; for the application to be fast, its code must pass through numerous optimizing transformations.

Compiler Optimizations
Constant propagation
Dead code elimination
Common subexpression elimination
Inline substitution
Method specialization
Loop unrolling
Loop versioning
Loop-invariant code motion
Tail recursion elimination
Call devirtualization
On-stack allocation and explosion of objects
Runtime checks removal
Excess synchronization removal
Optimal code selection and bundling
Instruction scheduling
Optimal register allocation
etc.

45

Here you can see only a few of the optimizations that can be applied to your Java code. Please don't try to read this slide. Discussing all these optimizations is far beyond the scope of this presentation, but I will tell you very briefly about the optimizations that are especially critical for Java.

45

Java == OOP

Lots of methods

Lots of small methods (get/set)

Lots of virtual calls of small methods

It is common in Java to write applications in a good object-oriented style, using various design patterns, frameworks, best practices, and so on. As a result, from a place in the program where you need some functionality, to the place where that functionality is actually implemented, control is dispatched through a long chain of intermediate calls. So a good optimizing compiler should discover that call chain and inline it all into the calling method. However, in Java all calls are virtual by default.

46

Call Devirtualization
Precondition for subsequent inline substitution
Class hierarchy analysis: method not overridden => non-virtual call
Type analysis: new T().foo(); // Non-virtual call of T.foo()
Inline caches

47

That means that call devirtualization is a very important optimization for Java, because it is a precondition for subsequent inline substitution. Method call devirtualization can be based on three techniques: class hierarchy analysis, type analysis, and inline caches. Let's consider those techniques.

47

Class Hierarchy Analysis (CHA)
Idea: method not overridden => non-virtual call

A a;
a.foo();

if (RT_Type(a) in CHA) {
    inlined body of foo()
} else {
    a.foo();
}

The main idea of Class Hierarchy Analysis, or CHA, is pretty simple. We traverse the class hierarchy, taking note of the methods that are not overridden, which means we can call them directly. Dynamic compilers can even inline such methods into their callers unconditionally. However, there is one problem with that technique: when a new class that does override such a method gets loaded at run time, it effectively invalidates all those inline substitutions, so for a dynamic compiler the native code of all affected methods has to be thrown away: de-optimized, re-compiled, and so on. We cannot do that to statically compiled code; instead we insert a few instructions before the de-virtualized call that check whether the run-time type of the receiver of foo() was covered by CHA. If the check passes, we execute the inlined code of the de-virtualized call of A.foo(); otherwise, we simply call the method virtually. Note that the check can be performed very efficiently, with just one CPU instruction. Still, either way, CHA has overheads for both dynamic and static compilers.

Type Analysis
Idea: new A().foo(); // non-virtual call. Always!

49

Fortunately, there is another technique that often lets us de-virtualize method calls without any overhead. It is based on the fact that any method of a newly created object can be called directly, because the exact class of that object is statically known. That means that if we find out that the receiver of method foo() was created with the operator new A(), we know for sure the exact method that gets called: A.foo(). So it is clear that we can always call foo() on new A() directly, regardless of whether any subclasses of A that override foo() exist or will ever be loaded.

49

Type Analysis

A a = b ? new B() : new C();
a.foo(); // is it a virtual call?
// B, C extend A and do not override A.foo

50

Let's move on. Type analysis attempts to compute statically which classes' instances may be assigned to the variable a. If it turns out that all possible sources of those instances are calls of the new operator on a known set of classes, and the method foo() is the same in all those classes,

Type Analysis

A a = b ? new B() : new C();
a.foo();

If neither B nor C overrides A.foo(), then a.foo() is a non-virtual call again! (can be inlined)

we can de-virtualize the call and optionally inline it. But assigning the result of "new" to a variable is not a very frequent code pattern.

51

Type Analysis

A a = b ? bar() : baz();
a.foo();

If bar() only returns results of new B, and baz() only returns results of new C, then a.foo() is again non-virtual (can be inlined)

More often, a variable gets assigned a value returned by a method of some other class. How can we de-virtualize the subsequent calls in that case? Let us suppose that we know that the method bar() only returns instances of the class B, and baz() always returns instances of the class C. In that case, we can call the method foo() non-virtually.

52

Type Analysis

How do we know what bar() and baz() return?

This poses the question: how do we know what bar() and baz() return? And here we can enjoy something that none of the dynamic compilers can dream of. A static compiler can use, within reason, as much time and memory for the optimization of your program as it needs.

53

Global Analysis
Analyzes all methods of the program, computing useful information about each:
whether a method always returns new T()
whether an argument of a method does not escape to shared memory

Results are then used for optimization of each particular method

For example, it can analyze all methods of your program prior to optimization and compute lots of useful information, such as which classes' instances get returned from particular methods. By starting the analysis from leaf methods that don't call anything and going up the call hierarchy, we can compute that information efficiently. As a result, the knowledge of what bar() and baz() return often becomes available. The analysis that considers all methods of all classes of the program together is called global analysis. What else can we use it for? How about detecting method parameters that do not escape into shared memory?
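A very simplified, hedged sketch of the bottom-up idea (the data structures are invented for illustration): process methods leaf-first, record a summary for each, e.g. "always returns an instance of exactly this class", and let callers reuse the summaries of their callees.

import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of a global analysis pass: walk methods bottom-up (leaf methods first),
// record a summary for each; callers analyzed later simply look up callee summaries.
public class GlobalAnalysis {
    // Hypothetical summary; null means "unknown / more than one possible class".
    record MethodSummary(String exactReturnClass) {}

    private final Map<String, MethodSummary> summaries = new HashMap<>();

    public void run(List<String> methodsLeafFirst) {
        for (String method : methodsLeafFirst) {
            summaries.put(method, analyze(method));
        }
    }

    public MethodSummary summaryOf(String method) {
        return summaries.get(method);
    }

    private MethodSummary analyze(String method) {
        // A real analyzer would inspect the method's IR here, consulting the
        // already-computed summaries of its callees; this stub records "unknown".
        return new MethodSummary(null);
    }
}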

54

Stack Allocation of Objects
All Java objects are supposed to reside in dynamic memory, on the Java heap
But most objects are small and temporary
It is desirable to allocate them on the stack

Escape analysis determines whether a locally created object escapes from the method stack frame into shared memory

55

Escape analysis determines whether an object allocated inside a method escapes into shared memory. If it turns out that it doesn't escape, we can allocate it on the stack rather than on the heap. However, local escape analysis cannot detect such objects if they get passed as parameters to other methods. You probably know that HotSpot performs stack allocation, but to do so it has to inline all the methods to which the candidate object is passed as a parameter, and it is not always possible to inline all those methods. Fortunately, global analysis helps us here by determining whether an argument of a particular method escapes or not. Now I am going to show you how it all works together.
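Before moving to that example, here is a hedged illustration of the kind of object such analysis targets: the Point below never leaves distance(), so (assuming the analysis can prove it) its fields can live on the stack or in registers, with no heap allocation or GC work at all.

public class EscapeDemo {
    static final class Point {
        final double x, y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    // The Point never escapes this frame: it is not stored in a field, not
    // returned, and not passed to a method that lets it escape. An escape
    // analysis (local or global) may therefore replace the allocation with
    // two stack/register variables.
    static double distance(double x1, double y1, double x2, double y2) {
        Point p = new Point(x2 - x1, y2 - y1);
        return Math.sqrt(p.x * p.x + p.y * p.y);
    }
}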

55

Example

for (Object o : getCollection()) {
    doSomething(o);
}

Let's consider a pretty standard code pattern: iteration over a collection.

56

Example

Iterator iter = getCollection().iterator();
while (iter.hasNext()) {
    Object o = iter.next();
    doSomething(o);
}

First, the javac compiler desugars this, producing bytecode equivalent to the following source: get an iterator from the collection, then iterate with the help of its hasNext and next methods. How can we optimize such code? Remember that the hasNext and next methods are abstract, and in the general case we know nothing about them. But what if we have those extra bits of information from global analysis?

57

Example
Suppose analysis has shown that getCollection() always returns an ArrayList

ArrayList list = getCollection();
Iterator iter = list.iterator();
while (iter.hasNext()) {
    Object o = iter.next();
    doSomething(o);
}

Let's imagine that the global analyzer has determined that getCollection() returns not an abstract collection, but an instance of the standard ArrayList class. Suddenly, things become much better, because we know exactly what the iterator() method of that class does.

58

Example

ArrayList list = getCollection();
ArrayList.ListItr iter = new ListItr(list);
while (iter.hasNext()) {
    Object o = iter.next();
    doSomething(o);
}

Then, if we inline that method, we will see that the iterator becomes not an abstract but a concrete iterator, and we know everything about this iterator. In particular, we know that in its hasNext() and next() methods, the iterator object doesn't escape into shared memory, so we can allocate that object on the stack.

59

Example

ArrayList list = getCollection();
ArrayList.ListItr iter = onStack ListItr();
iter.this$0 = list;
iter.cursor = 0;
iter.size = list.elemData.length;
while (iter.hasNext()) {
    Object o = iter.next();
    doSomething(o);
}

Let's go further. If we inline the constructor of the iterator, we will see what fields it has and how they get initialized from the ArrayList object. If we now inline the next and hasNext methods,

Example

ArrayList list = getCollection();
ArrayList.ListItr iter = onStack ListItr(list);
iter.this$0 = list;
iter.cursor = 0;
iter.size = list.elemData.length;
while (iter.cursor < iter.size) {
    int index = iter.cursor++;
    Object o = iter.this$0.elemData[index];
    doSomething(o);
}

61

we will see that the iterator object as a whole is never referenced anymore, only its fields are used as variables. Therefore, we can convert all these fields into local method variables.

61

Example

ArrayList list = getCollection();
int cursor = 0;
int size = list.elemData.length;
while (cursor < size) {
    Object o = list.elemData[cursor++];
    doSomething(o);
}

And what do we see now? We see that this is a simple for loop with a counter.

62

Example

ArrayList list = getCollection();
int size = list.elemData.length;
for (int i = 0; i < size; i++) {
    doSomething(list.elemData[i]);
}

Its induction variable was a field of an unknown object before optimization; now it can be allocated to a register, the loop can be versioned, all runtime checks can safely be removed from the fast version, the loop itself can be unrolled, and we can try to inline doSomething(), compute its loop invariants, and move them out of the loop. As a result, we obtain highly efficient code for this quite common code pattern. And I would like to emphasize that we performed all these tricks using only the results of static analysis, without executing the method a single time!

63

Analysis & Optimizations
are often quite complicated
require iterative re-computation
and, if global, depend on the entire program

64

Concluding the part about optimizations, I would like to say that program analysis, optimizations, and the respective internal program representations can be quite sophisticated and require lots of computation resources. Also, as you have just seen, an optimization may create an opportunity for another optimization. So, it is not enough to perform analysis and optimizations once. Ideally, they should be repeated until reaching a fixed point. Another important thing is that a global analyzer takes as input the entire program at once, hence the volume of data it calculates depends linearly on the size of the program. For example, our Java AOT compiler stores the results of global analysis on a hard drive, because they may not fit into RAM.

64

Analysis & Optimizations
are often quite complicated
require iterative re-computation
and, if global, depend on the entire program

Can a JIT compiler afford all or any of that?

65

So, finally, the question is: can a JIT compiler afford all that?

65


It shares CPU time and RAM with your application! And while the first two points are arguable, I can say for sure that global analysis does not suit dynamic compilers. Really, can you imagine a dynamic compiler using gigabytes of disk space on the end-user system to perform optimizations at application run time?

Dynamic Optimizations

Profiling and selective compilation

Inline substitution based on execution profile

Hot execution traces optimization

Optimal instruction selection

68

So what is a dynamic compiler good for? First, it can detect the frequently executed code with the help of profiling and compile not all of your program's code but just the "hot" code, spending far fewer computational resources than a static compiler as a result. This is actually the main advantage of dynamic compilers! As experiments we have performed several times show, profile-guided optimization does not improve performance that much, but it is enough to compile 10-100 times less code to achieve an acceptable level of application performance. It is this observation that makes the use of dynamic compilers possible at all. Next, dynamic compilers can de-virtualize method calls based not on static code analysis but on the actual, dynamic execution profile: if a dynamic compiler sees that, at a particular call site, the target method is always the same, it can speculatively assume that it will be the same in the future and inline the method body. Extrapolating this technique, dynamic compilers can find hot execution traces in your application, arrange them into basic blocks, and optimize them very aggressively. That works, and works really well!
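To make the mechanism concrete (a simplified, hand-written analogue, not actual JIT output), a profile-monomorphic call site is compiled roughly like this guarded fast path:

import java.util.ArrayList;
import java.util.List;

public class SpeculativeDevirt {
    // Hand-written analogue of what a JIT does at a profile-monomorphic call site:
    // guard on the only receiver type ever observed, inline its method on the fast
    // path, and fall back to the ordinary virtual call (or deoptimize) otherwise.
    static int sizeOf(List<?> list) {
        if (list.getClass() == ArrayList.class) {
            return ((ArrayList<?>) list).size(); // fast path: target known, can be inlined
        }
        return list.size();                      // slow path: generic virtual dispatch
    }
}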

68

Hot Code vs Warm Code
Q: What happens when an app with no distinctly hot code runs on a typical JVM?
A: Long warmup with results not stored for future reuse

Typical case: UI-centric applications

However, what if the application doesn't have obvious hot code? This is often the case, for example, for rich client applications based on Swing or JavaFX. In UI-centric applications there is a lot of code for menus, windows, event handlers, and so on, but that code is not executed frequently enough to justify JIT compilation. Thus, on a classic JVM, that code is simply interpreted! And that, of course, influences the users' perception of the application. For example, if we give a user a copy of a Java UI application running on a classic JVM and another one that has been statically compiled, the user may not notice the difference at first, but after some use they will notice that the second app responds to their input faster. Thus static compilation works better for UI-centric applications, because it pre-optimizes every single method of the application down to native code using the most aggressive optimizations possible.

69

JFCMark (Short Run)

The difference can be demonstrated on the following benchmark called JFCMark. We created it many years ago, to measure the UI responsiveness. This benchmark takes SwingSet2, the standard Swing demo from the JDK, and manipulates it with the help of a robot: opens and closes internal windows, modal dialogs, scrolls the rich text control, lays out controls and so on. JFCMark has two modes: a short run and a long run. The results of the short run are indicative of a graphical application performance as if you've just opened and started using it, whereas the long run measures response times after several hours of intensive user interaction. Here the performance of Excelsior JET, Hotspot Client, and Hotspot Server VMs is compared. As you see, during the short run the statically compiled benchmark runs twice as fast as on Hotspot Client.

70

JFCMark (Long Run): bigger is better

During the long run, HotSpot becomes faster, but it is still about one and a half times slower than the statically compiled code. But you may ask: who writes client-side software in Java nowadays? All we have is the server side, and on the server side the dynamic approach works better, of course.

71

Profile Guided Optimizations

Can a static compiler usedynamic execution profiles as input?

72

Okay, let's look at that more closely. First, I would like to ask: can a static compiler use an execution profile for optimizations? The answer is yes, of course. In fact, all popular C/C++ compilers implement profile-guided optimizations. However, there are questions here: who will feed the compiler with the profile, and how can it be collected? Usually, a static compiler toolchain has a special build mode in which the resulting binary gets instrumented for profile collection, and the collected profile is then used for profile-guided optimizations in subsequent builds. But if you develop a UI-centric client-side application, it may not be convenient to collect a profile every time you need to rebuild the binary. Then again, as you may remember, everything is more or less fine with static compilation of client-side applications even without profile-guided optimizations.

72

Server sideCPU time in the cloud costs money

After some time, execution profile of a server app stabilizes

Why not pass it over to the AOT compiler?

73

If we now turn to the server side, we can notice that after having worked for some time under high load generated by real users, an application will likely have its execution profile stabilized: new classes are no longer loaded, and the dynamic compiler sits idle. What if we now send the collected profile to another server where a static compiler is installed? Won't having relaxed memory and time constraints plus an actual execution profile help it optimize your server in the most effective way? Don't reach for your wallet just yet: such a solution doesn't exist yet. But that, I think, is exactly the scenario in which static compilation may be beneficial on the server side.

AOT on the Server Side
Stable performance and predictable latency: no code de-optimizations occur at run time
Work at full speed right from the start: good for load balancing
Better startup time: good when many servers start simultaneously
Still protects the code from decompilation

74

Moreover, we at Excelsior already have a number of satisfied users who develop server-side applications. Some benefit from the fact that the performance of statically compiled servers is stable: there is no de-optimization, interpretation, or recompilation of the code at unpredictable times, so the latency is much more predictable. Others benefit from startup time, which is 2-3 times better with AOT; that can be important if you launch a bunch of microservices simultaneously. And there is no warm-up: when a server instance starts, it performs the same as the others in the cluster, so load balancing does not suffer.

74

Embedded/IoTThe less powerful the hardware, the more expensive dynamic compilation becomes

Embedded systems typically have less computing power than desktops and servers75

Let's now take a brief look at the other end of the spectrum: embedded Java.

To an outsider, it may appear as if time has stopped in the embedded world. The processors are years behind those used in today's desktops and servers, and the other computational resources, RAM and flash memory, are often scarce. Now, the weaker the hardware, the more expensive dynamic compilation becomes!

Resource constraints simply don't let dynamic compilers work at full force. So for embedded Java, static compilation is just the ticket. It is a little-known fact that Sun Microsystems had been including AOT compilers in its commercial Java ME offerings for many years, and Oracle continues to include them.

75

Mobile
JVM on Mobile:
Platform classes
Memory Manager and GC
Reflection
JIT Compiler

What is missing?

And finally, somewhere between the desktop and embedded worlds lies the beautiful world of wireless mobile devices. Let's think about what a JVM looks like on a mobile platform. It needs platform classes, a garbage collector and memory management, reflection (it could work without reflection, but I would rather not), and a JIT compiler. What is missing?

76


Mobile
JVM on Mobile:
Platform classes
Memory Manager and GC
Reflection
JIT Compiler

Charger!

82

The charger! 82

Mobile
Wireless devices have batteries
Does it make sense to spend power on dynamic compilation?

All modern gadgets have a battery! Is dynamic compilation, which happens every time your application starts, worth the battery drain?

83

iOS
iOS policy prohibits creation of any native code during application run time

Getting Java apps to work on iOS requires either an interpreting JVM or an AOT compiler

That might be the reason why Steve Jobs prohibited dynamic compilation on iOS. This leaves AOT compilation as the only viable alternative for Java on iOS.

AOT Java Compilers in 2000
Desktop/Server:
BulletTrain
Excelsior JET
GNU Compiler for Java (GCJ)
IBM VisualAge for Java
Supercede/JOVE
TowerJ
Embedded/Mobile:
Diab FastJ
Esmertec Jbed ME
GCJ
IBM J9
Sun Java ME (custom offerings)

85

What static Java compilers exist? At the turn of the century, there were half a dozen of them, but only two have survived: the GNU Compiler for Java, or GCJ, and Excelsior JET. GCJ is not a Java compatible solution: there is no guarantee that your Java application will compile with GCJ, and if it compiles, that it will work; most likely it won't. GCJ stopped at the 1.4/1.5 level and is not being developed further. Excelsior JET, on the other hand, is a fully Java SE compatible solution that has passed the official JCK test suite. Thus, if your application is written in accordance with the Java SE specification and works as expected on the reference implementation, it must also work when compiled with Excelsior JET. If it does not, the reason is either a bug in our product or an implementation-biased bug in your app. In either case, you should report the problem to us.

85

AOT Java Compilers in 2016
Desktop/Server:
Excelsior JET
GCJ
IBM Java SDK for AIX
IBM Java SDK for z/OS
Embedded:
Excelsior JET Embedded
IBM WebSphere Real Time
Oracle Java ME Embedded Client
Codename One
Avian
Android ART
RoboVM (RIP)
Migeran (acquired by Intel)

With the appearance of iOS, we can observe a new boom in Java static compilation development: everyone wants to bring Java to iOS. There are several solutions. The first is a commercial one, Codename One: two former Sun engineers decided to take something like Java ME and make it work on mobile devices, and they present their solution at Java conferences around the world. Besides Codename One, there are a couple of open source solutions. One of them is Avian; its developers state that it can statically compile Java for iOS. Until this spring, the best-known solution was RoboVM, a JVM with an AOT compiler based on the Android class library. It was acquired by Xamarin, Xamarin itself was acquired by Microsoft, and Microsoft decided to kill RoboVM. However, before the acquisition the core VM of RoboVM was open source, so there now exist several forks of RoboVM that aim to resurrect it from the ashes. You have probably also heard about Migeran, another JVM with an AOT compiler for iOS based on the Android class library; it was acquired by Intel, and the future of the Migeran project now seems dark.

None of the above-mentioned solutions is Java compatible: if you use them, you will need to work on your application to achieve the desired behavior. We at Excelsior would also like to release a solution for iOS. It will be real Java, I mean Java Compatible, and I believe the solution will work well and won't lag. As I said earlier, there has always been static compilation in Java ME. IBM also has a static compiler, which they use for their Java SE Embedded solution, because in the embedded world there are often not enough resources for JIT compilation, while code runs very slowly under an interpreter, which may not be acceptable in embedded systems.

And Google released Android ART several years ago, which is also a Java AOT compilation solution. So Google finally decided that dynamic compilation of Java is not so good for Android; it is interesting to ask why.

86

87

This slide is taken from the JavaOne 2014 Strategy Keynote. Oracle announced that one of the new Java 9 features will be... AOT compilation! Oracle (and Sun before it) always tried to convince everyone that static compilation of Java is not necessary; now it seems they have changed their mind.

87

Static compilation of Java apps:
Possible
Preserving all Java features
Useful in many aspects

Q & A
Nikita Lipsky, [email protected]
Twitter: @pjBooms

Q: What if we have a bimorphic call site? What if we have a dynamic call site?
A: First, with global analysis we can often detect that method calls are not virtual at all: we can call methods directly on objects that we statically detected were created with the new operator, and in that case we don't even need to insert any guards. If we do devirtualization based on the CHA technique and detect that a call site is monomorphic or bimorphic, we can inline it. If at compile time we see that a call site is not monomorphic, but at run time it actually turns out to be monomorphic, then profile-guided optimizations can help, and they can be applied by static compilers as well. Currently we do not do call-site profiling, but we have received several performance reports from our customers showing that call-site profiling would help their use cases, so we are now considering implementing it, to enable profile-guided optimizations that can inline call sites which are monomorphic only at run time. Actually, call-site profiling does not help optimize code all that often: if a call site is monomorphic, we can usually detect that with type analysis or CHA, but for the odd cases where we cannot do it statically, we can apply profile-guided optimizations to statically compiled code as well.

Moreover, HotSpot can only compile hot code, but with the help of global analysis we can optimize any of your code, whether it is hot, warm, or cold, and that is actually an advantage.

Q: Escape analysis and scalar replacement in HotSpot?
A: How is escape analysis implemented in HotSpot? First, HotSpot compiles only hot code. It can even compile only the part of a method that is actually executed, so the other part of the method may remain in bytecode form, and HotSpot does not even have an IR for that code. If that is the case, it cannot allocate objects on the stack in such methods, because it does not know whether an object escapes in the code for which it has no IR. And if an object is passed as a parameter to other methods, HotSpot has no information about those methods, so it must assume the object escapes and cannot do stack allocation. We, on the other hand, have an IR for all methods and the results of global analysis: we know for every method whether a parameter escapes or not, and thus we can do much more stack allocation than HotSpot does. HotSpot's stack allocation is a poor substitute compared with our implementation. And we shipped stack allocation back in 2001.

Q: Number crunching. Are there any benefits?
A: For number-crunching benchmarks, HotSpot is usually able to do the same optimizations as we do. But if the logic is too complex, it can sometimes give up on optimizing the code, due to the constraints it must respect when performing optimizations, since it shares CPU and memory with your application. It also has some odd heuristics, such as using bytecode length to decide on inline substitution. We have much more relaxed constraints on the optimizations we apply, because we can spend as much time and memory on optimization as we need. Regarding inline substitution, we have quite sophisticated algorithms that look at the structure of a method's code and at the whole call chain we are going to inline, whether the method is called in a loop, and so on. And you can tune inlining as well.

In the general case, though, it is impossible to guess whether a particular piece of Java code will run faster after static compilation or not; you have to measure the performance of your application to find out. By the way, there was a JVM called JRockit that was positioned as the World's Fastest JVM. That is a whopper of a lie, actually: there is no such thing as the world's fastest JVM. Some JVMs optimize some things better, others optimize other things better, and you have to try all of them to see which does the better job for your particular application.

Q: Tomcat redeploy. New versions of Tomcat?
A: We can compile web applications together with Tomcat into one optimized binary, and in that case you will not be able to redeploy your app dynamically without stopping the Tomcat server. However, you may also compile the Tomcat server alone and deploy your applications the same way you deploy them on a conventional JVM. Redeploy will then work the same way as on a JVM, but your application will be compiled dynamically while running on a statically compiled Tomcat. In some cases you get benefits even in that scenario, because all the JDK code and the Tomcat server code are precompiled, and only your application is handled dynamically. But I would not advise using web application redeploy at all, because it often causes memory leaks. As the startup time of compiled Tomcat is just about 100 milliseconds, it is better to simply restart Tomcat as a whole without any redeploy; that way you can update your web applications in precompiled and preoptimized form.

Q: Can you change the JVM logic? What if the class loading logic changes?
A: We monitor such changes for the class loaders we support, and when they happen we release updates of our product. For instance, the forthcoming release will support Tomcat 8 class loaders and the latest Eclipse RCP class loaders. As for the general case: the class reference resolver component that I described today, and the respective runtime support, are pure Java code in our JVM. So, theoretically, we could publish an API for writing plugins to our JVM that would handle any use of class loaders in your application. But the demand for this feature is too low, so the API is not public for now.

Q: How does it compare with HotSpot Client?
A: HotSpot Client does more JIT compilation than HotSpot Server, employing much weaker optimizations as a result. As it JIT-compiles more than Server does, the performance of UI applications is usually better on HotSpot Client, because UI applications have a lot of warm, not hot, code. It does not optimize startup time, though; since its JIT compilation is faster, startup is usually better than on HotSpot Server, but there is still a warm-up cycle, so its startup is still 2-3 times worse than that of statically compiled code. And as a static compiler compiles all methods of your program with the most aggressive optimizations possible, a UI application usually works better when statically compiled. The difference is like the difference in UI responsiveness between iOS and Android, actually.

Q: What about agents?
A: If you use Java agents for AOP, for instance, note that you can usually apply AOP weaving statically to your code, and that is better for performance even if you run your application on HotSpot. And if you apply AOP statically, you can then compile that statically woven bytecode with a static native compiler as well. If you use agents such as JRebel for hot redeploy, it is unlikely that you will use that agent in a production environment, and our solution is not meant to be used during development: as we are a Java compatible solution, you may use HotSpot for development, and when you decide to move your code to production, you can compile it with Excelsior JET, and we guarantee that it will work the same way as on HotSpot, only better. We do actually support Java agents, but if they perform bytecode transformations, you first have to deploy the bytecode that is going to be transformed, and the patched bytecode will be handled by our JIT compiler. However, it is better not to use Java agents if possible, to let the statically compiled code do the work. For most real-world cases it is possible to solve the task without Java agents.

Q: What about monitoring?
A: We support Java monitoring, namely Java management beans (JMX), so you can monitor statically compiled applications as well. But we do not support all the monitoring features that HotSpot has, because some of them are HotSpot-specific and therefore optional in the Java SE specification. For instance, our GC is completely different from the HotSpot one, so some monitoring features cannot be applied to our GC at all. Still, basic monitoring works, and we are going to improve it in future releases. As for profiling, we do not support JVMTI, which is optional in the Java SE specification. JVMTI is used by debuggers, and since we protect bytecode from decompilation, it would be strange if you could debug statically compiled code with Java debuggers. That is why we do not support JVMTI, although it is technically possible to support it as well.
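For a sense of the kind of vendor-neutral monitoring that stays available through the standard management API, here is a minimal sketch that reads heap usage and thread counts via the platform MXBeans; it assumes nothing beyond the java.lang.management API defined by Java SE.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

public class MonitoringSample {
    public static void main(String[] args) {
        // Standard platform MXBeans, defined by the Java SE API rather than
        // by any particular JVM implementation.
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();

        MemoryUsage heap = memory.getHeapMemoryUsage();
        System.out.println("Heap used: " + heap.getUsed() + " of " + heap.getMax());
        System.out.println("Live threads: " + threads.getThreadCount());
    }
}
```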

Q: What about dynamic CPU detection?
A: Yes, in the general case an AOT compiler has to generate native code for the minimal CPU requirements of your application, while a JIT can detect the actual CPU. Actually, you can also target modern hardware statically, but by default we compile to a Pentium-class instruction set on x86. To our surprise, however, the performance gains from dynamic hardware detection are not that significant. For instance, we discovered that floating-point operations on the x87 floating-point stack run at almost the same speed as with SSE instructions; the difference is only 15-20 percent. And non-floating-point code is essentially CPU-neutral performance-wise on x86. Moreover, we do detect the CPU at run time, and some performance-critical runtime routines, such as arraycopy, have multiple versions for different hardware, so arraycopy is fast in our implementation on modern hardware anyway because it picks a CPU-dependent variant. And for x64, SSE instructions have always been available, so of course we use them in 64-bit native code.
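A minimal sketch, in plain Java, of the general idea of picking a CPU-dependent routine once at startup while the statically compiled callers stay unchanged. Everything here is hypothetical: the system property stands in for a CPUID-style probe, and the "fast" variant just delegates to System.arraycopy.

```java
/** Hypothetical sketch of run-time dispatch to a CPU-dependent routine. */
interface ArrayCopier {
    void copy(byte[] src, int srcPos, byte[] dst, int dstPos, int len);
}

/** Baseline variant that works on any CPU. */
final class GenericCopier implements ArrayCopier {
    public void copy(byte[] src, int srcPos, byte[] dst, int dstPos, int len) {
        for (int i = 0; i < len; i++) {
            dst[dstPos + i] = src[srcPos + i];
        }
    }
}

/** Stand-in for a hand-optimized variant that requires newer instructions. */
final class FastCopier implements ArrayCopier {
    public void copy(byte[] src, int srcPos, byte[] dst, int dstPos, int len) {
        System.arraycopy(src, srcPos, dst, dstPos, len);
    }
}

public class CpuDispatchSketch {
    // In a real runtime this flag would come from a CPU feature probe;
    // here a system property stands in for the detection step.
    static final boolean HAS_FAST_COPY = Boolean.getBoolean("cpu.fastcopy");

    // The variant is chosen once, at startup; all callers go through it.
    static final ArrayCopier COPIER =
            HAS_FAST_COPY ? new FastCopier() : new GenericCopier();

    public static void main(String[] args) {
        byte[] src = {1, 2, 3, 4};
        byte[] dst = new byte[4];
        COPIER.copy(src, 0, dst, 0, src.length);
        System.out.println(java.util.Arrays.toString(dst));
    }
}
```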

Q: What about HotSpot and AOT compilation?

A: If you look at the current status of HotSpot AOT, it can only compile the Java base module to native code, and only for Linux x64. The resulting binary is huge, about 100 megabytes for the base module alone. And they do not do any inline substitution, because they do not know how to do it statically. The code we generate is 5 times smaller, while applying inline substitution and other optimizations. And they cannot compile your own code: no support for custom classloaders, no global analyses, and so on. They will also never protect your code from decompilation, because all your bytecode has to remain present so it can be re-optimized by the JIT compiler; otherwise performance would be poor, since they do not know how to optimize Java code statically. Look, we have been working on our AOT compiler for 16 years! A great deal of R&D has gone into our compiler to learn how to optimize Java statically. No one can quickly write an AOT compiler of their own that generates well-optimized code. HotSpot has invested heavily in dynamic compilation, and it is not easy to move over to AOT compilation quickly. It is a very time-consuming task, not one solved with money alone.

Q: Do you implement the whole Java runtime?
A: A Java runtime can be divided into two parts: the JVM and the Java platform classes. We implement the JVM from scratch, while the platform classes and the native method implementations we license from Oracle. And since we write our JVM in full conformance with the Java SE specification, your applications usually work when compiled, because the actual library code being executed is exactly the same as on Oracle HotSpot, and we execute it according to the Java SE specification in every respect. It often surprises newcomers when they try our solution: it just works, amazingly!

Q: Can optimizations be applied incorrectly?
A: All optimizations are enabled by default in our implementation. You can only tune inline substitution, telling our compiler to inline more or less. So we had to debug all our optimizations to meet the Java SE specification. There is no optimization that is applied in a way that violates the specification; we pass the JCK test suite in the most aggressively optimized mode. And in my experience there has not been a single support case where we had to disable an optimization because it transformed an application incorrectly. Yes, our JVM may have bugs, and we fix them, but we do not apply any incorrect optimizations at all.

Q: What about optimizations in multithreaded code?
A: Of course, we have to perform all optimizations in conformance with the Java Memory Model. That means we must insert memory barriers into the native code wherever the specification requires them. We do all of that, and we can also eliminate redundant memory barriers with the help of static analysis, and we do such optimizations correctly as well.
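For a sense of where the Java Memory Model forces any compiler, static or dynamic, to emit barriers and forbid reordering, here is a minimal self-contained sketch: the volatile flag creates a happens-before edge, so a conforming compiler may not move the data write past the volatile store.

```java
public class VisibilityExample {
    private int data = 0;
    private volatile boolean ready = false;   // volatile store/load acts as a barrier

    void writer() {
        data = 42;       // must not be reordered past the volatile store below
        ready = true;    // volatile store: publishes 'data' to other threads
    }

    void reader() {
        if (ready) {                     // volatile load: acquires the published state
            System.out.println(data);    // guaranteed to print 42, never 0
        }
    }
}
```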

Q: What about iOS?
A: We will release Linux/ARM support this autumn, and after that we will immediately start iOS development. What we actually need for iOS is an interpreter, because loading dynamically generated native code is not permitted on iOS, and we are working on that interpreter right now. We would also need to statically link all native method implementations into the resulting executable, since dynamic libraries are not permitted on iOS either. In addition, we need native bindings for iOS so that you can use native iOS features from your Java app, and finally there should be some tooling to let you debug your apps directly on the device. We believe we can release our iOS solution in the autumn of 2017, and it will be a Java compatible solution. As for UI on iOS, I personally believe you would be able to use JavaFX, which Oracle has ported to iOS, but with native bindings you will be able to use native controls as well. We already have some competitors on iOS; for instance, Xamarin compiles C# for iOS and is not banned, so I believe we will not be banned either.

Q: What about invokedynamic?
A: What is the invokedynamic instruction, essentially? The first time it is executed, it runs a bootstrap method that must return a CallSite object; on subsequent calls, the method handle is taken from that CallSite and executed via its invokeExact method. All of this can easily be expressed in native code. You probably also want to ask how MethodHandles are implemented in our JVM. We reuse Oracle's method handle implementation based on lambda forms, which is pure Java code, so we can reuse it directly. All the bytecode generated by Oracle's method handle implementation is handled by our JIT compiler. Yes, we do not do the call devirtualization for invokedynamic that HotSpot can do in certain cases, so bytecode generated, say, for JRuby code can run slower in our implementation. But lambda expressions, which are implemented via invokedynamic in Java 8, can actually be optimized statically, because we know the actual lambda bodies at their use sites, so it is possible to inline lambdas statically in certain cases.
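To make the bootstrap-then-CallSite protocol concrete, here is a minimal sketch written directly against the java.lang.invoke API. The bootstrap method shown is a hypothetical one that simply binds the call site to a static method; real invokedynamic call sites are emitted by compilers such as javac (for lambdas) rather than written by hand, and the main method only simulates what the JVM does on first and subsequent executions of the instruction.

```java
import java.lang.invoke.CallSite;
import java.lang.invoke.ConstantCallSite;
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

public class IndySketch {

    // A bootstrap method with the standard invokedynamic shape:
    // run once, the first time the call site is hit, and must return a CallSite.
    public static CallSite bootstrap(MethodHandles.Lookup lookup,
                                     String name,
                                     MethodType type) throws Exception {
        MethodHandle target = lookup.findStatic(IndySketch.class, "greet", type);
        return new ConstantCallSite(target);   // subsequent calls reuse this target
    }

    static String greet(String who) {
        return "Hello, " + who;
    }

    public static void main(String[] args) throws Throwable {
        // Simulate the JVM's behavior: bootstrap once, then invokeExact on the target.
        MethodType type = MethodType.methodType(String.class, String.class);
        CallSite site = bootstrap(MethodHandles.lookup(), "greet", type);
        String result = (String) site.getTarget().invokeExact("JVM");
        System.out.println(result);
    }
}
```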