PARALLEL PROCESSING INSTITUTE · FUDAN UNIVERSITY
Outline
Motivation
Design & Implementation
Evaluation
Future work
The popularity of Java
[Figure: popularity of Java; value shown: 20.299%]
Java!
Architecture neutral
Simplified memory management
Security and productivity
…
Write Once, Run Anywhere
How can Java runtime performance be improved further?
Our Research
Leverage the synergy between static and dynamic optimizations
Keep a dynamic environment while leveraging static benefits
Find performance opportunities before runtime
Use static annotations to help runtime optimization
Opencj
The first milestone of the whole project
Developed on top of Open64
Takes Java source files or class files as input
Outputs executable code for Linux on IA-32 and x86-64
The compilation process is similar to compiling C/C++ applications
Outline
Motivation
Design & Implementation
Evaluation
Future work
Design Overview of Opencj
Migrate the gcj frontend into Open64
Java exception handling
Similar to C++ exceptions, but with some differences:
Runtime exceptions, such as a/0 (ArithmeticException) and NullPointerException
No “catch-all” handler as used in C++
The “finally” mechanism, which makes Java exceptions more complex than C++ exceptions
The key point of Java exception handling is recording the relationships among try/catch/finally blocks.
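A minimal Java sketch of the semantics listed above: an implicit runtime exception (integer division by zero) is caught, and the finally block runs on both the normal and the exceptional path. The class `ExceptionDemo` and its `trace` list are hypothetical; they only make visible the block relationships a compiler such as Opencj has to record, not how Opencj records them.

```java
import java.util.ArrayList;
import java.util.List;

class ExceptionDemo {
    // Records which blocks ran, in order, so the try/catch/finally
    // relationship is observable.
    static final List<String> trace = new ArrayList<>();

    static int divide(int a, int b) {
        try {
            trace.add("try");
            return a / b;          // b == 0 throws ArithmeticException with no explicit throw
        } catch (ArithmeticException e) {
            trace.add("catch");
            return -1;
        } finally {
            trace.add("finally");  // runs after the try's or the catch's return value is computed
        }
    }

    static int lengthOrZero(String s) {
        try {
            return s.length();     // s == null throws NullPointerException
        } catch (NullPointerException e) {
            return 0;
        }
    }
}
```

On the normal path the trace is `try, finally`; with `b == 0` it is `try, catch, finally`.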
Devirtualization
Virtual calls make code easy for programmers to reuse but hard for the compiler to analyze
Resolve Java virtual function calls, turning indirect calls into direct calls
Uses Class Hierarchy Analysis and Rapid Type Analysis
Implemented in the IPA phase
Many other optimizations benefit from this transformation
In the SciMark 2.0 Java benchmark, it resolves all 21 user-defined virtual function calls
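A toy Java model of the two analyses named above, showing how the set of candidate targets shrinks. It is a sketch under simplifying assumptions, not Opencj's IPA code: classes and methods are plain strings, there are no interfaces or overloading, and the class `Hierarchy` and the example classes (`Shape`, `Circle`, `Square`, `UnitSquare`) are hypothetical. CHA keeps every subclass of the receiver's static type; RTA additionally keeps only classes that are instantiated. A single remaining target means the call can be made direct.

```java
import java.util.*;

// Toy class-hierarchy model for deciding whether a virtual call has one target.
class Hierarchy {
    private final Map<String, String> superOf = new HashMap<>();        // class -> superclass (null at the root)
    private final Map<String, Set<String>> declares = new HashMap<>();  // class -> methods it defines

    void addClass(String name, String sup, String... methods) {
        superOf.put(name, sup);
        declares.put(name, new HashSet<>(Arrays.asList(methods)));
    }

    private boolean isSubclassOf(String c, String t) {
        for (String k = c; k != null; k = superOf.get(k)) if (k.equals(t)) return true;
        return false;
    }

    // The implementation that a receiver of class c actually runs for m.
    private String resolve(String c, String m) {
        for (String k = c; k != null; k = superOf.get(k))
            if (declares.getOrDefault(k, Collections.emptySet()).contains(m)) return k + "." + m;
        return null;
    }

    // CHA: any subclass of the static receiver type may be the receiver.
    // RTA: also require the class to be instantiated (pass null for plain CHA).
    Set<String> targets(String staticType, String m, Set<String> instantiated) {
        Set<String> result = new TreeSet<>();
        for (String c : superOf.keySet()) {
            if (!isSubclassOf(c, staticType)) continue;
            if (instantiated != null && !instantiated.contains(c)) continue;
            String impl = resolve(c, m);
            if (impl != null) result.add(impl);
        }
        return result;
    }

    // One possible target: the indirect call can become a direct call.
    boolean canDevirtualize(String staticType, String m, Set<String> instantiated) {
        return targets(staticType, m, instantiated).size() == 1;
    }
}
```

With `Shape`, `Circle`, `Square` (each defining `area`) and `UnitSquare extends Square`, CHA finds three targets for `area` on a `Shape` receiver; if only `Circle` is ever instantiated, RTA narrows this to `Circle.area`. A `Square` receiver already has one target under CHA, since `UnitSquare` inherits `Square.area`.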
Synchronization elimination
Based on escape analysis
Flow-insensitive and interprocedural analysis
Connection graph: captures the connectivity relationships among objects and object references
Makes it easy to determine whether an object is local to a thread
If a synchronized object is local to a thread, the synchronization operation can be removed
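A small Java illustration of what the analysis looks for. The code is ordinary Java and still runs with every lock; the comments only mark which monitors a compiler could remove under the rule above. The class and method names are hypothetical.

```java
// Illustrative only: which monitors escape analysis could let a compiler remove.
class SyncElimDemo {
    static final Object shared = new Object();  // reachable from a static field: GlobalEscape
    static int counter;

    // The lock on `shared` must stay: any thread can synchronize on it.
    static void incrementShared() {
        synchronized (shared) {
            counter++;
        }
    }

    // `lock` is created here and never stored, passed, or returned, so it is
    // NoEscape: no other thread can reach it, and the monitor enter/exit
    // can be removed without changing behavior.
    static int sumWithLocalLock(int n) {
        Object lock = new Object();
        int s = 0;
        for (int i = 0; i < n; i++) {
            synchronized (lock) {
                s += i;
            }
        }
        return s;
    }

    // Same idea with a library class: StringBuffer's methods are synchronized,
    // but the StringBuffer object itself never leaves this method.
    static String digits(int n) {
        StringBuffer sb = new StringBuffer();
        for (int i = 0; i < n; i++) sb.append(i);
        return sb.toString();
    }
}
```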
Building the connection graph
Only five kinds of statements are considered:
1. p = new P()
2. p = return_new_P()
3. p = q
4. p = q.f
5. p.f = q
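A minimal, flow-insensitive Java sketch of building a graph from the five statement kinds and deriving escape states from it. Assumptions, none of them from the slides: objects are named by allocation site; the connection graph is simplified to points-to sets plus field edges (no deferred edges or phantom nodes); kind 2 is treated like an allocation, whereas a real analysis would take the object's state from the callee; and the OutEscape state used in the slides' examples is not modeled because its definition is not given. All names are hypothetical.

```java
import java.util.*;

// Simplified connection graph: points-to sets, field edges, escape states.
class ConnectionGraph {
    enum Escape { NO_ESCAPE, ARG_ESCAPE, GLOBAL_ESCAPE }

    private final Map<String, Set<String>> pts = new HashMap<>();     // variable -> objects it may point to
    private final Map<String, Set<String>> fields = new HashMap<>();  // "object.f" -> objects stored in f
    private final List<String[]> stmts = new ArrayList<>();           // {kind, p, q, f}

    Set<String> ptsOf(String v) { return pts.computeIfAbsent(v, k -> new HashSet<>()); }
    private Set<String> field(String o, String f) { return fields.computeIfAbsent(o + "." + f, k -> new HashSet<>()); }

    void alloc(String p, String site)         { stmts.add(new String[]{"new", p, site, null}); }  // 1. p = new P()
    void callNew(String p, String calleeSite) { alloc(p, calleeSite); }                          // 2. p = return_new_P() (simplified)
    void copy(String p, String q)             { stmts.add(new String[]{"copy", p, q, null}); }   // 3. p = q
    void load(String p, String q, String f)   { stmts.add(new String[]{"load", p, q, f}); }      // 4. p = q.f
    void store(String p, String f, String q)  { stmts.add(new String[]{"store", p, q, f}); }     // 5. p.f = q

    // Statement order is ignored (flow-insensitive): iterate to a fixpoint.
    void solve() {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] s : stmts) {
                String p = s[1], q = s[2], f = s[3];
                switch (s[0]) {
                    case "new":  changed |= ptsOf(p).add(q); break;
                    case "copy": changed |= ptsOf(p).addAll(new ArrayList<>(ptsOf(q))); break;
                    case "load":
                        for (String o : new ArrayList<>(ptsOf(q))) changed |= ptsOf(p).addAll(new ArrayList<>(field(o, f)));
                        break;
                    case "store":
                        for (String o : new ArrayList<>(ptsOf(p))) changed |= field(o, f).addAll(new ArrayList<>(ptsOf(q)));
                        break;
                }
            }
        }
    }

    // Objects reachable from globals are GlobalEscape, from parameters or the
    // return value ArgEscape, and everything else stays NoEscape.
    Map<String, Escape> escapeStates(Set<String> globalVars, Set<String> argVars) {
        Map<String, Escape> st = new HashMap<>();
        for (Set<String> objs : pts.values()) for (String o : objs) st.put(o, Escape.NO_ESCAPE);
        for (String v : argVars) for (String o : ptsOf(v)) raise(st, o, Escape.ARG_ESCAPE);
        for (String v : globalVars) for (String o : ptsOf(v)) raise(st, o, Escape.GLOBAL_ESCAPE);
        return st;
    }

    // Raise o to state e and push e through o's fields to everything it reaches.
    private void raise(Map<String, Escape> st, String o, Escape e) {
        if (st.getOrDefault(o, Escape.NO_ESCAPE).compareTo(e) >= 0) return;
        st.put(o, e);
        for (Map.Entry<String, Set<String>> fe : fields.entrySet())
            if (fe.getKey().startsWith(o + "."))
                for (String target : fe.getValue()) raise(st, target, e);
    }
}
```

For `a = new A(); b = new B(); a.f = b; x = a.f; G = a; t = new T(); c = return_new_C(); return c;` with `G` a static field, `A1` and `B1` become GlobalEscape through `G`, `C1` is ArgEscape because it is returned, and `T1` stays NoEscape, so a lock on it could be removed.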
Analysis process
Intra-procedural analysis
Check every call graph node to find out whether its PU (program unit) contains a synchronized call
Set the initial escape state of each reference node
Inter-procedural analysis
Start from the main function and traverse the call graph in depth-first order
Pass escape states between caller and callee
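A simplified Java sketch of the inter-procedural step, assuming the intra-procedural pass has already assigned initial states to each method's references. Assumptions not taken from the slides: only GlobalEscape is propagated from a callee parameter back to the caller's argument, recursion is cut off rather than iterated to a fixpoint, and all class, method, and variable names are hypothetical.

```java
import java.util.*;

// Depth-first from main: summarize callees first, then pass each callee
// parameter's state back to the caller's argument.
class InterproceduralEscape {
    enum Escape { NO_ESCAPE, ARG_ESCAPE, GLOBAL_ESCAPE }

    static final class Call {
        final String callee, arg;  // the caller passes its reference `arg` ...
        final int param;           // ... as parameter number `param` of `callee`
        Call(String callee, String arg, int param) { this.callee = callee; this.arg = arg; this.param = param; }
    }

    static final class Method {
        final List<String> params = new ArrayList<>();
        final Map<String, Escape> state = new HashMap<>();  // initial states from the intra-procedural pass
        final List<Call> calls = new ArrayList<>();
        Escape of(String v) { return state.getOrDefault(v, Escape.NO_ESCAPE); }
    }

    final Map<String, Method> methods = new HashMap<>();
    private final Set<String> visited = new HashSet<>();

    Method method(String name) { return methods.computeIfAbsent(name, k -> new Method()); }

    void analyze(String name) {
        // Already summarized, or on the current DFS path (recursion). A real
        // analysis would iterate recursive cycles to a fixpoint; this does not.
        if (!visited.add(name)) return;
        Method m = method(name);
        for (Call c : m.calls) {
            analyze(c.callee);  // depth-first: the callee's summary comes first
            Method callee = method(c.callee);
            // Only GlobalEscape flows back: an object a callee merely receives
            // (ArgEscape there) can still be thread-local in the caller.
            if (callee.of(callee.params.get(c.param)) == Escape.GLOBAL_ESCAPE)
                m.state.put(c.arg, Escape.GLOBAL_ESCAPE);
        }
    }
}
```

With `main → foo → bar`, where `bar` stores its parameter in a static field, the object `main` passes to `foo` becomes GlobalEscape, while the object it passes to a harmless `baz` stays NoEscape.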
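Example 1
[Figure: connection graph; node escape states: GlobalEscape, OutEscape, GlobalEscape, NoEscape, OutEscape]
[Figure: connection graph; node escape states: GlobalEscape, NoEscape, GlobalEscape, NoEscape]
Example 2
[Figure: connection graph; node escape states: GlobalEscape, ArgEscape, ArgEscape, NoEscape, GlobalEscape]
[Figure: connection graph; node escape states: NoEscape, GlobalEscape, GlobalEscape, NoEscape]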
Array bounds check elimination
Array bounds checks guarantee type-safe Java execution
They prevent many useful code optimizations, since a bounds check may raise an exception
Full elimination: remove the check if it can never fail
Partial elimination: whenever possible, move bounds checks out of loops
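A Java sketch of the implicit check behind every `a[i]` and of the condition for full elimination. It assumes range analysis has already produced bounds `lo <= i <= hi` for the index; for example, a range like `0 <= i < 100` on an array of length 100 makes the check redundant. `BoundsCheck` and its methods are hypothetical names.

```java
// Illustrative only: the check Java requires on every a[i], and the test that
// makes it fully redundant.
class BoundsCheck {
    // What every a[i] means in Java: check first, then load.
    static int checkedLoad(int[] a, int i) {
        if (i < 0 || i >= a.length) throw new ArrayIndexOutOfBoundsException(i);
        return a[i];
    }

    // Full elimination: if range analysis proves lo <= i <= hi for the index,
    // the check can never fail when lo >= 0 and hi < length.
    static boolean fullyRedundant(long lo, long hi, long length) {
        return lo >= 0 && hi < length;
    }

    // Here i ranges over [0, a.length - 1], so fullyRedundant(0, a.length - 1, a.length)
    // holds and every check inside the loop could be dropped.
    static int sum(int[] a) {
        int s = 0;
        for (int i = 0; i < a.length; i++) s += checkedLoad(a, i);
        return s;
    }
}
```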
Example of ABCE
Fully redundant check elimination: example
[Figure: example code; annotations: 0<=i1<100, jc1]
Partial elimination
Adopts the loop versioning technique to preserve Java's exception semantics
Sets trigger conditions before and after the optimized loop
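A Java sketch of loop versioning for partial elimination, assuming a single trigger test placed before the loop (the condition after the loop that the slide mentions is not modeled here). Both loop copies are ordinary Java, so the JVM still checks them; the comments mark where a static compiler could omit checks. `LoopVersioning` and its methods are hypothetical names.

```java
// Illustrative loop versioning: one guard chooses a check-free or a checked copy.
class LoopVersioning {
    // Original loop: every a[i] and b[i] is checked on each iteration.
    static void addOriginal(int[] a, int[] b, int lo, int hi) {
        for (int i = lo; i < hi; i++) a[i] += b[i];
    }

    static void addVersioned(int[] a, int[] b, int lo, int hi) {
        if (lo >= 0 && hi <= a.length && hi <= b.length) {
            // Fast copy: the guard proves every index is in range, so a
            // compiler may run this copy with no per-iteration checks.
            for (int i = lo; i < hi; i++) a[i] += b[i];
        } else {
            // Slow copy: keeps all checks, so any exception is raised at the
            // same iteration, after the same side effects, as in the original.
            for (int i = lo; i < hi; i++) a[i] += b[i];
        }
    }
}
```

If the guard fails, the slow copy raises `ArrayIndexOutOfBoundsException` at the same iteration as the original loop, after the same partial updates, which is what keeps the exception semantics intact.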
Partially redundant check elimination: example
Checks eliminated by ABCE
Total: the total number of checks in the test case
PRCE: the number of checks removed by partially redundant check elimination
FRCE: the number of checks removed by fully redundant check elimination
ABCE: FRCE + PRCE
28.4% speedup on SciMark 2.0, lower than we expected
Outline
Motivation
Design & Implementation
Evaluation
Future work
Performance gap between Java & C
[Chart: performance, higher is better. Configurations: opencj -O3 -IPA -fno-bounds-check; opencc -O3 -IPA; gcj -O3 -fno-bounds-check -funroll-loops; gcc -O3 -funroll-loops]
Static compilation vs JIT
[Chart: higher is better]
Comparing two Java running modes: running in a JVM vs. running the executable file directly
Static compilation vs JIT
[Chart: lower is better]
JDK 1.6 is best on all benchmarks except mpegaudio
More analysis work is needed
Outline
Motivation
Design & Implementation
Evaluation
Future work
Future Trends for Java
Where is Java headed with its dynamic optimization framework?
Exploring opportunities to achieve performance parity with native code
Online profiling mechanisms and feedback-directed optimizations are becoming mainstream
…
Java advantages
Several studies show that Java could potentially be faster than C/C++, for several reasons:
C/C++ pointers make optimization difficult, because of possible aliasing
Memory management is easier in Java than in C/C++, because Java allocates memory only through object instantiation; as a result, Java garbage collectors can achieve better cache locality
Dynamic compilation of Java can use additional information available at run time to optimize code more effectively
Future of Opencj
Opencj will achieve better runtime performance by using a JVM as the execution environment:
Static annotations with an annotation-aware JIT: runtime IPA
A just-in-time compiler: more effective optimizations based on profiled runtime information
Garbage collection: better performance due to improved cache locality
There are three steps in our schedule
Framework: step 1
[Figure: framework diagram. Labels: C/C++/F, .java, FE, IPL, IPA, BE (LNO, WOPT), CG, x86, IA, LWHIRL, IR Writer, WHIRL Reader, Whirl_to_LIR, .class, Byte Code Reader, HIR ACTIONS, LIR ACTIONS, JIT, Interp, runtime library; legend: Existing Module, New Module]
Framework: step 2
[Figure: framework diagram. Labels: C/C++/F, .java, FE, IPL, RIPA, RIPA IR, BE (LNO, WOPT), CG, x86, IA, LWHIRL, IR Writer, WHIRL Reader, Whirl_to_LIR, .class, Byte Code Reader, HIR ACTIONS, LIR ACTIONS, JIT, Interp, runtime library; legend: Existing Module, New Module]
Framework: final
[Figure: framework diagram. Labels: C/C++/F, .java, FE, IPL, RIPA, RIPA IR, BE (LNO, WOPT), CG, x86, IA, LWHIRL, HWHIRL, IR Writer, WHIRL Reader, W to LIR, .class, Byte Code Reader, HIR ACTIONS, LIR ACTIONS, JIT, Interp, Runtime OPT., Feedback, runtime library; legend: Existing Module, New Module]
Discussion
Shin is the leader of this project
Q&A