Upload
stephany-cross
View
259
Download
1
Embed Size (px)
Citation preview
2
Overview
■ Toolchain JOP, JIT vs. ahead-of-time compilation
■ Existing open source tools
■ JOPtimizer framework and code representations
■ Inlining
■ Results
3
Toolchain Overview
■ Sourcecode compiled with javac to Java bytecode
■ Optimization defered to JVM, profiling information and JIT compiler is used
■ Not feasable on embedded processors like JOP
4
Toolchain Overview
■ Ahead-of-timeoptimization needed
■ Optimization of bytecode for target platform
■ Output is Java bytecode
■ Profiling vs. static WCET
5
Toolchain Overview
■ Advantages over JIT Runtime is not critical
No warm-up phase to gather profiling information and to do JIT compiling
■ Disadvantages Less accurate/no profiling
information available at design-time, class hierarchy may change dynamically
Target platform must be known
6
Existing Tools
ProGuard Soot (-O) Soot (-w -W) Purdue Bloat Jopt, OthersBytecode (%) -26,2% -2,4% 6,3%Ext. RAM (%) -26,1% -1,2% 3,5%Sieve (%) 1,1% -2,7% -2,8%Kfl (%) 0,1% 0,0% 11,2%UdpIp (%) 2,1% 3,8% 15,7%Lift (%) 1,0% -5,3% 7,8%Exec Time (s) 3 62 61 165Pro None Large speedup Easy to use
Contra
Very fast, reduces memory
Nearly no speedup
Very slow, no speedup
Slow, more memory, not always optimal
Very slow, bytecode not readable by BCEL
Not available, commercial or only obfuscators
■ Soot framework looks promising, but not designed for embedded systems and very complex
■ Other open source tools usually only remove unused methods and obfuscate code
7
JOPtimizer
■ JOPtimizer: a new framework for optimizations
Intermediate code representations
Inlining which respects method size restrictions
8
Assumtions
■ Assumptions about embedded applications
No dynamic class loading or class modifications at runtime
Reflection is not used
All class files are available at compile-time (except native classes)
■ Allows more optimizations (but assumtions can be disables)
■ Exclude “library” code (like java.*)
Define: library classes must not extend/reference application classes
9
Java Class Files
■ A class consists of:
ConstantPool: indexed table of constants (numbers, Strings, class names, method names, signatures, .. )
Classname, super-class, interfaces (references to CP)
Fields, methods: name, signature, flags
Method code as attribute of methods
Stack architecture with variable length encoding
■ Parsing and compiling of classfiles done by existing Libraries (BCEL, ASM, ...)
10
The JVM Instruction Set
■ (partially) typed stack instructions
■ 32bit (int, float, reference, byte, short, ..) and 64bit (long, double) variables
■ exception-handling, synchronization, subroutines
■ Stack- and variable table entries always 32bit
■ No indirect jumps, stack size must be static
private Map m;
private void test(int i) { int j[] = new int[2]; float a = 2.0f;
j[0] = i * (int) a;
m.put(this, j);}
private test(I)V ICONST_2 NEWARRAY T_INT ASTORE 2 FCONST_2 FSTORE 3 ALOAD 2 ICONST_0 ILOAD 1 FLOAD 3 F2I
IMULIASTOREALOAD 0GETFIELD #4ALOAD 0ALOAD 2INVOKEINTERFACE #7 POPRETURN
11
Stackcode Representation
■ Internal representation (“stackcode”) Types and constant values as parameters of
instructions to reduce number of different instructions (~40 stackcode instructions)
Stack emulation to determine operand types for all instructions (swap, dup, ..)
Variables and types instead of 32-bit slots
Constant values instead of references into CP
split basic blocks at exception handler ranges too
■ Still a stack architecture
■ Stackcode can be mapped directly to bytecode (allows analysis of code size and execution time)
12
Quadcode Representation
■ Stack creates implicit dependencies between instructions and blocks, makes optimizations more complex
■ Quadruple form of code (“quadcode”)
Create local variable per stack slot, emulate stack to determine the arguments of instructions
Instructions with types and constants as parameter
Instructions to manupulate stack not needed (pop) or replaced with copy instructions (load, swap, dup, ..)
13
Quadcode Representation
■ Quadcode representation enables simpler implementation of optimizations, but code cannot be mapped to bytecode directly
■ Stackcode and Quadcode similar to Soot internal representations (Baf, Jimple, Shimple)
public int calc(int a, int b) { copy.ref s0, l0 // load.ref l0 // aload_0 getfield.'Test.fField' s0, s0 // getfield 'Test.fField' // getfield
#3 copy.float s1, 2.0f // push.float 2.0f // fconst_2 binop.float.div s0, s0, s1 // binop.float.div // fdiv copy.float l3, s0 // store.float l3 // fstore_3 copy.int s0, l1 // load.int l1 // iload_1 return.int s0 // return.int // ireturn }
14
Creation of Bytecode
■ Transformation back from quadcode to bytecode
Create complete expressions from instructions (“Decompile” code), compile expression trees to JVM instructions like javac (Soot does this (Grimp))
Create stack form of quadruple instructions, compile to bytecode (JOPtimizer does this, optional in Soot)
Per quadcode instruction: load parameters on stack, execute operation and store result back
■ load/store elimination and local variable allocation for stackcode needed before bytecode can be created
■ Decompilation method of Soot gets slightly better results
15
Inlining
■ Invocations are expensive on JOP
■ Inline methods to eliminate invokation overhead
■ Inlining is not always possible
Callee code restrictions
Code size and variable table size restrictions of JOP
■ Inlining comes at a price
Caller code size increases, makes caller cache miss more costly
Overall program size increases if callee is not removed (p.e. is called somewhere else)
16
Inlining methods
1. Traverse callgraph bottom-up (leaves first)
2. Find and devirtualize invocations
static, final, private invokations not virtual
Check class hierarchy for overloading methods
3. Check if inlining is admissible
4. Estimate gain
5. Replace invocation with copy of callee
insert nullpointer-check for callee class reference
map local variables of callee above caller variables
17
Inlining Checks
■ Inlining is not possible if
new code size or variable table size of caller exceeds platform limits
the callee uses exception handlers or synchronized code
throwing an exception clears the stack
stack of the caller needs to be saved and restored if an exception is handled within the inlined method (NYI in JOPtimizer)
the method or class is excluded from inlining by configuration (caller or callee, p.e. Native class)
18
Inlining Checks (cont.)
■ Check field- and method references in callee code Must be accessible from caller Else make field or method public if possible
Always possible for fields as they are not virtual in Java
All overloading methods must be made public too
If a private method is made public, all invocations have to be changed from invokespecial to invokevirtual (luckily only methods of callee class have to be searched)
Naming conflicts or dynamic class loading can prevent changes, thus preventing inlining
class A public a() tmp = new C() invoke tmp.b()
class B public b() if (v == null) invoke B.c()
private c()
class C extends B private c()
19
Inlining Checks (cont.)
■ Estimate gain of inlining
Depends on cache state
Possible degredation of performance if inlined method is seldom invoked
Calculate gain based on invocation frequency and cache state estimations
Decrease weight of callees with multiple call sites to reduce increase of application code size
Select method with highest (positive) weight for inlining
■ Add inlined invocations to inlining candidate list, repeat inlining (check with new codesize)
20
Benchmark Results
■ Inlining of stackcode, jbe @60Mhz
■ Inlining limited by maximal code size imposed by JOP's memory cache
■ Removing of unused code should be implemented
ProGuard Soot (-O) Soot (-w -W) JOPtimizerBytecode (%) 26,2% -2,4% 6,3% 12,4%Ext. RAM (%) 26,1% -1,2% 3,5% 6,9%Exec Time (s) 3 62 61 5,5Sieve (%) 1,1% -2,7% -2,8% 0,0%Kfl (%) 0,1% 0,0% 11,2% 13,3%UdpIp (%) 2,1% 3,8% 15,7% 8,9%Lift (%) 1,0% -5,3% 7,8% 14,0%
21
Inlining Improvements
■ Many improvements possible
Type analysis/callgraph thinning for better devirtualization
Better cache state and invocation frequency estimation(WCET-driven?)
Run optimizations to reduce code size prior to inlining
Allow inlining of synchronized code/exception handlers
Try to find invocations with highest gain application-wide
...
22
Summary
■ Optimizing code at runtime not feasible for (realtime) embedded systems
■ Existing open source tools not designed for embedded systems
■ Inlining implemented in JOPtimizer which takes target platform into account (code restrictions, caching, ..), up to 14% speedup of JBE benchmark
■ load/store elimination and local variable allocation needed for further optimizations to be implemented
■ Still many improvements possible ..