ABOUT ME
Jaroslav Bachorík Prague, 20-21 October 2016
Jaroslav Bachorík, [email protected], [email protected], @yardus
PERFORMANCE
● Quantifiable
○ startup time
○ request latency
○ CPU usage
○ memory usage
● Reproducible
○ controlled environment
○ consistent results
● Measurable
○ strictly defined target goals
● Benchmarking
INSTRUMENTATION
int method() {
    MyObject o = new MyObject();
    int x = o.getCount();
    // instrumentation: log the instance and its count
    logger.debug("Instance " + o + " has count " + x);
    return x;
}
INSTRUMENTATION
● APIs and code providing the means to monitor and control an application
○ loggers
○ stat counters
○ profilers
● Decoupled from the application
○ application works properly without instrumentation
○ same instrumentation may work for multiple applications
SOURCE LEVEL INSTRUMENTATION
● Instrumentation is part of the source base
○ OS
■ dtrace
■ systemtap
○ Runtime
■ JFR
■ jstat counters
○ Application
■ logging
● Difficult to modify and extend
○ requires access to sources
○ rebuild & redistribution
BYTECODE LEVEL INSTRUMENTATION
● No source code modifications
● Modifying bytecode
○ result of Java source compilation
○ binary executable consumed by JVM
● Bytecode Injection (BCI)
○ during compilation
■ e.g. Maven AOP plugins
■ same drawbacks as static instrumentation
○ during class loading
■ JVM agent and class transformers
CLASS TRANSFORMERS
[Diagram labels: JVM, JVM Agent, Classes, Classloader, Transformer, Transformer, Transformer]
java.lang.instrument.ClassFileTransformer
byte[] transform(ClassLoader l, String name, Class<?> cls,
                 ProtectionDomain pd, byte[] classfileBuffer)
● Inspect and modify the class data
○ complex task
■ constant pool
■ stack frame map
○ better delegated to specialized tools
■ ASM
■ ByteBuddy
■ CGLIB
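To make the transformer contract concrete, here is a minimal sketch of a ClassFileTransformer that only inspects the class data and leaves it unchanged. The class name HeaderInspectingTransformer and the helper majorVersion are illustrative assumptions, not part of the talk; real modifications would normally be delegated to a library such as ASM or ByteBuddy.

```java
import java.lang.instrument.ClassFileTransformer;
import java.nio.ByteBuffer;
import java.security.ProtectionDomain;

// Hypothetical example class; it reads the class file header and changes nothing.
final class HeaderInspectingTransformer implements ClassFileTransformer {

    /** Returns the class file major version, or -1 if the buffer is not a class file. */
    static int majorVersion(byte[] classfileBuffer) {
        if (classfileBuffer == null || classfileBuffer.length < 8) {
            return -1;
        }
        ByteBuffer buf = ByteBuffer.wrap(classfileBuffer); // big-endian by default
        if (buf.getInt(0) != 0xCAFEBABE) {                 // class file magic number
            return -1;
        }
        return buf.getShort(6) & 0xFFFF; // bytes 4-5: minor version, bytes 6-7: major version
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        int major = majorVersion(classfileBuffer);
        if (major >= 0) {
            System.err.println(className + ": class file major version " + major);
        }
        return null; // null tells the JVM that the class is left unchanged
    }
}
```

Returning null rather than a copy of the buffer is the cheapest way to signal "no change", which matters because every registered transformer sees every loaded class.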
DYNAMIC INSTRUMENTATION WITH BCI
● Required steps
○ Create and register JVM agent
○ Create and register class transformers
○ Prepare injected bytecode
■ create bytecode
■ validate bytecode
○ Inject bytecode
■ merge class bytecode with injected bytecode
■ validate merged bytecode
■ redefine/retransform class using merged bytecode
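The first two steps and the final retransform step can be sketched with the standard java.lang.instrument API. This is a minimal sketch under stated assumptions: SketchAgent and its NO_OP transformer are hypothetical names, the bytecode preparation and merging steps are left out entirely, and the agent JAR manifest is assumed to declare Premain-Class and Can-Retransform-Classes: true.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;
import java.security.ProtectionDomain;

// Hypothetical agent class. In a real agent JAR it would normally be public and
// referenced from the manifest (Premain-Class: SketchAgent).
final class SketchAgent {

    // Placeholder transformer: a real one would return the merged, validated bytecode.
    static final ClassFileTransformer NO_OP = new ClassFileTransformer() {
        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            return null; // no change
        }
    };

    // Called by the JVM before main() when started with -javaagent:<agent jar>.
    public static void premain(String agentArgs, Instrumentation inst) {
        // The second argument marks the transformer as retransformation-capable.
        inst.addTransformer(NO_OP, inst.isRetransformClassesSupported());
    }

    // Asks the JVM to run the registered transformers again on an already loaded class.
    static boolean retransform(Instrumentation inst, Class<?> target) {
        if (!inst.isRetransformClassesSupported() || !inst.isModifiableClass(target)) {
            return false;
        }
        try {
            inst.retransformClasses(target);
            return true;
        } catch (UnmodifiableClassException e) {
            return false;
        }
    }
}
```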
BTRACE
● Bytecode level instrumentation simplified
○ JVM agent
○ Class transformers
○ Optimized bytecode injection
○ Safety guarantees
● Injected code as POJO
○ annotations specify where injection should go
○ code specifies what should be injected
● Started as a research project at Sun JDK Serviceability
BTRACE SCRIPT
● Easy access to
○ class and method name and parameters
○ enclosing instance
○ return value
○ method duration
○ fields via reflection
■ immutable, guarded, access only
● Interfacing via
○ stdout
○ file
○ JMX (MXBean)
○ jstat counters
BTRACE SCRIPT
@BTrace
public class AllMethods {
    @OnMethod(clazz="/javax\\.swing\\..*/", method="/.*/")
    public static void m(@Self Object o, @ProbeClassName String probeClass,
                         @ProbeMethodName String probeMethod) {
        println("this = " + o);
        print("entered " + probeClass);
        println("." + probeMethod);
    }
}
> this = DerivedColor(color=192,192,193)
> entered javax.swing.plaf.nimbus.DerivedColor.getRGB
PERFORMANCE IMPACT
● Class (re)transformation
○ application startup time
● Injected bytecode instructions
○ CPU usage
○ JIT optimizer decisions
○ heap usage
○ GC activity
● Instrumentation framework
○ additional drain of resources (CPU, RAM)
SPARK SPECIFICS
● Distributed environment
● Worker JVMs come and go
○ startup time is important
● The inner parts are frequently executed
○ RDD (Resilient Distributed Dataset) iterators
○ latency/overhead of injected code is important
● Startup time is as important as latency/overhead
CLASS (RE)TRANSFORMATION
● Affects application startup time
● Major impact on short-lived applications
● Usually only a small number of classes will be instrumented
○ optimize class filter for non-match
● Minimize overhead of parsing class files
○ register as few transformers as possible
○ consider smart caching of the transformed class data
● Example: Spark driver
○ lifespan easily just a few minutes
○ 10^4+ classes loaded at startup
○ optimizing class transformation decreased overall overhead by >1.5%
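A class filter optimized for the non-match case can be as simple as a prefix check that runs before any class file parsing. The sketch below assumes a hypothetical FilteringTransformer with a placeholder instrument method; it covers only the filter, not the caching of transformed class data.

```java
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;

// Hypothetical transformer that rejects non-matching classes as cheaply as possible.
final class FilteringTransformer implements ClassFileTransformer {
    // Prefix in internal (slash-separated) form, e.g. "org/apache/spark/rdd/"
    private final String internalPrefix;

    FilteringTransformer(String internalPrefix) {
        this.internalPrefix = internalPrefix;
    }

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        // Fast path: the vast majority of loaded classes end here without being parsed.
        if (className == null || !className.startsWith(internalPrefix)) {
            return null;
        }
        return instrument(classfileBuffer);
    }

    // Placeholder for the expensive part (parsing and bytecode injection).
    // It returns an unmodified copy so that the sketch stays self-contained.
    private byte[] instrument(byte[] original) {
        return original.clone();
    }
}
```

The JVM passes class names in internal form, so the filter compares against "org/apache/spark/rdd/" rather than the dotted package name and avoids any string conversion on the hot path.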
INJECTED BYTECODE
● Affects the application runtime performance
● Keep injected code as simple as possible
○ no non-deterministic loops
○ minimize external method calls
■ escape analysis
○ prefer working with stack instead of fields
■ method arguments
■ local variables
● Smart activation of injected code
○ sampling
○ injection guards
ESCAPING OBJECTS
● A local instance escapes via injected instrumentation
○ affects GC and JIT optimizer decisions
// original method: 'o' never leaves the method
int method() {
    MyObject o = new MyObject();
    int x = o.getCount();
    return x;
}

// instrumented method
int method() {
    MyObject o = new MyObject();
    int x = o.getCount();
    // inspect the instance providing the count
    // a local instance 'o' escapes the method scope
    Instrumentation.inspect(o);
    return x;
}
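One way to keep the instance local is to hand the probe only the primitive value it needs. The following is a sketch under assumptions: EscapeSketch, its nested MyObject and the inspect(int) probe are hypothetical stand-ins, and whether the JIT actually eliminates the allocation depends on the JVM and its inlining decisions.

```java
// Hypothetical example: the probe receives a primitive, so no reference to 'o' is published.
final class EscapeSketch {

    static final class MyObject {
        private final int count;
        MyObject(int count) { this.count = count; }
        int getCount() { return count; }
    }

    // Stand-in for the instrumentation sink; it stores the raw value only.
    static int lastInspected;

    static void inspect(int count) {
        lastInspected = count;
    }

    static int method() {
        MyObject o = new MyObject(42);
        int x = o.getCount();
        inspect(x); // 'o' itself stays within the method scope
        return x;
    }
}
```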
GC INTERFERENCE
● Minimize instrumentation interference with GC
○ use off-heap data structures where possible
○ specialized primitive collections
○ specialized queues in runtime (e.g. JCTools)
● Reduce instantiations to a minimum
○ boxing
○ string concatenation
○ varargs
● Collect only raw data
○ aggregations on a different JVM or host
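The points about avoiding instantiations and collecting only raw data can be illustrated with a preallocated primitive ring buffer. This is a simplified sketch: RawSampleBuffer is a hypothetical name, the arrays live on the heap (not off-heap), and the class is single-threaded; a concurrent setup would more likely use a specialized queue such as those in JCTools.

```java
// Hypothetical single-threaded ring buffer holding raw samples in primitive arrays.
final class RawSampleBuffer {
    private final long[] timestamps;
    private final long[] values;
    private int next;     // slot that the next record() call writes to
    private long written; // total number of samples recorded so far

    RawSampleBuffer(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        timestamps = new long[capacity];
        values = new long[capacity];
    }

    // No boxing, no varargs, no string concatenation: nothing is allocated per sample.
    void record(long timestampNanos, long value) {
        timestamps[next] = timestampNanos;
        values[next] = value;
        next = (next + 1) % timestamps.length;
        written++;
    }

    // Number of valid slots; the oldest samples are overwritten once the buffer is full.
    int size() {
        return (int) Math.min(written, timestamps.length);
    }

    long timestampAt(int slot) { return timestamps[slot]; }

    long valueAt(int slot) { return values[slot]; }
}
```

Aggregation (averages, histograms) is deliberately left out, in line with the advice to do it on a different JVM or host.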
STACK UNWINDING
● Reuse the values stored on stack
[Diagram: bytecode instructions alongside the Java stack contents]
GETSTATIC TestClass.name : Ljava/lang/String;
LLOAD 2
INVOKESPECIAL C.m (Ljava/lang/String;J)J
DUP_X2
DUP2_X1
INVOKESTATIC Probe.p(Ljava/lang/String;J)V
Java stack (entries shown twice on the slide): String: "name", Long: 2 (H), Long: 2 (L)
TIMESTAMP FOLDING
● Timestamps are expensive
○ TSC correlated across cores
○ monotonic counter values adjusted for core frequencies
● Minimize number of requested timestamps
○ fold in subsequent calls to System.nanoTime()
● BTrace will optimize timestamps for @Duration parameters
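The folding idea can be shown on two back-to-back measurements: the end timestamp of the first is reused as the start timestamp of the second, so three clock reads replace four. This sketch illustrates the general idea only, not BTrace's actual implementation; FoldedTimestamps and timeBoth are hypothetical names, and the clock is injected so that the number of reads can be observed.

```java
import java.util.function.LongSupplier;

// Hypothetical helper that times two consecutive steps with a shared middle timestamp.
final class FoldedTimestamps {
    private final LongSupplier clock; // e.g. System::nanoTime

    FoldedTimestamps(LongSupplier clock) {
        this.clock = clock;
    }

    /** Returns {duration of a, duration of b} using three clock reads instead of four. */
    long[] timeBoth(Runnable a, Runnable b) {
        long t0 = clock.getAsLong();
        a.run();
        long t1 = clock.getAsLong(); // folded: end of 'a' and start of 'b'
        b.run();
        long t2 = clock.getAsLong();
        return new long[] {t1 - t0, t2 - t1};
    }
}
```

Typical usage would be new FoldedTimestamps(System::nanoTime).timeBoth(first, second). The returned array is an allocation that real injected code would avoid.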
INVOCATION SAMPLING
● Instrumented methods are frequently executed
○ injected code causes high overhead
● Short methods experience disproportionate overhead
● Rely on a statistically relevant sample instead
○ execute only on each Nth pass
○ adjust N for acceptable overhead and detail
● Use @Sampled annotation in BTrace
○ fixed N
○ dynamically adjusted N for guaranteed overhead
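Executing the probe only on each Nth pass needs little more than a counter. The sketch below is a hand-written illustration with a fixed N, not the code BTrace generates for @Sampled; EveryNthSampler is a hypothetical name.

```java
// Hypothetical fixed-rate sampler: shouldSample() returns true on every Nth call.
final class EveryNthSampler {
    private final int n;
    // Deliberately unsynchronized: a lost update under contention only shifts a sample.
    private int counter;

    EveryNthSampler(int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        this.n = n;
    }

    boolean shouldSample() {
        if (++counter >= n) {
            counter = 0;
            return true;
        }
        return false;
    }
}
```

Injected code would wrap its body in if (sampler.shouldSample()) { ... }, so that the expensive part runs once per N invocations and the other invocations pay only for an increment and a compare.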
SAMPLING IN BTRACE
@BTrace
public class ArgsDurationSampled {
    @OnMethod(clazz="/.*\\.OnMethodTest/", method="args", location=@Location(value=Kind.RETURN))
    @Sampled(kind = Sampled.Sampler.Const, mean = 20)
    public static void args(@Self Object self, @Return long retVal, @Duration long dur) {
        println("args");
    }
}
// Adaptive sampler keeps 'mean' nanoseconds between samples on average
@Sampled(kind = Sampled.Sampler.Adaptive, mean = 300)
INJECTION GUARDS
● Fastest code is the one never executed
● Think of Logger levels
● Class retransformation is costly
● Introducing injection guards
○ injected code executed only when a condition is met
○ minimal overhead when not executing injected code
■ fast field check
● Use @Level annotation in BTrace
@OnMethod(clazz="org.apache.spark.rdd.RDD",
method="iterator",
enableAt=@Level(">=" + SAMPLING_LEVEL),
location=@Location(Kind.RETURN))
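In plain Java, an injection guard amounts to a cheap field check in front of the probe body, much like a logger level check. The sketch below is an illustration only and not what BTrace generates for @Level; GuardedProbe, its level field and the value of SAMPLING_LEVEL are assumptions.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical guarded probe: the body runs only while the level is high enough.
final class GuardedProbe {
    static final int SAMPLING_LEVEL = 2; // assumed threshold for this sketch

    // Adjustable at runtime (e.g. from a management thread) without retransforming classes.
    static volatile int level = 0;

    static final LongAdder hits = new LongAdder();

    static void onIteratorReturn() {
        if (level >= SAMPLING_LEVEL) { // fast field check, the only cost when disabled
            hits.increment();
        }
    }
}
```

Raising or lowering level switches the probe on and off immediately, which avoids the cost of retransforming the class just to disable instrumentation.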
HIGH PERFORMANCE INSTRUMENTATION
● Fast filters for identifying injection points
● Minimal and optimized code for injection
○ use timestamps sparingly
○ beware of callbacks from injected code
○ prefer stack manipulation over field retrievals
● Be gentle to GC
● Use sampling when possible
○ getting overhead down
○ still obtaining valid insights
● Enable turning off injection when not needed
○ class retransformation is slow
○ injection guards
Resources
● BTrace (https://github.com/btraceio/btrace)
○ Contributors welcome!
● ASM (http://asm.ow2.org/index.html)
● CGLIB (https://github.com/cglib/cglib)
● ByteBuddy (http://bytebuddy.net/#/)
● JCTools (https://github.com/JCTools/JCTools)