
Java performance tuning

A project by Ali Gholami and Saeedeh Davoudi

Dr. Nosratali Ashrafi Payaman

University of Kharazmi

+ JVM, history, vs. C and C++


Overview


Foundation of Java

• The Java language project was started in June 1991 by James Gosling and Mike Sheridan and was originally designed for interactive television.

• Names the language went by, in order:

1. Oak

2. Green

3. Java


Foundation of Java

• Sun Microsystems released the first public implementation of Java (Java 1.0) in 1996.

• Java slogan: "Write once, run anywhere."

• Java 1.2, released in December 1998, was branded J2SE, and new editions were built with multiple configurations for different platforms: J2EE included APIs for applications typically run in server environments, and J2ME included APIs for mobile applications.

• In 2006, for marketing purposes, Sun renamed the new J2 versions Java EE, Java ME, and Java SE.


Foundation of Java

Five primary goals in the creation of the Java language:

• Simple, object-oriented, and familiar

• Robust and secure

• Architecture-neutral and portable

• High performance

• Interpreted, threaded, and dynamic


Java Virtual Machine

• A Java virtual machine (JVM) is an abstract computing machine that enables a computer to run a Java program.

What is the Java Virtual Machine, and how do applications run on it?

• An instance of a JVM is an implementation running in a process that executes a computer program compiled into Java bytecode.

• The JVM's place in the compilation process:

[Figure: the JVM in the compilation process]


Java Virtual Machine Class Loader

[Figure: structure of the Java Virtual Machine]

• The class loader subsystem is the part of the Java Runtime Environment that dynamically loads Java classes into the Java Virtual Machine.

• Classes are usually loaded only on demand. Because of class loaders, the Java runtime system does not need to know about files and file systems.

• In Java, libraries (each consisting of multiple classes) are typically packaged in JAR files.

• Put simply, the class loader loads the libraries and classes that the JVM needs.
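A minimal sketch of on-demand class handling, using a hypothetical LazyLoading class. The Java Language Specification guarantees that a class is initialized on its first active use, so the nested Heavy class's static initializer runs only when Heavy.greet() is first called. Exactly when the JVM loads the class file is implementation-specific; running the program with java -verbose:class LazyLoading prints the order in which HotSpot loads classes.

```java
// Hypothetical demo class: a class is initialized on its first active use.
public class LazyLoading {
    static boolean heavyInitialized = false;

    // Nested class whose static initializer records when it runs.
    static class Heavy {
        static {
            heavyInitialized = true;
        }

        static String greet() {
            return "hello from Heavy";
        }
    }

    public static void main(String[] args) {
        System.out.println("Heavy initialized before first use? " + heavyInitialized);
        System.out.println(Heavy.greet()); // first active use triggers initialization
        System.out.println("Heavy initialized after first use?  " + heavyInitialized);
    }
}
```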


Java Virtual Machine Class Loader

When the JVM is started, three class loaders are used:

• Bootstrap class loader: loads the core Java libraries located in the <JAVA_HOME>/jre/lib directory.

• Extensions class loader: loads the code in the extensions directories (<JAVA_HOME>/jre/lib/ext, or any other directory specified by the java.ext.dirs system property). It is implemented by the sun.misc.Launcher$ExtClassLoader class.

• System class loader: loads code found on java.class.path, which maps to the CLASSPATH environment variable. It is implemented by the sun.misc.Launcher$AppClassLoader class.

These directories and class names describe Java 8 and earlier; Java 9 removed the extension mechanism and replaced the extensions class loader with the platform class loader.
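The loaders form a parent chain, and a loader normally delegates to its parent before loading a class itself. A small sketch with a hypothetical LoaderChain class that walks this chain: classes loaded by the bootstrap loader report null as their loader, so the chain for String is empty. On Java 8 the chain for an application class should show the AppClassLoader followed by the ExtClassLoader; on Java 9 and later the second entry is the platform class loader instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: walks the class-loader delegation chain.
public class LoaderChain {
    /** Returns the loaders from the one that loaded c up to, but excluding, the bootstrap loader. */
    static List<ClassLoader> chain(Class<?> c) {
        List<ClassLoader> loaders = new ArrayList<>();
        for (ClassLoader l = c.getClassLoader(); l != null; l = l.getParent()) {
            loaders.add(l);
        }
        return loaders;
    }

    public static void main(String[] args) {
        // Core classes such as String come from the bootstrap loader, reported as null.
        System.out.println("String: " + chain(String.class));
        // An application class: the system loader first, then its parent(s).
        System.out.println("LoaderChain: " + chain(LoaderChain.class));
    }
}
```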


Java Virtual Machine Method Area

• Information about loaded types is stored in a logical area of memory called the method area.

• Data in the method area stays in memory as long as the class loader that loaded it is alive.

• The method area stores:

class information (number of fields and methods, superclass name, interface names, version);

the bytecode of methods and constructors;

a runtime constant pool for each loaded class.

• The constant pool is the part of a .class file (and of its in-memory representation) that contains the constants needed to run the code of that class.
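String literals are one kind of constant-pool entry you can observe from Java code. A small sketch with a hypothetical ConstantPoolDemo class: two identical literals resolve to the same interned String instance, while new String(...) always creates a separate heap object; intern() returns the canonical instance. Running javap -v ConstantPoolDemo on the compiled class prints its constant pool.

```java
// Hypothetical demo class: string literals are resolved through the runtime constant pool.
public class ConstantPoolDemo {
    static boolean[] compare() {
        String a = "tuning";             // literal: resolved through the runtime constant pool
        String b = "tuning";             // same literal, so the same interned String instance
        String c = new String("tuning"); // explicit new: always a distinct heap object
        return new boolean[] {a == b, a == c, a == c.intern()};
    }

    public static void main(String[] args) {
        boolean[] r = compare();
        System.out.println("a == b:          " + r[0]); // true
        System.out.println("a == c:          " + r[1]); // false
        System.out.println("a == c.intern(): " + r[2]); // true
    }
}
```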


Java Virtual Machine Heap

• The heap is a memory area shared among all Java Virtual Machine threads. It is created at virtual machine start-up. All class instances and arrays are allocated in the heap (with the new operator).

• This area must be managed by a garbage collector, which removes the instances allocated by the developer once they are no longer used.

• The heap can be dynamically expanded or contracted and can have a fixed minimum and maximum size.

• Note: the heap cannot grow beyond its maximum size. If an allocation needs more memory than that limit allows, the JVM throws an OutOfMemoryError.
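The minimum and maximum heap sizes are normally set with the standard HotSpot flags -Xms and -Xmx. A short sketch, using a hypothetical HeapInfo class, that reports the limits the running JVM actually uses through java.lang.Runtime:

```java
// Hypothetical demo class: reports the heap limits the JVM is running with.
public class HeapInfo {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024 * 1024;
        // maxMemory(): the ceiling set by -Xmx (Long.MAX_VALUE if there is no limit)
        System.out.println("max heap:       " + rt.maxMemory() / mb + " MB");
        // totalMemory(): heap currently committed by the JVM; it can grow up to the max
        System.out.println("committed heap: " + rt.totalMemory() / mb + " MB");
        // freeMemory(): the unused part of the committed heap
        System.out.println("used heap:      " + (rt.totalMemory() - rt.freeMemory()) / mb + " MB");
    }
}
```

For example, java -Xms64m -Xmx512m HeapInfo should report a maximum of roughly 512 MB; allocations that would push the heap past that ceiling fail with OutOfMemoryError.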


Java Virtual Machine Stack

• Each thread has its own Java stack, which is used during that thread's execution. The stack holds short-lived, method-specific values and references from the method to objects in the heap.

• [Figure: JVM stack before and after a method call]

• Each method or constructor call creates a new stack frame, which holds the call's arguments, its local variables, and the references it uses to reach objects on the heap.
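Because every call pushes a frame, unbounded recursion eventually exhausts a thread's stack. A sketch with a hypothetical StackDepth class that counts how many frames fit before the JVM throws StackOverflowError; the number depends on the frame size and on the thread stack size, which HotSpot lets you set with -Xss (for example java -Xss512k StackDepth).

```java
// Hypothetical demo class: counts frames until the thread's stack overflows.
public class StackDepth {
    private static int depth = 0;

    // Each call pushes a new frame holding its argument n and its local variable.
    private static void recurse(long n) {
        depth++;
        long next = n + 1; // a local variable stored in this frame, not on the heap
        recurse(next);
    }

    static int measure() {
        depth = 0;
        try {
            recurse(0);
        } catch (StackOverflowError e) {
            // no room for another frame; the stack unwinds back to here
        }
        return depth;
    }

    public static void main(String[] args) {
        System.out.println("frames before StackOverflowError: " + measure());
    }
}
```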


Stack and Heap Example

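[Figure: Java runtime memory. Stack memory holds a frame for main() (int i = 1 and references to a Memory object and an Object) and a frame for func() (references to its arg and to a String). Heap memory holds the Memory object, the Object, and the String Pool.]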
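The program behind the stack-and-heap figure is not included in the slides. The following is a hypothetical reconstruction that matches the figure's labels (main(), func(), int i = 1, an Object, a Memory instance, a String argument); the class and method bodies are assumptions, not the authors' code. The comments mark what lives in a stack frame and what lives on the heap. The String Pool shown in the figure is where string literals are kept.

```java
// Hypothetical reconstruction of the stack/heap example; names follow the figure labels.
public class Memory {
    public static void main(String[] args) {
        int i = 1;                 // primitive: stored directly in main()'s frame
        Object obj = new Object(); // reference in main()'s frame, object on the heap
        Memory mem = new Memory(); // reference in main()'s frame, object on the heap
        mem.func(obj);             // pushes a new frame for func()
        System.out.println("i = " + i);
    }                              // main()'s frame is popped when main returns

    String func(Object arg) {      // arg: a reference copied into func()'s frame
        String str = arg.toString(); // str refers to a new String object on the heap
        System.out.println(str);
        return str;                // func()'s frame is popped on return
    }
}
```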


Java Virtual Machine PC Register / Native Stack

• The PC (program counter) register contains the address of the instruction currently being executed in its associated thread. The PC register is a very small data area with a fixed size; Java applications have no influence on its content or size.

• The native method stack stores data similar to a JVM stack and is used to execute native (non-Java) methods. It only comes into play when native code is integrated into a Java application.


Java Virtual Machine Execution Engine

• At the core of any Java virtual machine implementation is its execution engine.

• The behavior of the execution engine is defined in terms of an instruction set. For each instruction, the specification describes in detail what an implementation should do when it encounters that instruction while executing bytecodes.

• Each thread of a running Java application is a distinct instance of the virtual machine's execution engine. From the beginning of its lifetime to the end, a thread is either executing bytecodes or native methods. A thread may execute bytecodes directly, by interpreting them or executing them natively in silicon, or indirectly, by just-in-time compiling them and executing the resulting native code.
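A sketch for watching the interpreter and the JIT compiler at work, using a hypothetical HotLoop class. Running it with the HotSpot flag -XX:+PrintCompilation logs methods such as square and sumOfSquares as they become hot and are compiled; running it with -Xint forces pure interpretation, which should make the later rounds noticeably slower. Timings vary by machine, so none are shown here.

```java
// Hypothetical demo class: a hot method that the JIT compiler should pick up.
public class HotLoop {
    // Small method called millions of times; a typical JIT compilation candidate.
    static long square(long x) {
        return x * x;
    }

    static long sumOfSquares(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += square(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Early rounds run (partly) interpreted; later rounds run compiled code.
        for (int round = 0; round < 5; round++) {
            long start = System.nanoTime();
            long result = sumOfSquares(1_000_000);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.println("round " + round + ": " + result + " in " + micros + " us");
        }
    }
}
```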


Java Virtual Machine JNI

• The Java Native Interface (JNI) is a native programming interface that is part of the Java Software Development Kit (SDK).

• JNI is a programming framework that enables Java code running in a Java Virtual Machine (JVM) to call and be called by native applications (programs specific to a hardware and operating system platform) and by libraries written in other languages such as C, C++, and assembly.
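The Java side of a JNI binding is a method declared native plus a System.loadLibrary call; the implementation lives in a separately compiled C or C++ library. A sketch under stated assumptions: the class NativeAdder and the library name "nativeadder" are hypothetical, and since no such library ships with the JDK, the code falls back to plain Java when loading fails with UnsatisfiedLinkError. A matching C function would have to be named Java_NativeAdder_addNative.

```java
// Hypothetical JNI binding: the native library "nativeadder" is an assumption.
public class NativeAdder {
    private static final boolean NATIVE_AVAILABLE = loadNative();

    private static boolean loadNative() {
        try {
            System.loadLibrary("nativeadder"); // looks for libnativeadder.so / nativeadder.dll
            return true;
        } catch (UnsatisfiedLinkError e) {
            return false; // library not found on java.library.path
        }
    }

    // Declared in Java, implemented in native code (C function Java_NativeAdder_addNative).
    private static native int addNative(int a, int b);

    static int add(int a, int b) {
        return NATIVE_AVAILABLE ? addNative(a, b) : a + b; // pure-Java fallback
    }

    public static void main(String[] args) {
        System.out.println("native library loaded: " + NATIVE_AVAILABLE);
        System.out.println("2 + 3 = " + add(2, 3));
    }
}
```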


Introduction to Java performance

• Java has historically been slower than C and C++ because of its different execution model: a compiled Java application runs on the JVM instead of directly on the computer's processor.

• Java performance has improved since 1997 thanks to the following developments:

1. The introduction of JIT (just-in-time) compilation.

2. Optimizations in the JVM (such as HotSpot becoming the default for Sun's JVM in 2000).

3. Hardware execution of Java bytecode, such as that offered by ARM's Jazelle.


Introduction to Java performance

• The performance of Java bytecode (a compiled Java program) depends on:

how well the JVM manages the tasks given to it;

how well the JVM exploits the hardware and features of the computer.

Tuning Java performance = JVM optimization


Virtual Machine Optimization methods

• Split-time bytecode verification

• I/O tuning

• Memory tuning

• Thread contention tuning

• CPU usage tuning

• Register allocation improvement

• Class data sharing


Java Memory Tuning

Java memory performance tuning areas:

• Memory footprint

• Allocation rate

• Garbage collection


Java Memory Tuning: memory footprint tuning

Reasons for getting an OutOfMemoryError:

1. Too much data

2. Fat data representation


Java Memory Tuning: memory footprint tuning, too much data

• Verbose GC logging is intended as the first tool to use when diagnosing garbage collector problems. With verbose GC enabled, watch the numbers in the "Full GC" log lines:

[Full GC $before->$after($total), $time secs]
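Verbose GC output is enabled with -verbose:gc (on JDK 9 and later, -Xlog:gc gives the unified log format; JDK 8 used -XX:+PrintGCDetails for more detail). The same collection counts and times can also be read from inside the application through the standard java.lang.management API. A sketch with a hypothetical GcStats class that produces some short-lived garbage and then prints each collector's totals; collector names depend on which GC the JVM runs with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical demo class: reads the counters that verbose GC logging reports.
public class GcStats {
    static byte[] sink; // keeps each allocation observable so it is not optimized away

    public static void main(String[] args) {
        for (int i = 0; i < 200_000; i++) {
            sink = new byte[1024]; // roughly 200 MB of short-lived garbage in total
        }
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // getCollectionCount() and getCollectionTime() return -1 if the collector does not report them
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms total");
        }
    }
}
```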

• If you don't need all of that data at once, you can use an LRU cache or soft references.

One simple but effective algorithm is Least Recently Used (LRU): when the cache is full, you always throw out the data that was least recently used. As an example, imagine a cache that can hold up to five pieces of data.
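The five-entry LRU cache can be sketched with java.util.LinkedHashMap: constructed with accessOrder = true it keeps entries ordered from least to most recently accessed, and overriding removeEldestEntry evicts the oldest entry once the capacity is exceeded. LruCache is a hypothetical name, and this version is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical LRU cache built on LinkedHashMap's access order (not thread-safe).
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true); // accessOrder = true: get() moves an entry to the end
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity; // evict the least recently used entry
    }

    public static void main(String[] args) {
        LruCache<String, Integer> cache = new LruCache<>(5);
        for (String key : new String[] {"a", "b", "c", "d", "e"}) {
            cache.put(key, key.charAt(0) - 'a');
        }
        cache.get("a");    // "a" becomes the most recently used entry
        cache.put("f", 5); // sixth entry: evicts "b", the least recently used
        System.out.println(cache.keySet()); // [c, d, e, a, f]
    }
}
```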

The garbage collector will always collect weakly referenced objects, but it will only collect softly referenced objects when its algorithms decide that memory is low enough to warrant it.
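A sketch of the soft-reference alternative, using java.lang.ref.SoftReference: each cached value is held only softly, so the GC may clear it under memory pressure, and the cache recomputes it on the next request. SoftCache and its loader function are hypothetical; the sketch keeps map entries whose referents were cleared and is not thread-safe.

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical cache whose values the GC may reclaim when memory runs low.
public class SoftCache<K, V> {
    private final Map<K, SoftReference<V>> entries = new HashMap<>();
    private final Function<K, V> loader; // rebuilds a value that was never cached or was cleared

    public SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    public V get(K key) {
        SoftReference<V> ref = entries.get(key);
        V value = (ref == null) ? null : ref.get(); // null once the GC has cleared the referent
        if (value == null) {
            value = loader.apply(key);
            entries.put(key, new SoftReference<>(value));
        }
        return value; // the local variable keeps the value strongly reachable for the caller
    }

    public static void main(String[] args) {
        // Values are 1 MB blocks that can be rebuilt on demand, so losing them is safe.
        SoftCache<Integer, byte[]> cache = new SoftCache<>(id -> new byte[1024 * 1024]);
        for (int id = 0; id < 10; id++) {
            System.out.println("block " + id + ": " + cache.get(id).length + " bytes");
        }
    }
}
```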


Java Memory Tuning: memory footprint tuning, fat data problem

• This condition occurs when massive amounts of data are being loaded.

• For example: encoding a 100 GB text file with the Huffman algorithm after loading the whole file into a string.

• Compressed object pointers can be used to reduce the footprint:

                 Uncompressed (bytes)   Compressed (bytes)
Pointer          8                      4
Object header    16                     12
Array header     24                     16
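A worked estimate of what the table means for a large reference array, as a small Java sketch. It assumes a 64-bit HotSpot JVM, the header and pointer sizes from the table, and the default 8-byte object alignment; compressed pointers are controlled by -XX:+UseCompressedOops, which HotSpot enables by default for heaps below about 32 GB. RefArraySize is a hypothetical name, and the figures cover only the array itself, not the objects it points to.

```java
// Hypothetical helper: size of an Object[] using the header/pointer sizes from the table.
public class RefArraySize {
    static long refArrayBytes(long length, boolean compressedOops) {
        long header = compressedOops ? 16 : 24; // array header (bytes)
        long ref = compressedOops ? 4 : 8;      // one reference slot (bytes)
        long raw = header + length * ref;
        return (raw + 7) / 8 * 8;               // round up to 8-byte object alignment
    }

    public static void main(String[] args) {
        long n = 1_000_000;
        System.out.println("uncompressed: " + refArrayBytes(n, false) + " bytes"); // 8000024
        System.out.println("compressed:   " + refArrayBytes(n, true) + " bytes");  // 4000016
    }
}
```

For a million references, compressed pointers roughly halve the array's size, from about 8 MB to about 4 MB.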


Java Memory Tuning: allocation rate tuning

• Allocation rate is the amount of memory allocated per unit of time, often expressed in MB/sec.

• A high allocation rate causes performance issues, mostly when garbage collection becomes a bottleneck. Garbage collection is covered later in this section.
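One way to measure an allocation rate from inside a program is HotSpot's com.sun.management.ThreadMXBean extension, which reports the bytes allocated by a thread. This is JVM-specific, so the sketch (hypothetical class AllocationRate) checks for support and returns -1 when the counter is unavailable; MB here means 10^6 bytes, and the figure includes some overhead from timing and measurement.

```java
import java.lang.management.ManagementFactory;

// Hypothetical demo class: measures bytes allocated by the current thread (HotSpot-specific API).
public class AllocationRate {
    static Object sink; // publishes allocations so they cannot be optimized away

    /** Bytes allocated by the current thread while running task, or -1 if not supported. */
    static long allocatedBytesDuring(Runnable task) {
        java.lang.management.ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (!(bean instanceof com.sun.management.ThreadMXBean)) {
            return -1; // the JVM does not provide the HotSpot extension
        }
        com.sun.management.ThreadMXBean hotspot = (com.sun.management.ThreadMXBean) bean;
        if (!hotspot.isThreadAllocatedMemorySupported() || !hotspot.isThreadAllocatedMemoryEnabled()) {
            return -1;
        }
        long id = Thread.currentThread().getId();
        long before = hotspot.getThreadAllocatedBytes(id);
        task.run();
        return hotspot.getThreadAllocatedBytes(id) - before;
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long bytes = allocatedBytesDuring(() -> {
            byte[][] blocks = new byte[10_000][];
            for (int i = 0; i < blocks.length; i++) {
                blocks[i] = new byte[1024];
            }
            sink = blocks;
        });
        double seconds = (System.nanoTime() - start) / 1e9;
        if (bytes < 0) {
            System.out.println("per-thread allocation counters are not available on this JVM");
        } else {
            System.out.printf("allocated %d bytes at about %.1f MB/sec%n", bytes, bytes / 1e6 / seconds);
        }
    }
}
```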


Java application CPU usage tuning

• A real problem reported on stackoverflow.com:

There are two Java processes (A, B) on a Linux machine (CentOS 6.5, 64-bit). A sends lots of binary data to B over sockets, and B writes the data to disk at 50-100 MB per second. On a quad-core processor, CPU usage is nearly 100%. A similar application previously written in C used only 25% of the CPU.

• The following methods can reduce an application's CPU usage:

restricting the JVM's memory use in the JDK settings;

refactoring the application code;

reducing memory allocation (reusing objects, and so on).
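As an illustration of the last point, a process like B can copy incoming data through one buffer allocated once and reused for every message, instead of allocating a new array per read. A sketch with a hypothetical BufferReuse class; because the buffer is shared, one instance must not be used by several threads at once. (Since Java 9, InputStream.transferTo performs a similar copy loop.)

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical copier that reuses a single buffer (not thread-safe).
public class BufferReuse {
    private final byte[] buffer = new byte[64 * 1024]; // allocated once, reused for every copy

    long copy(InputStream in, OutputStream out) throws IOException {
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        BufferReuse copier = new BufferReuse();
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        for (int msg = 0; msg < 10; msg++) {
            byte[] payload = new byte[100_000]; // stand-in for data received from a socket
            copier.copy(new ByteArrayInputStream(payload), sink); // same buffer every time
        }
        System.out.println("copied " + sink.size() + " bytes");
    }
}
```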


Thread contention tuning

• Thread contention is a condition in which one thread is waiting for a lock or object that is currently held by another thread. The waiting thread cannot use that object until the other thread has unlocked it.

• Ways to reduce thread contention:

avoid expensive calculations while holding locks;

employ interlocked/atomic operations;

use concurrent (thread-safe) data structures;

use read-only data whenever possible;

avoid object pooling.
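A sketch of three of these techniques together, using a hypothetical LowContention class: the expensive work runs outside any lock, results go into a ConcurrentHashMap rather than a map guarded by one global lock, and the shared counter is a java.util.concurrent.atomic.LongAdder instead of a synchronized field. The expensive method is a made-up stand-in workload.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical demo class: shared state updated without a global lock.
public class LowContention {
    final LongAdder handled = new LongAdder();                        // atomic counter, no synchronized block
    final ConcurrentHashMap<Integer, Long> cache = new ConcurrentHashMap<>();

    // Stand-in for an expensive computation (hypothetical workload).
    static long expensive(int x) {
        long h = x;
        for (int i = 0; i < 1_000; i++) {
            h = h * 31 + i;
        }
        return h;
    }

    void handle(int key) {
        long value = expensive(key);   // computed outside any lock
        cache.putIfAbsent(key, value); // thread-safe map without one global lock
        handled.increment();
    }

    static long run(int threads, int perThread) throws InterruptedException {
        LowContention lc = new LowContention();
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.submit(() -> {
                for (int i = 0; i < perThread; i++) {
                    lc.handle(i % 100);
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return lc.handled.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("requests handled: " + run(4, 10_000));
    }
}
```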


Java I/O tuning

• Java IO is an API that comes with Java and is targeted at reading and writing data (input and output). Most applications need to process some input and produce some output based on that input.

• One thing that hurts Java IO performance is byte-by-byte or character-by-character IO: calling InputStream.read() or Reader.read() to read a single byte or character at a time without any buffering.

• It is recommended to use the standard BufferedReader and BufferedInputStream classes, or the block-read methods, to read larger blocks of data at a time.
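A sketch of buffered reading, using a hypothetical BufferedRead class: the BufferedReader fetches large blocks from the underlying (unbuffered) FileReader and serves readLine() calls from memory, so each line does not cost a separate read on the source. The 64 KiB buffer size is an arbitrary choice; the default is 8192 characters.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical demo class: reads text line by line through a BufferedReader.
public class BufferedRead {
    /** Counts lines; the BufferedReader pulls large blocks from the underlying Reader. */
    static int countLines(Reader source) throws IOException {
        int lines = 0;
        try (BufferedReader reader = new BufferedReader(source, 64 * 1024)) { // 64 KiB buffer
            while (reader.readLine() != null) {
                lines++;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            System.out.println(countLines(new FileReader(args[0]))); // FileReader alone is unbuffered
        } else {
            System.out.println(countLines(new StringReader("line 1\nline 2\nline 3")));
        }
    }
}
```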


Split-time bytecode verification

What is bytecode verification?

• When a class loader presents the bytecodes of a newly loaded Java platform class to the virtual machine, the bytecodes are first inspected by a verifier. The verifier checks that the instructions cannot perform actions that are obviously damaging.

• A method called split-time verification, first introduced in the Java Platform, Micro Edition (J2ME), has been used in the JVM since Java 6. It splits the verification of Java bytecode into two phases:

• Design time: when a class is compiled from source to bytecode.

• Runtime: when the class is loaded.


Garbage Collection tuning

Garbage collection = biggest threat to JVM responsiveness

What are memory pools and GC?

• Memory pools, also called fixed-size block allocation, use pools for memory management, allowing dynamic memory allocation comparable to malloc or C++'s operator new.

What is generational GC?

• Objects in memory have an important property of temporal persistence: most objects die young, while objects that survive for a while tend to live much longer.

• To exploit this principle, we can build what is known as a generational garbage collector. Objects are initially allocated in a chunk of memory called the first generation (Eden), or G1 (not to be confused with HotSpot's G1 collector). When G1 becomes full, the live objects are copied into another block of memory called the second generation, or G2, and the entire G1 is freed.
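On a real JVM the generations appear as named memory pools that can be listed with the standard java.lang.management API. A sketch with a hypothetical MemoryPools class; the pool names depend on the collector in use (with G1, for example, they are typically "G1 Eden Space", "G1 Survivor Space", and "G1 Old Gen").

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: lists the heap memory pools (generations) of the running JVM.
public class MemoryPools {
    static List<String> heapPoolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.HEAP) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() != MemoryType.HEAP) {
                continue; // skip non-heap pools such as Metaspace and the code cache
            }
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            long usedKb = (usage == null) ? -1 : usage.getUsed() / 1024;
            System.out.println(pool.getName() + ": " + usedKb + " KB used");
        }
    }
}
```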


Garbage Collection tuning

• All a copying collector does is start from a set of roots (in our case, the operand stack) and traverse all of the reachable allocated objects, copying them from one half of memory into the other half. The area of memory we copy from is called old space, and the area we copy to is called new space.

• Eden: all new allocation happens in Eden.

• Survivor: when Eden fills up, a stop-the-world copying collection moves its live objects into a survivor space.

• After surviving several collections, objects are tenured into the old generation.
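The copying step can be modeled in a few lines. This is a toy simulation under stated assumptions, not how HotSpot is implemented: Java objects of a hypothetical Obj class stand in for heap cells, and the collector follows Cheney's breadth-first algorithm, copying each reachable object into a fresh space, leaving a forwarding pointer behind, and then discarding the old space together with all garbage in it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model (not HotSpot's implementation): a semi-space copying collector, Cheney-style.
public class CopyingCollector {
    static final class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>(); // outgoing pointers
        Obj forward;                              // forwarding pointer, set once copied

        Obj(String name) {
            this.name = name;
        }
    }

    private List<Obj> space = new ArrayList<>(); // the current allocation space

    Obj allocate(String name) {
        Obj o = new Obj(name);
        space.add(o);
        return o;
    }

    int liveCount() {
        return space.size();
    }

    /** Copies everything reachable from roots into a fresh space; returns the relocated roots. */
    List<Obj> collect(List<Obj> roots) {
        List<Obj> toSpace = new ArrayList<>();
        List<Obj> newRoots = new ArrayList<>();
        for (Obj root : roots) {
            newRoots.add(copy(root, toSpace));
        }
        // The scan index chases the end of toSpace, fixing each copied object's pointers.
        for (int scan = 0; scan < toSpace.size(); scan++) {
            List<Obj> refs = toSpace.get(scan).refs;
            for (int i = 0; i < refs.size(); i++) {
                refs.set(i, copy(refs.get(i), toSpace));
            }
        }
        space = toSpace; // the old space, including all garbage, is dropped in one step
        return newRoots;
    }

    private static Obj copy(Obj old, List<Obj> toSpace) {
        if (old.forward != null) {
            return old.forward; // already copied: handles cycles and shared objects
        }
        Obj moved = new Obj(old.name);
        moved.refs.addAll(old.refs); // still point into old space until scanned
        old.forward = moved;
        toSpace.add(moved);
        return moved;
    }

    public static void main(String[] args) {
        CopyingCollector heap = new CopyingCollector();
        Obj a = heap.allocate("a");
        Obj b = heap.allocate("b");
        heap.allocate("garbage"); // never referenced
        a.refs.add(b);
        b.refs.add(a);            // a cycle is fine for a tracing collector
        List<Obj> roots = heap.collect(Arrays.asList(a));
        System.out.println("live objects after GC: " + heap.liveCount()); // 2
        System.out.println("root: " + roots.get(0).name);                 // a
    }
}
```

The cost of a collection here depends only on the live objects, which is why copying collection suits a young generation where most objects are already dead.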


History of performance tuning

Java version: performance improvement(s)

JDK 1.1.6: first just-in-time compilation (Symantec's JIT compiler).

J2SE 1.2: use of a generational collector.

J2SE 1.3: just-in-time compilation by HotSpot.

Java SE 5.0: class data sharing.

Java SE 6: split bytecode verification; escape analysis and lock coarsening; register allocation improvements.

Java 7: JVM support for dynamic programming languages; enhancements to the existing concurrency library for parallel computing on multi-core processors; tiered compilation, which allows the JVM to use both the client and server JIT compilers in the same session.


Performance comparison

An app implemented in both C++ and Java: the C++ version produces results faster.


Performance comparison

What makes C/C++ faster than Java?

• C++: in touch with the processor directly; write once, compile anywhere (WOCA).
  Java: apps run on the JVM; write once, run anywhere/everywhere (WORA/WORE).

• C++: fast, direct pointer access.
  Java: slower pointer access and a different structure.

• C++: objects created "on the stack" cost nothing at allocation and destruction, and need no GC working in a separate thread to clean them up.
  Java: the GC, and the methods called for each object being cleaned up, waste time.

• C++: no strict relationship between class names and filenames; a header file and an implementation file are used for each class.
  Java: a strict relationship is enforced (for example, the source code for public class Test has to be in Test.java); such overheads make Java slower in small apps.
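Java has a partial counterpart to C++ stack objects: the escape analysis added in Java SE 6 lets the JIT compiler avoid heap allocation for objects that never leave a method (scalar replacement). A sketch with a hypothetical EscapeAnalysisDemo class whose Point objects do not escape distance(); whether HotSpot actually eliminates the allocations is up to the JIT and is not guaranteed. Comparing a run with the default settings against one with -XX:-DoEscapeAnalysis is one way to look for the difference.

```java
// Hypothetical demo class: objects that never escape are candidates for scalar replacement.
public class EscapeAnalysisDemo {
    // Small value object; the instances below never leave distance().
    static final class Point {
        final double x;
        final double y;

        Point(double x, double y) {
            this.x = x;
            this.y = y;
        }
    }

    static double distance(double x1, double y1, double x2, double y2) {
        Point a = new Point(x1, y1); // does not escape: the JIT may keep its fields in registers
        Point b = new Point(x2, y2);
        double dx = a.x - b.x;
        double dy = a.y - b.y;
        return Math.sqrt(dx * dx + dy * dy);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        double total = 0;
        for (int i = 0; i < 10_000_000; i++) {
            total += distance(0, 0, 3, 4);
        }
        long millis = (System.nanoTime() - start) / 1_000_000;
        System.out.println("total = " + total + " in " + millis + " ms");
    }
}
```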


Performance comparison

[Figure: benchmark comparison between Java and C++]


Performance comparison

Multi-Core performance comparison between Java and C++

• The scalability and performance of Java applications on multi-core systems are limited by the object allocation rate. This effect is sometimes called an "allocation wall".

• Modern garbage collectors use multiple cores to perform garbage collection, which alleviates this problem to some degree.


Memory use performance comparison between Java and C++

Performance comparison

• Java memory use is much higher than C++ memory use because:

there is an overhead of 8 bytes for each object and 12 bytes for each array in Java (figures for a 32-bit JVM; 64-bit JVMs use larger headers);

parts of the Java Class Library must load before the program executes;

the virtual machine itself uses substantial memory.


Ali Gholami 932171021

Saeedeh Davoudi 932171010