
Lecture notes: Concurrent Programming Languages

Oleg Batrashev

February 10, 2016

Contents

I Concurrency in Java

1 Java threads
    1.1 Creating threads
    1.2 Measuring time

2 Inter-thread visibility
    2.1 Example: infinite loop
    2.2 Hardware: invalid read/write order
    2.3 Example: read/write access ordering
    2.4 Hardware: cache subsystem
    2.5 Java memory model
    2.6 Example: faulty publication
    2.7 Double-Checked Locking
    2.8 Object safe publication
    2.9 Conclusions

Part I

Concurrency in Java

1 Java threads

1.1 Creating threads

Java threads are nowadays mapped one-to-one to OS threads. Here we show how to create Java threads.


Extend Thread class

1. Extend your class from the Thread class,

2. Create an object of your class,

3. Start your thread by using start() method on the object.
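The three steps above can be sketched as follows. This is only an illustration: the class name HelloThread, the message field and the startAndWait helper are hypothetical and do not come from the course examples. A join() call is added so that the caller can safely read what the thread wrote.

```java
// Hypothetical example class; the notes do not name a class for this variant.
class HelloThread extends Thread {          // 1. extend the Thread class
    // Written by the new thread, read by the caller only after join().
    private String message;

    @Override
    public void run() {
        // The new thread starts execution here.
        message = "hello from " + getName();
    }

    String getMessage() {
        return message;
    }

    // Creates, starts and waits for the thread, then returns its message.
    static String startAndWait(String threadName) {
        HelloThread t = new HelloThread();  // 2. create an object of the class
        t.setName(threadName);
        t.start();                          // 3. start it (calling run() directly would not create a thread)
        try {
            t.join();                       // wait until run() has returned
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return t.getMessage();
    }

    public static void main(String[] args) {
        System.out.println(startAndWait("worker-1"));
    }
}
```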

This may be the simplest method, but it is not very flexible. Using the Runnable interface, as shown next, is the preferred way.

Implement Runnable interface

Create a class:

    class Reader implements Runnable {
        @Override
        public void run() {
            // new thread starts execution here
        }
    }

Create a thread object and pass it a new object of the class:

    Thread t = new Thread(new Reader());
    t.start();
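One way to see why Runnable is more flexible: the task is an ordinary object that is separate from the thread, so the same task can be handed to several threads. The sketch below assumes a hypothetical Reader that merely counts how often run() was called; the wrapper class RunnableDemo and its helper methods are illustrative names, not part of the course code.

```java
import java.util.concurrent.atomic.AtomicInteger;

class RunnableDemo {
    // Hypothetical task: counts how many times it has been executed.
    static class Reader implements Runnable {
        final AtomicInteger runs = new AtomicInteger();

        @Override
        public void run() {
            // Each thread that is given this task starts execution here.
            runs.incrementAndGet();
        }
    }

    // Runs one shared Reader object in the given number of threads.
    static int runInThreads(int threads) {
        Reader task = new Reader();
        Thread[] ts = new Thread[threads];
        for (int k = 0; k < threads; k++) {
            ts[k] = new Thread(task);   // same task object, different threads
            ts[k].start();
        }
        for (Thread t : ts) {
            join(t);
        }
        return task.runs.get();
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("run() was called " + runInThreads(4) + " times");
    }
}
```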

1.2 Measuring time

System.nanoTime()

• measures wall-time (elapsed time)

– as opposed to user time or CPU time

• valid only within the same JVM

• much better accuracy than System.currentTimeMillis()

Example:

    long mainStartTime = System.nanoTime();
    while (i<N) i++;

    t.join();
    System.out.println("Time difference "
        + (readerStartTime - mainStartTime));

• join() waits for the other thread to finish, which makes its values available, e.g. readerStartTime
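A self-contained variant of the fragment above might look as follows. It is a sketch under assumptions: the class name TimingDemo and the method startDelayNanos are hypothetical, and the busy loop of the original is left out so that only the thread start-up delay is measured.

```java
class TimingDemo {
    // Written by the reader thread; main reads it only after join().
    static long readerStartTime;

    // Returns the elapsed time between "just before start()" in the main
    // thread and the first statement executed by the reader thread.
    static long startDelayNanos() {
        Thread t = new Thread(() -> readerStartTime = System.nanoTime());
        long mainStartTime = System.nanoTime();
        t.start();
        try {
            t.join();   // after join() the value of readerStartTime is visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return readerStartTime - mainStartTime;
    }

    public static void main(String[] args) {
        System.out.println("Time difference " + startDelayNanos() + " ns");
    }
}
```

Only differences between two nanoTime() values are meaningful; the absolute values have no relation to wall-clock dates.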


CountDownLatch

CountDownLatch may be used to synchronize threads [3, sec 5.5.1]:

    latch = new CountDownLatch(1);
    Thread t = new Thread(new Reader());
    t.start();

    latch.await();

• await() method suspends until latch is zero

• countDown() method decreases the latch value

    public void run() {
        latch.countDown();
        try {
            latch.await();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        readerStartTime = System.nanoTime();
    }
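Put together, the latch fragments could form a program like the one below. This is a hedged sketch, not the course's own code: the class LatchDemo and the method startSkewNanos are hypothetical names. The main thread blocks in await() until the reader thread is really running, so both start timestamps are taken at roughly the same moment.

```java
import java.util.concurrent.CountDownLatch;

class LatchDemo {
    // Written by the reader thread; main reads it only after join().
    static long readerStartTime;

    // Returns the absolute difference between the start timestamps of the two threads.
    static long startSkewNanos() {
        final CountDownLatch latch = new CountDownLatch(1);
        Thread t = new Thread(() -> {
            latch.countDown();                    // latch goes from 1 to 0
            readerStartTime = System.nanoTime();
        });
        t.start();
        try {
            latch.await();                        // suspends until the latch is zero
            long mainStartTime = System.nanoTime();
            t.join();
            return Math.abs(readerStartTime - mainStartTime);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Start skew " + startSkewNanos() + " ns");
    }
}
```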

2 Inter-thread visibility

Inter-thread visibility is about when changes in one thread are visible in another. [3, sec 3.1]

Visibility problems

Single-threaded programs

• always see the last written value of a variable

    x=5
    ...
    y=2*x

Multi-threaded programs

• can see variable values written from other threads,

• moreover, there is uncertainty in the order the changes are seen in the current thread.

In single-threaded programs, when a value is written to a variable we can be sure that we read the same value the next time we access the variable. This is ensured implicitly or explicitly by the semantics of the programming language we are using.

The situation is very different with multi-threaded programs. A value written in one thread may not be immediately visible in another thread. In the worst case, it will never be seen. This is a rare but real scenario, which is shown below.

There are several reasons why such situation can happen:

1. CPU registers

2. CPU instructions executing out-of-order

3. Compiler re-ordering optimizations

Before we proceed to the CPU internals, let's look at one example: the effects of register usage by a program. See Frame 1 on how to run the examples.


Frame 1: Running Visibility examples

The course page has a jmm.zip file with the Visibility examples. They may be compiled and run manually, or you can use gradle, a build tool for Java.

1. Download and unzip gradle

2. Download and unzip examples and go to the jmm directory

3. Run first example with bin/gradle -q run-Visibility1a

The Gradle build file is in the same directory; you may change it as needed.

2.1 Example: infinite loop

Example code

When a thread finishes its execution upon a specific value of a variable that is set from another thread, it may happen that it never observes that value. Consider the following example.

Variable i is visible to the Main and Reader threads:

    private static final int N = 50000000;
    private static /* volatile */ int i = 0;

Main thread writes the variable in the loop:

    Thread t = new Thread(new Reader());
    t.start();
    while (i<N) i++;
    System.out.println("Main finished");

Reader thread reads it in a loop and exits:

    int cnt = 0;
    while (i<N) {
        cnt++;
    }
    System.out.println("Count "+cnt);

It is obvious that the Reader thread must finish its execution; in practice, however, it most probably runs forever. The reason is that the Java (JIT) compiler optimizes the code by reading variable i from memory only once and then keeping it in a register for the subsequent iterations. Here, the example hints that adding the volatile keyword helps to solve the problem, because it tells the compiler to read/write the variable directly from/to memory and not to use a register for optimization.
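For reference, the fragments of this example can be assembled into one runnable class with the volatile keyword enabled. This is a sketch with some liberties: the class name VisibilityLoop and the method runOnce are hypothetical, N is reduced to keep the run short, and the Reader is written as a lambda instead of a separate class.

```java
class VisibilityLoop {
    // Smaller than in the notes (50000000) so the sketch finishes quickly.
    static final int N = 5_000_000;

    // volatile: each read in the Reader loop must observe the latest write.
    static volatile int i = 0;

    // Runs writer (calling thread) and Reader once; returns the Reader's loop count.
    static long runOnce() {
        i = 0;
        final long[] cnt = new long[1];
        Thread reader = new Thread(() -> {
            long c = 0;
            while (i < N) {     // re-reads i on every iteration because i is volatile
                c++;
            }
            cnt[0] = c;
        });
        reader.start();
        while (i < N) i++;      // only this thread writes i, so i++ loses no updates
        try {
            reader.join();      // terminates because the Reader eventually sees i == N
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return cnt[0];
    }

    public static void main(String[] args) {
        System.out.println("Main finished");
        System.out.println("Count " + runOnce());
    }
}
```

Removing volatile from i turns this back into the faulty variant, which may hang on a typical server JVM.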

2.2 Hardware: invalid read/write order

In order to better understand visibility problems and solutions, we consider a typical computer memory hierarchy and read/write ordering.

Memory hierarchy

The hierarchy consists of the following layers (the number of cache layers may vary):

• Registers inside the core (CPU) – contain the values of instructions being executed;

• Dedicated cache (Level 1) – each core has its own;

• Shared cache (e.g. Level 2) – common for all cores;

• Main memory (RAM) – just your memory.

[Figure 1: CPU and memory. (a) Memory hierarchy access times: registers < 1 ns (several KBytes), Level 1 cache 1 − 2 ns (30-100 KBytes), Level 2 cache 5 − 15 ns (several MBytes), main memory (RAM) 100 − 300 ns (several GBytes). (b) CPU load/store buffers: registers, store buffer, load buffer, cache, memory.]

Registers are just special cells of memory where the CPU stores the values it is working on. There are usually several hundred registers, each 4 to 16 bytes long. In total this is no more than a few kilobytes, which is small compared to the several gigabytes of main memory, but registers are very important for making programs run fast.

When a program runs on a processor, it does not fetch values from main memory every time it needs them, because that is expensive. Figure 1a shows the delays needed to access the different layers of the memory hierarchy: registers, cache, and RAM. Each core usually has its own higher-level cache, e.g. the Level 1 cache, while lower-level caches may be shared between cores. If a value is not found in a higher level, it must be retrieved from the lower, slower level.

In the example above, the Java compiler generates machine code that reads the value of i from memory only on the first iteration and stores it in a register. For the subsequent iterations it uses the register to get the value of i. This is totally correct for a single thread, but it ignores the fact that the variable may be changed from another thread. Programs would be much slower if the compiler were not allowed to do such optimizations.

Memory read/write ordering

When running machine code, the CPU (core) may

• execute it out-of-order, and thus read/write memory in a different order.

Such re-ordering is mostly eliminated by load/store buffers:

1. all reads from cache are program ordered;

2. all writes to cache are program ordered;

3. reads may be re-ordered before writes to a different memory location;

4. there are exceptions, as described in [1] (vol. 3, sec. 8.2 “Memory ordering”);

5. a memory fence instruction [6] must be used to disallow moving reads before writes.

Table 1: Test machine configurations

Machine1
    Machine: Linux 4.3.0-1-amd64, Intel(R) Core(TM) i3 CPU M 330 @ 2.13GHz
    Java version: java version "1.7.0_91", OpenJDK Runtime Environment (IcedTea 2.6.3) (7u91-2.6.3-1), OpenJDK 64-Bit Server VM (build 24.91-b01, mixed mode)

Machine2
    Machine: Linux 3.10.0-229.11.1.el7.x86_64, Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
    Java version: openjdk version "1.8.0_51", OpenJDK Runtime Environment (build 1.8.0_51-b16), OpenJDK 64-Bit Server VM (build 25.51-b03, mixed mode)

It is complex and platform dependent! Programmer needs something simpler to grasp!
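Points 3 and 5 of the list above can be illustrated from the Java side with a classic store/load test. Each thread writes one variable and then reads the other; if a read could move before the preceding write, both threads might read 0. In the sketch below both variables are volatile, in which case the JVM has to prevent this re-ordering (on x86 typically with a fence), so the outcome "both zero" must never be observed. The class name StoreLoadLitmus and the method countBothZero are hypothetical.

```java
class StoreLoadLitmus {
    static volatile int x, y;   // volatile: a read may not be moved before the earlier write
    static int r1, r2;          // results, read by the caller only after join()

    // Repeats the two-thread experiment and counts rounds where both reads saw 0.
    static int countBothZero(int rounds) {
        int bothZero = 0;
        for (int k = 0; k < rounds; k++) {
            x = 0;
            y = 0;
            Thread a = new Thread(() -> { x = 1; r1 = y; });  // write x, then read y
            Thread b = new Thread(() -> { y = 1; r2 = x; });  // write y, then read x
            a.start();
            b.start();
            join(a);
            join(b);
            if (r1 == 0 && r2 == 0) bothZero++;
        }
        return bothZero;
    }

    private static void join(Thread t) {
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("both zero: " + countBothZero(1000));
    }
}
```

Without volatile the outcome r1 == 0 and r2 == 0 is allowed; how often it shows up depends on the hardware and the JIT compiler.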

2.3 Example: read/write access ordering

To get a better feel for memory access ordering, let's investigate one simple example and test it.

Example code

    private static volatile boolean isRunning = true;
    private static /* volatile only in b case */ int x = 0;
    private static /* volatile only in c case */ int y = 0;

Writer thread writes increasing values, first to x and then to y:

    for (int i=0; i<N; i++) {
        x = i;
        y = i;
    }
    isRunning = false;

Reader thread reads x and y in the opposite order:

    int xl, yl, cnt=0;
    do {
        yl = y;
        xl = x;
        if (xl < yl) cnt++;
    } while (isRunning);

cnt tells how many times x < y was observed.

Run the tests using main classes Visibility2a-c. To make clear under which conditions the results are reproducible, Table 1 lists the details of the machines used in the tests. There are 3 tests: a) without volatile for the variables, b) volatile is defined for x, c) volatile is defined for y.


Example analysis and results

If operations are not re-ordered:

• Writer thread always writes x first, so in memory x ≥ y

• Reader thread may get variables from different iterations of the Writer thread

– y is read first, thus it must be from the same or an earlier iteration, thus x ≥ y

It must therefore never be that x < y. Test results are:

x < y count (several runs)   Machine1             Machine2
Visibility2a                 984, 7487, 1179      2781, 37, 182
Visibility2b                 286, 21307, 1015     80975, 255, 80330
Visibility2c                 0, 0, 0              0, 0, 0

For tests a and b most probable explanation:

• compiler re-ordering optimizations caused different read and/or write order

Reasoning about the behavior of a program with two threads is difficult, unless there are certain guarantees, e.g. the Java Memory Model.

Having several hundred or thousand cases of misbehavior out of 5 million may seem like a minor problem, especially considering that real programs do not tend to loop so tightly over a few variables. It turns out that this is the source of a large headache. Misbehavior appears rarely, say once a month, and is more probable on a heavily loaded system. In the worst case it may crash the system or cause fatal data corruption. However, when testing and debugging the program, the bad behavior does not appear and the program looks correct. It is even difficult to spot the potential for such behavior, which may lead to the conclusion that the application user did something wrong or that the hardware is faulty. Such a situation, where misbehavior depends heavily on a rarely occurring wrong ordering of events, is called a race condition.

Ordering semantics of volatile

In a Java program:

• writes to a volatile variable may not be re-ordered with earlier writes (possibly to different variables)

• reads from a volatile variable may not be re-ordered with later reads (possibly from different variables)

In our example:

    x = i;
    y = i;

• if y is volatile, the write to y may not be moved before the write to x;

• if x is volatile, no guarantees are given, i.e. the write to y may be done before the write to x.
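A runnable sketch of case c (only y is volatile) is shown below. It follows the fragments of this section, but the class is a reconstruction and not the course's own Visibility2c source: the method countXLessThanY is a hypothetical name, N = 5 million is taken from the discussion of the results, and the Writer is written as a lambda while the calling thread plays the Reader.

```java
class Visibility2c {
    static final int N = 5_000_000;           // assumed from "out of 5 million" in the notes

    static volatile boolean isRunning = true;
    static int x = 0;                         // plain variable
    static volatile int y = 0;                // case c: only y is volatile

    // Runs Writer and Reader once and returns how many times x < y was observed.
    static int countXLessThanY() {
        x = 0;
        y = 0;
        isRunning = true;
        Thread writer = new Thread(() -> {
            for (int i = 0; i < N; i++) {
                x = i;      // earlier write: may not be moved after the volatile write
                y = i;      // volatile write
            }
            isRunning = false;
        });
        writer.start();

        int xl, yl, cnt = 0;
        do {
            yl = y;         // volatile read
            xl = x;         // later read: may not be moved before the volatile read
            if (xl < yl) cnt++;
        } while (isRunning);

        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return cnt;
    }

    public static void main(String[] args) {
        System.out.println("x < y count: " + countXLessThanY());
    }
}
```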

It may still be difficult to think about operation re-orderings and their effects. The Java Memory Model (JMM) uses a different approach, the happens-before relation, which we investigate next. Before that, there is one more hardware detail that deserves a short description: the cache subsystem.


2.4 Hardware: cache subsystem

We have so far avoided a more detailed description of the cache, but it is good to know what visibility guarantees it gives.

Common misconceptions

Consider a typical cache subsystem:

• several levels with 64-byte granularity (cache line);

• when not found in Level 1, it is copied from Level 2;

• when not found in cache, it is copied from Main memory;

• when done, the cache line is copied back from Level 1 to Level 2.

It is common to think that:

1. because the Level 1 cache is dedicated to a core, it is possible for two cores to have their own version of a line;

2. to make all changed values visible to other threads, a core must flush its Level 1 cache to at least the Level 2 cache.

In fact, on modern systems both are false. [4]

Having its own version of the same cache line would mean that two cores may see different values in their Level 1 caches for the same variable at the same time. This would add more complexity to the overall consistency model. Flushing the cache would mean that it may be expensive for a core to “publish” the changes it made locally so that other cores can see them, especially if this implied flushing to main memory.

Cache coherency

Modern cache subsystems maintain a coherence protocol, typically:

• only one core may hold a line in write mode, exclusively;

• many cores may share a line in read-only mode;

• when one requests a line in write mode others must invalidate the line.

With such protocols only one version of a line exists in the subsystem, either

1. shared (cloned) among cores in read-only mode

2. or exclusive to single core in writeable mode.

This is a simplified description; actual cache coherency protocols, like MESIF or MOESI, are more complex.

One consequence of this protocol is that all writes are totally ordered with respect to each other, i.e. there is no ambiguity about which write operation follows which other. There is also no ambiguity about which written value is seen by each read operation. Memory reads after the same write, however, are ordered only if they are done by the same core (from the same thread).

Another consequence is that a core always sees the latest value written to memory; it may never read stale values from the cache, although stale values may still exist inside the core.


2.5 Java memory model

The Java memory model [3, chap. 16][2, sec 2.2.7] continues the idea of ordering of reads and writes with the happens-before relation.

Happens-before in single thread

In a single thread, actions execute as if by program order:

• statement A happens-before B if program code contains them in this order

• “as if” allows to re-order as long as the result is the same as with program (sequential) ordering

• in extreme, first assignment in x=5; x=y; may be discarded

– notice, this affects what values other threads may see

Program ordering gives absolutely no guarantees about:

• what are the relations between actions in different threads,

• i.e. what changes and when are seen in other threads.

It is unknown whether and when the effects of A will be visible in the second thread, even if the effects of B are visible.

[Diagram: two columns of actions, A followed by B, and C followed by D.]

Happens-before between threads

On modern cache subsystems we know which particular write to a volatile precedes a given read from this volatile.

JMM states that:

• write to a volatile variable happens-before subsequent read of that variable;

• happens-before relation is transitive: A→ B and B → C implies that A→ C;

• thus, any statements before write to volatile happen-before statements after the read from the volatile;

– including reading/writing other variables

• again, compiler and CPU may optimize, but the results must be as if executed this way.

The Java compiler and the JVM must provide this behavior for our program. This frees us from thinking about the low-level visibility details of the hardware.
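The transitivity rule above is what makes the common "write data, then set a volatile flag" pattern work. The sketch below is illustrative only; the names HappensBeforeDemo, data, ready and publishAndRead are hypothetical. The ordinary write to data happens-before the volatile write to ready, which happens-before the read of ready that sees true, which happens-before the read of data.

```java
class HappensBeforeDemo {
    static int data;                  // plain variable, no volatile
    static volatile boolean ready;    // volatile flag that carries the happens-before edge

    // Publishes 42 through the flag and returns the value the reader thread saw.
    static int publishAndRead() {
        data = 0;
        ready = false;
        final int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            while (!ready) {
                // spin until the volatile write becomes visible
            }
            seen[0] = data;           // statement after the volatile read: must see 42
        });
        reader.start();

        data = 42;                    // statement before the volatile write
        ready = true;                 // volatile write

        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("Reader saw " + publishAndRead());
    }
}
```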


JMM for Visibility2c example

Remember the ordering example where we make variable y volatile.

• Assume y is read from the second thread just after writer thread writes value 5;

• because y is volatile there is happens-before relation between these write and read

    Writer thread        Reader thread
    x=5
    y=5          →       yl=y
                         xl=x

• consequently, x=5 happens-before xl=x, i.e. effects of the write must be visible to the Reader thread

• however, x may be 6 or 7, because no (happens-before) relations are defined for the subsequent writes

– they may or may not be visible to the Reader thread

JMM visibility guarantees

JMM provides several other visibility guarantees:

• Write to a volatile happens-before subsequent reads of this volatile;

• Release of a lock happens-before every subsequent acquisition of this lock;

• Call to thread start method happens-before thread run method;

– i.e. everything assigned before thread.start() is visible in the thread!

• Actions in a thread happen-before another thread successfully returns from join() on that thread

– if we wait for a thread to finish, then its effects are visible afterwards

• Writing final field in constructor happens-before the end of constructor

– implications of this are shown next.

There may be more, refer to JMM specification for details.
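The start and join rules from the list above can be sketched with two plain fields and no volatile or locks at all. The names StartJoinDemo, input, output and square are hypothetical.

```java
class StartJoinDemo {
    static int input;     // plain fields: neither volatile nor protected by a lock
    static int output;

    // Computes value * value in a separate thread.
    static int square(int value) {
        input = value;                    // assigned before start(): visible in run()
        Thread worker = new Thread(() -> {
            output = input * input;       // action in the thread: visible after join()
        });
        worker.start();
        try {
            worker.join();                // wait for the worker to finish
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(square(7));
    }
}
```

The guarantee only covers what was written before start() and what is read after join(); anything the two threads exchange in between still needs volatile or a lock.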

2.6 Example: faulty publication

It may seem that visibility concerns are rarely relevant, i.e. programmers do not usually write in a loop just to read from another thread. This subsection makes clear that such concerns are real.


Example code

    class Holder {
        int value;
        Holder(int v) { value = v; }
    }
    private static /* volatile */ Holder h = new Holder(-1);

Writer thread code creates an object and initializes it to a non-zero value:

    for (i=0; i<N; i++) {
        Holder newH = new Holder((i+1)*2);
        h = newH;
    }

Reader thread code counts how many times the value is zero:

    int cnt = 0;
    while (i < N-1) {
        if (h.value == 0) cnt++;
    }

Test results

• the first case does not declare the field h volatile, while the second does

h.value = 0 count   Machine1                  Machine2
Visibility3a        190463, 371652, 26030     15, 657, 323
Visibility3b        0, 0, 0                   0, 0, 0

The reason for the results:

• Java may inline method and constructor calls;

• after that it may re-order inlined operations;

• it follows ordering guidelines guaranteed by JMM,

– without volatile, the reference h may become visible before its h.value

A thread may see an incompletely created object from another thread, unless it uses volatile or some other form of synchronization!
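The volatile variant of this experiment can be sketched as one runnable class. It is a reconstruction under assumptions, not the course's own Visibility3b source: the method countZeroValues is a hypothetical name, N is chosen arbitrarily, and a volatile done flag replaces the shared loop counter i of the fragments above to stop the Reader.

```java
class Visibility3b {
    static final int N = 1_000_000;          // assumed value, not given in the notes

    static class Holder {
        int value;                           // non-final field, set in the constructor
        Holder(int v) { value = v; }
    }

    static volatile Holder h = new Holder(-1);   // volatile reference: safe publication
    static volatile boolean done;                // stop flag (replaces the shared counter)

    // Returns how many times the Reader observed h.value == 0.
    static int countZeroValues() {
        done = false;
        Thread writer = new Thread(() -> {
            for (int i = 0; i < N; i++) {
                // The constructor finishes before the volatile write of the reference.
                h = new Holder((i + 1) * 2);
            }
            done = true;
        });
        writer.start();

        int cnt = 0;
        while (!done) {
            if (h.value == 0) cnt++;         // volatile read of h, then read of its field
        }
        try {
            writer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return cnt;
    }

    public static void main(String[] args) {
        System.out.println("h.value = 0 count: " + countZeroValues());
    }
}
```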

2.7 Double-Checked Locking

It is often assumed that synchronization is expensive and should be avoided. There exists a very common idiom that tries to skip synchronization during singleton initialization, called Double-Checked Locking [5][3, sec 16.2.4][2, sec 2.4.1.2].


Double-Checked Locking code

• if object is not visible, synchronize and create it

• if object is already visible, use it

    class Foo {
        private Helper helper;
        public Helper getHelper() {
            if (helper == null) {
                synchronized(this) {
                    if (helper == null) {
                        helper = new Helper();
                    }
                }
            }
            return helper;
        }
    }

• As we have seen, in case of improper synchronization the object may be incomplete and cause serious faults!

• It may still be ok for integer or other primitive type fields (except long and double).

Quotes on Double-Checked Locking

It is generally unwise to use double-check for fields containing references to objects or arrays. Visibility of a reference read without synchronization does not guarantee visibility of non-volatile fields accessible from the reference. Even if such reference is non-null, fields accessible via the reference without synchronization may obtain stale values. (see [2] p. 120)

The real problem with DCL is the assumption that the worst thing that can happen when reading a shared object reference without synchronization is to erroneously see a stale value (in this case, null); in that case the DCL idiom compensates for this risk by trying again with the lock held. But the worst case is actually considerably worse: it is possible to see a current value of the reference but stale values for the object’s state, meaning that the object could be seen to be in an invalid or incorrect state.

Subsequent changes in the JMM (Java 5.0 and later) have enabled DCL to work if resource is made volatile, and the performance impact of this is small since volatile reads are usually only slightly more expensive than nonvolatile reads. (see [3] p. 348)
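Following the second quote, a version of the idiom that is valid on Java 5.0 and later declares the field volatile. The sketch below is one common way to write it and rests on assumptions: the Helper body, the local variable result and the DclDemo test harness are hypothetical additions, not taken from the cited books.

```java
class Helper {
    final int id = 42;    // placeholder state; the notes do not define Helper
}

class Foo {
    private volatile Helper helper;            // volatile makes the publication safe

    public Helper getHelper() {
        Helper result = helper;                // single volatile read on the fast path
        if (result == null) {
            synchronized (this) {
                result = helper;               // second check, now with the lock held
                if (result == null) {
                    helper = result = new Helper();
                }
            }
        }
        return result;
    }
}

class DclDemo {
    // Calls getHelper() from several threads and checks they all got the same object.
    static boolean allSame(int threads) {
        final Foo foo = new Foo();
        final Helper[] seen = new Helper[threads];
        Thread[] ts = new Thread[threads];
        for (int k = 0; k < threads; k++) {
            final int idx = k;
            ts[k] = new Thread(() -> { seen[idx] = foo.getHelper(); });
            ts[k].start();
        }
        for (Thread t : ts) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        }
        for (Helper h : seen) {
            if (h == null || h != seen[0]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("all threads saw the same Helper: " + allSame(8));
    }
}
```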

2.8 Object safe publication

Instead of thinking in terms of visibility of single variables, it is easier to think about visibility of whole objects. [3, sec 3.5, 16.2]

General rules of safe publication

When an object is created and should be shared with other threads:

• do not publish while constructor is running;

• if it contains only final fields it may be published without synchronization after the constructor has finished;

• otherwise publish with synchronization after the constructor has finished:


– write reference to a volatile variable;

– using synchronized on the same object for writer and readers;

– using any other lock;

– write to a synchronized collection;

– use AtomicReference, ...

If you do not follow these rules, your program will most probably work correctly, until your customers start reporting crashes and data corruption that are not reproducible!
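As an illustration of the AtomicReference rule from the list above, the sketch below fully initializes a mutable object and only then publishes it. All names (SafePublication, Config, timeout, current, publishAndRead) are hypothetical. AtomicReference.set() and get() have the memory effects of a volatile write and read, so the reader sees the completed object.

```java
import java.util.concurrent.atomic.AtomicReference;

class SafePublication {
    // Mutable object with a non-final field: needs safe publication.
    static final class Config {
        int timeout;
    }

    static final AtomicReference<Config> current = new AtomicReference<>();

    // Publishes a Config and returns the timeout value the reader thread observed.
    static int publishAndRead(int timeout) {
        current.set(null);
        final int[] seen = new int[1];
        Thread reader = new Thread(() -> {
            Config c;
            while ((c = current.get()) == null) {
                // spin until an object has been published
            }
            seen[0] = c.timeout;       // sees the value written before set()
        });
        reader.start();

        Config c = new Config();
        c.timeout = timeout;           // finish initialization first ...
        current.set(c);                // ... then publish the reference

        try {
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return seen[0];
    }

    public static void main(String[] args) {
        System.out.println("Reader saw timeout " + publishAndRead(30));
    }
}
```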

2.9 Conclusions

• Inter-thread visibility concerns when changes made in one thread are seen in another

• It may be affected by CPU out-of-order execution, the cache subsystem, and compiler re-ordering optimizations

• It is hardware and compiler dependent

• Java provides the Java Memory Model, common rules for all platforms

– it makes sure these rules are correctly matched to the underlying hardware model

• The programmer only needs to apply the JMM happens-before relation to his/her program to make sure it is correct

– simpler rules like Safe Publication may be used

• Visibility issues are most often ignored, because the program seems to work fine

• This creates very unlikely race conditions that strike during production usage. They cannot be reproduced or even traced.

References

[1] Intel 64 and IA-32 Architectures Software Developer’s Manual. Feb. 2016. url: http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html.

[2] Doug Lea. Concurrent Programming in Java. Second Edition: Design Principles and Patterns. 2nd. Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 1999. isbn: 0201310090.

[3] Tim Peierls et al. Java Concurrency in Practice. Addison-Wesley Professional, 2005. isbn: 0321349601.

[4] Martin Thompson. CPU Cache Flushing Fallacy. Feb. 2013. url: http://mechanical-sympathy.blogspot.com.ee/2013/02/cpu-cache-flushing-fallacy.html.

[5] Wikipedia: Double-checked locking. Feb. 2016. url: https://en.wikipedia.org/wiki/Double-checked_locking.

[6] Wikipedia: Memory barrier. Feb. 2016. url: https://en.wikipedia.org/wiki/Memory_barrier.


