Java 8 Parallel Stream Internals

(Part 3)

Douglas C. Schmidt

d.schmidt@vanderbilt.edu

www.dre.vanderbilt.edu/~schmidt

Professor of Computer Science

Institute for Software Integrated Systems

Vanderbilt University

Nashville, Tennessee, USA


• Understand parallel stream internals, e.g.

• Know what can change & what can’t

• Partition a data source into “chunks”

• Process chunks in parallel via the common fork-join pool

Learning Objectives in this Part of the Lesson

[Figure: trySplit() partitions InputString into InputString1 & InputString2, then into InputString1.1, InputString1.2, InputString2.1, & InputString2.2; each chunk is processed sequentially & the results are joined]

See www.ibm.com/developerworks/library/j-java-streams-3-brian-goetz
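The partitioning step can be tried out directly on a spliterator. The following is a minimal sketch, assuming a small in-memory list stands in for the input; the class name TrySplitDemo and the depth parameter are hypothetical and are not part of the lesson's code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;

class TrySplitDemo {
    /** Splits the list's spliterator up to depth times & returns the chunks in encounter order. */
    static List<List<String>> chunks(List<String> input, int depth) {
        List<List<String>> result = new ArrayList<>();
        split(input.spliterator(), depth, result);
        return result;
    }

    private static void split(Spliterator<String> right, int depth,
                              List<List<String>> out) {
        // trySplit() hands back a new spliterator for part of the data,
        // or null if the source cannot (or should not) be split further.
        Spliterator<String> left = depth > 0 ? right.trySplit() : null;
        if (left == null) {
            // No further split: process this chunk sequentially.
            List<String> chunk = new ArrayList<>();
            right.forEachRemaining(chunk::add);
            out.add(chunk);
        } else {
            // For an ordered source the split-off part is a prefix, so visit it first.
            split(left, depth - 1, out);
            split(right, depth - 1, out);
        }
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "d", "e", "f", "g", "h");
        System.out.println(chunks(input, 2));
    }
}
```

This sketch runs in a single thread; it only shows how chunks are carved out, not how they are scheduled.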


• Recognize how the common fork-join pool is implemented


See gee.cs.oswego.edu/dl/papers/fj.pdf

Processing Chunks in Parallel via the Common ForkJoinPool

• Chunks created by a spliterator are processed in the common fork-join pool

Fork-Join Pool

See gee.cs.oswego.edu/dl/papers/fj.pdf


• A fork-join pool provides a high performance, fine-grained task execution framework for Java data parallelism

See docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinPool.html


• It provides a parallel computing engine for many higher-level frameworks

See www.infoq.com/interviews/doug-lea-fork-join

[Figure: two higher-level frameworks layered on the ForkJoinPool]

Parallel Streams: filter(not(this::urlCached)), map(this::downloadImage), flatMap(this::applyFilters), collect(toList())

Completable Futures: filter(not(this::urlCached)), map(this::downloadImageAsync), flatMap(this::applyFiltersAsync), collect(toFuture())


• ForkJoinPool implements the ExecutorService interface

See docs.oracle.com/javase/tutorial/essential/concurrency/executors.html
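Because ForkJoinPool implements ExecutorService, it can be handed ordinary Callable tasks. The sketch below illustrates this under stated assumptions: the class name ExecutorServiceDemo and the string-length task are made up for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Future;

class ExecutorServiceDemo {
    /** Computes text.length() by submitting a Callable to the common pool. */
    static int lengthViaExecutor(String text) {
        // A ForkJoinPool can be used wherever an ExecutorService is expected.
        ExecutorService executor = ForkJoinPool.commonPool();
        Callable<Integer> task = text::length;
        Future<Integer> future = executor.submit(task);
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lengthViaExecutor("fork-join"));
    }
}
```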


• A ForkJoinPool executes ForkJoinTasks

See docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinTask.html


• ForkJoinTask associates a chunk of data along with a computation on that data to enable fine-grained parallelism

See www.dre.vanderbilt.edu/~schmidt/PDF/DataParallelismInJava.pdf
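One way to picture a task that pairs a chunk of data with a computation is a RecursiveTask that sums a slice of an array. This is a minimal sketch, not the lesson's code; the class name SumTask and the THRESHOLD value are arbitrary choices for illustration.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class SumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000; // arbitrary cutoff for illustration

    private final int[] data; // the chunk of data ...
    private final int lo, hi; // ... is the half-open range [lo, hi)

    SumTask(int[] data, int lo, int hi) {
        this.data = data;
        this.lo = lo;
        this.hi = hi;
    }

    @Override
    protected Long compute() { // the computation on that chunk
        if (hi - lo <= THRESHOLD) {
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) >>> 1;
        SumTask left = new SumTask(data, lo, mid);
        SumTask right = new SumTask(data, mid, hi);
        left.fork();                     // run the left half asynchronously
        long rightSum = right.compute(); // run the right half in this thread
        return left.join() + rightSum;
    }

    static long sum(int[] data) {
        return ForkJoinPool.commonPool().invoke(new SumTask(data, 0, data.length));
    }

    public static void main(String[] args) {
        int[] data = new int[10_000];
        for (int i = 0; i < data.length; i++) data[i] = i + 1;
        System.out.println(sum(data));
    }
}
```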


• A ForkJoinTask is lighter weight than a Java thread

[Figure: a Thread compared with a ForkJoinTask]

e.g., it omits its own run-time stack, registers, thread-local storage, etc.


• A large # of ForkJoinTasks thus run in a small # of Java threads in a ForkJoinPool

See www.infoq.com/interviews/doug-lea-fork-join
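The ratio of tasks to threads can be observed with a small experiment. The sketch below is an illustration under assumptions (the class name TasksVersusThreads and the task count are made up); the exact number of threads it reports depends on the machine and the JDK.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;
import java.util.concurrent.atomic.AtomicInteger;

class TasksVersusThreads {
    /** Runs taskCount tiny tasks & returns {tasks executed, distinct threads used}. */
    static int[] run(int taskCount) {
        Set<Thread> threads = ConcurrentHashMap.newKeySet();
        AtomicInteger executed = new AtomicInteger();

        List<RecursiveAction> tasks = new ArrayList<>();
        for (int i = 0; i < taskCount; i++) {
            tasks.add(new RecursiveAction() {
                @Override
                protected void compute() {
                    // Record which thread ended up running this task.
                    threads.add(Thread.currentThread());
                    executed.incrementAndGet();
                }
            });
        }

        // A parent task forks all the small tasks & waits for them.
        ForkJoinPool.commonPool().invoke(new RecursiveAction() {
            @Override
            protected void compute() {
                invokeAll(tasks);
            }
        });
        return new int[] { executed.get(), threads.size() };
    }

    public static void main(String[] args) {
        int[] result = run(10_000);
        System.out.println(result[0] + " tasks ran on " + result[1] + " thread(s)");
    }
}
```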


• Parallel streams are a “user friendly” ForkJoinPool façade

See espressoprogrammer.com/fork-join-vs-parallel-stream-java-8

[Figure: the Search Phrases (45,000+ phrases) flow through map(phrase -> searchForPhrase(…)), filter(not(SearchResults::isEmpty)), & collect(toList()); the resulting sub-tasks (Sub-Task1.1 to Sub-Task1.4, Sub-Task3.3, & Sub-Task3.4) sit in the worker threads' deques]
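To see what the façade saves, the same search can be written both ways. The following is a simplified sketch, not the SearchStreamGang code: the class FacadeComparison, the nested SearchTask, and the cutoff of two phrases per leaf are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.stream.Collectors;

class FacadeComparison {
    /** Parallel streams: the framework does the splitting, forking, & joining. */
    static List<String> searchWithStream(List<String> phrases, String input) {
        return phrases.parallelStream()
                      .filter(input::contains)
                      .collect(Collectors.toList());
    }

    /** The same search written directly against the fork-join API. */
    static List<String> searchWithForkJoin(List<String> phrases, String input) {
        return ForkJoinPool.commonPool().invoke(new SearchTask(phrases, input));
    }

    private static class SearchTask extends RecursiveTask<List<String>> {
        private final List<String> phrases;
        private final String input;

        SearchTask(List<String> phrases, String input) {
            this.phrases = phrases;
            this.input = input;
        }

        @Override
        protected List<String> compute() {
            if (phrases.size() <= 2) { // tiny cutoff, for illustration only
                List<String> found = new ArrayList<>();
                for (String phrase : phrases)
                    if (input.contains(phrase)) found.add(phrase);
                return found;
            }
            int mid = phrases.size() / 2;
            SearchTask left = new SearchTask(phrases.subList(0, mid), input);
            SearchTask right =
                new SearchTask(phrases.subList(mid, phrases.size()), input);
            left.fork();
            List<String> rightResult = right.compute();
            // Combine left before right to keep the original phrase order.
            List<String> result = new ArrayList<>(left.join());
            result.addAll(rightResult);
            return result;
        }
    }
}
```

Both methods should return the same phrases in the same order; the stream version simply hides the task class.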


• You can program directly to the ForkJoinPool API, though it can be somewhat painful!

List<List<SearchResults>> listOfListOfSearchResults =
    ForkJoinPool.commonPool()
                .invoke(new SearchWithForkJoinTask(inputList,
                                                   mPhrasesToFind, ...));

[Image caption: “I gave you the chance of programming Java 8 streams. But you have elected the way of pain!”]


See SearchStreamGang/src/main/java/livelessons/streamgangs/SearchWithForkJoin.java

Use the common fork-join pool to search input strings for phrases that match

[Figure: Search Phrases (45,000+ phrases) & Input Strings to Search]


• Best used for algorithms that don’t match Java 8’s parallel streams programming model

See www.oracle.com/technetwork/articles/java/fork-join-422606.html

@Override
protected Long compute() {
    long count = 0L;
    List<RecursiveTask<Long>> forks = new LinkedList<>();

    // Fork a task for each sub-folder.
    for (Folder sub : mFolder.getSubs()) {
        FolderSearchTask task = new FolderSearchTask(sub, mWord);
        forks.add(task); task.fork();
    }

    // Fork a task for each document in this folder.
    for (Doc doc : mFolder.getDocs()) {
        DocSearchTask task = new DocSearchTask(doc, mWord);
        forks.add(task); task.fork();
    }

    // Join the forked tasks & sum their counts.
    for (RecursiveTask<Long> task : forks)
        count = count + task.join();

    return count;
}


• All parallel streams in a process share the common fork-join pool

See dzone.com/articles/common-fork-join-pool-and-streams


• Helps optimize resource utilization by knowing what cores are being used globally within a process


• There are (intentionally) few “knobs” to control this (or any) fork-join pool


See www.youtube.com/watch?v=sq0MX3fHkro


• Contrast with the ThreadPoolExecutor framework


• You can configure the pool size


See Part 4 of this lesson for details

System.setProperty
    ("java.util.concurrent.ForkJoinPool.common.parallelism",
     "10"); // Desired # of threads (the value must be a String)
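The effect of this property can be checked by asking the common pool what it reports. This is a minimal sketch with a hypothetical class name (CommonPoolSize). It assumes the property is set before anything in the JVM has touched the common pool, since the pool reads the property only when it is first initialized; if the pool already exists, the reported value will not change.

```java
import java.util.concurrent.ForkJoinPool;

class CommonPoolSize {
    static final String PARALLELISM_PROPERTY =
        "java.util.concurrent.ForkJoinPool.common.parallelism";

    /** Requests a common-pool parallelism & returns what the pool actually reports. */
    static int requestParallelism(int desired) {
        // System properties are Strings, so the int must be converted.
        System.setProperty(PARALLELISM_PROPERTY, String.valueOf(desired));

        // Only takes effect if the common pool has not been initialized yet.
        return ForkJoinPool.getCommonPoolParallelism();
    }

    public static void main(String[] args) {
        System.out.println("requested 10, pool reports " + requestParallelism(10));
    }
}
```

The same property can also be passed on the command line with -D, which avoids the ordering concern.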


Mapping Parallel Streams onto the Java Fork-Join Pool

• The Java 8 parallel streams framework automatically creates tasks that are run by worker threads in the common fork-join pool


abstract class AbstractTask ... { ...
    public void compute() {
        Spliterator<P_IN> rs = spliterator, ls;
        boolean forkRight = false; ...
        while (... (ls = rs.trySplit()) != null) {
            K taskToFork;
            if (forkRight)
                { forkRight = false; ... taskToFork = ...makeChild(rs); }
            else
                { forkRight = true; ... taskToFork = ...makeChild(ls); }
            taskToFork.fork();
        }
    } ...

See openjdk/8-b132/java/util/stream/AbstractTask.java

Abstract base class for most fork-join tasks used to implement stream ops


Decides whether to split a task further or compute it directly

See docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html#trySplit


Try to partition the input source until trySplit() returns null


Alternate which child is forked to avoid biased spliterators


Fork off a new sub-task & continue processing the other in the loop

See docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinTask.html#fork


abstract class AbstractTask ... { ...
    public void compute() {
        Spliterator<P_IN> rs = spliterator, ls;
        boolean forkRight = false; ...
        while (... (ls = rs.trySplit()) != null) {
            ...
        }
        task.setLocalResult(task.doLeaf());
    } ...

This method typically calls forEachRemaining() to process elements in the stream sequentially

See docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html#forEachRemaining
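The split, fork, & leaf steps walked through above can be imitated in a few lines. The sketch below is a deliberate simplification and not the JDK's AbstractTask: it uses a RecursiveTask with explicit join() calls to combine results, and the class name LengthSumTask and the LEAF_SIZE cutoff are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

class LengthSumTask extends RecursiveTask<Long> {
    private static final long LEAF_SIZE = 2; // illustrative cutoff, not the JDK's rule
    private final Spliterator<String> spliterator;

    LengthSumTask(Spliterator<String> spliterator) {
        this.spliterator = spliterator;
    }

    @Override
    protected Long compute() {
        Spliterator<String> rs = spliterator, ls;
        boolean forkRight = false;
        List<LengthSumTask> forked = new ArrayList<>();

        // Keep splitting until the chunk is small or trySplit() returns null.
        while (rs.estimateSize() > LEAF_SIZE && (ls = rs.trySplit()) != null) {
            LengthSumTask taskToFork;
            if (forkRight) {
                forkRight = false;
                taskToFork = new LengthSumTask(rs); // fork the right part ...
                rs = ls;                            // ... & keep working on the left
            } else {
                forkRight = true;
                taskToFork = new LengthSumTask(ls); // fork the left part, keep the right
            }
            taskToFork.fork();
            forked.add(taskToFork);
        }

        // Leaf step: process whatever remains sequentially.
        long[] sum = {0};
        rs.forEachRemaining(s -> sum[0] += s.length());

        // Collect the forked results, most recently forked first.
        for (int i = forked.size() - 1; i >= 0; i--)
            sum[0] += forked.get(i).join();
        return sum[0];
    }

    static long totalLength(List<String> words) {
        return ForkJoinPool.commonPool().invoke(new LengthSumTask(words.spliterator()));
    }

    public static void main(String[] args) {
        System.out.println(totalLength(Arrays.asList("alpha", "beta", "gamma", "delta")));
    }
}
```

Summing is order-independent, which is why this sketch can ignore which side was forked when it combines results.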

• Each worker thread in the common fork-join pool runs a loop scanning for parallel stream tasks to run


• Goal is to keep worker threads & cores as busy as possible!


• A worker thread has a “double-ended queue” (aka “deque”) that serves as its main source of tasks

[Figure: three WorkQueue deques, one per worker thread, holding sub-tasks such as Sub-Task1.1 to Sub-Task1.4, Sub-Task3.3, & Sub-Task3.4]

See en.wikipedia.org/wiki/Double-ended_queue


• Implemented by WorkQueue


See java8/util/concurrent/ForkJoinPool.java


• When the AbstractTask.compute() method calls fork() on a task, this task is pushed onto the head of its worker thread’s deque

[Figure: Sub-Task2.4 is pushed onto a worker thread's WorkQueue]

See gee.cs.oswego.edu/dl/papers/fj.pdf

1. fork()

2. push()


• Each worker thread processes its deque in LIFO order


1. join()

2. pop()

See en.wikipedia.org/wiki/Stack_(abstract_data_type)


• A task pop’d from the head of a deque is run to completion


See en.wikipedia.org/wiki/Run_to_completion_scheduling


• join() “pitches in” to pop & execute (sub-)tasks


“Collaborative Jiffy Lube” model of processing!


• LIFO order improves locality of reference & cache performance


See en.wikipedia.org/wiki/Locality_of_reference


• Worker threads only block if no parallel stream tasks are available for them to run


See Doug Lea’s talk at www.youtube.com/watch?v=sq0MX3fHkro


• Blocking worker threads & cores are costly on modern processors


• Each worker thread therefore checks other deques in the pool to find other tasks to run


• To maximize core utilization, idle worker threads “steal” work from the tail of busy threads’ deques

See docs.oracle.com/javase/tutorial/essential/concurrency/forkjoin.html

[Figure: poll() steals Sub-Task1.1 from a busy worker thread's WorkQueue]


A worker thread deque to steal from is selected randomly to lower contention


• Tasks are stolen in FIFO order


See en.wikipedia.org/wiki/FIFO_(computing_and_electronics)


• Stealing in FIFO order minimizes contention with the thread that owns the deque


See www.ibm.com/support/knowledgecenter/en/SS3KLZ/com.ibm.java.diagnostics.healthcenter.doc/topics/resolving.html


• An older stolen task may yield a larger unit of work due to the way in which spliterators work


[Figure: trySplit() partitions a List<String> into List<String>1 & List<String>2, then into List<String>1.1, List<String>1.2, List<String>2.1, & List<String>2.2]


• Stealing a larger unit of work enables further recursive decompositions by the stealing thread

[Figure: Sub-Task1.1 is decomposed further into Sub-Task1.1.1 to Sub-Task1.1.4; Sub-Task1.2 to Sub-Task1.4, Sub-Task3.3, & Sub-Task3.4 remain on the WorkQueue deques]
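The two access patterns on a deque can be modeled with java.util.ArrayDeque. This is a toy, single-threaded sketch and not the JDK's WorkQueue, which is a concurrent structure; the class name WorkStealingModel is hypothetical, and the method names merely mirror the push(), pop(), & poll() labels on the slides.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class WorkStealingModel {
    /** Toy stand-in for one worker thread's deque of sub-tasks. */
    private final Deque<String> deque = new ArrayDeque<>();

    /** Owner side: fork() pushes a new task onto the head. */
    void push(String task) {
        deque.addFirst(task);
    }

    /** Owner side: take the newest task from the head (LIFO); null if empty. */
    String pop() {
        return deque.pollFirst();
    }

    /** Thief side: take the oldest task from the tail (FIFO); null if empty. */
    String poll() {
        return deque.pollLast();
    }

    public static void main(String[] args) {
        WorkStealingModel queue = new WorkStealingModel();
        queue.push("Sub-Task1.1");
        queue.push("Sub-Task1.2");
        queue.push("Sub-Task1.3");
        queue.push("Sub-Task1.4");
        System.out.println("owner pops:  " + queue.pop());
        System.out.println("thief polls: " + queue.poll());
    }
}
```

Because the owner and the thief work at opposite ends, they rarely touch the same element, which is the intuition behind the low contention described above.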

See www.dre.vanderbilt.edu/~schmidt/PDF/work-stealing-deque.pdf

• The WorkQueue deque that implements work-stealing minimizes locking contention

[Figure: a WorkQueue deque with its push(), pop(), & poll() operations]


• push() & pop() are only called by the owning worker thread


• These methods use wait-free “compare-and-swap” (CAS) operations


See en.wikipedia.org/wiki/Compare-and-swap
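For readers new to CAS, the primitive itself can be seen through AtomicInteger.compareAndSet(). The sketch below is not the WorkQueue code, and the class name CasCounter is hypothetical. Note one difference from the slide's claim: a retry loop like this one is lock-free rather than wait-free, because a thread may have to retry when others update the value first.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    /** Increments via an explicit compare-and-swap retry loop & returns the new value. */
    int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            // Succeeds only if no other thread changed the value in between.
            if (value.compareAndSet(current, next))
                return next;
            // Otherwise another thread won the race: re-read & retry.
        }
    }

    int get() {
        return value.get();
    }

    /** Has several threads increment one counter & returns the final count. */
    static int countWithThreads(int threads, int incrementsPerThread) {
        CasCounter counter = new CasCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++)
                    counter.increment();
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(countWithThreads(4, 10_000));
    }
}
```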


• poll() may be called from another worker thread to “steal” a task


• May not always be wait-free


See gee.cs.oswego.edu/dl/papers/fj.pdf


• See “Implementation Overview” comments in the ForkJoinPool source code for details


See java8/util/concurrent/ForkJoinPool.java


End of Java 8 Parallel Stream Internals (Part 3)
