Upload
others
View
6
Download
0
Embed Size (px)
Citation preview
Overview of Java 8 Parallel Streams
(Part 2)
Douglas C. [email protected]
www.dre.vanderbilt.edu/~schmidt
Professor of Computer Science
Institute for Software
Integrated Systems
Vanderbilt University
Nashville, Tennessee, USA
2
Learning Objectives in this Part of the Lesson• Know how aggregate operations & functional programming features are
applied in the parallel streams framework
• Be aware of how a parallel stream works “under the hood”
join joinjoin
Processsequentially
Processsequentially
Processsequentially
Processsequentially
DataSource1.1 DataSource1.2 DataSource2.1 DataSource2.2
DataSource1 DataSource2
DataSource
trySplit()
trySplit() trySplit()
See www.ibm.com/developerworks/library/j-java-streams-3-brian-goetz
3
• Know how aggregate operations & functional programming features are applied in the parallel streams framework
• Be aware of how a parallel stream works “under the hood”
• Recognize now to avoid concurrencyhazards in parallel streams
Learning Objectives in this Part of the Lesson
Aggregate operation (Function f)
Input x
Output f(x)
Output g(f(x))
Output h(g(f(x)))
Aggregate operation (Function g)
Aggregate operation (Function h)
4
Overview of How a Parallel Stream Works
5
• A Java 8 parallel stream implementsa “map/reduce” variant optimized for multi-core processors
See en.wikipedia.org/wiki/MapReduce
Overview of How a Parallel Stream Works
Reduce
Map
Partition
6
• A Java 8 parallel stream implementsa “map/reduce” variant optimized for multi-core processors
• It’s actually a three phase“split-apply-combine” data processing strategy
See www.jstatsoft.org/article/view/v040i01
join joinjoin
Processsequentially
Processsequentially
Processsequentially
Processsequentially
DataSource1.1 DataSource1.2 DataSource2.1 DataSource2.2
DataSource1 DataSource2
DataSource
trySplit()
trySplit() trySplit()
Overview of How a Parallel Stream Works
7
• The split-apply-combine phases are:
1. Split – Recursively partition a data source into independent“chunks”
CollectionData1.1 CollectionData1.2 CollectionData2.1 CollectionData2.2
CollectionData1 CollectionData2
CollectionData
See en.wikipedia.org/wiki/Divide_and_conquer_algorithm
trySplit()
trySplit() trySplit()
Overview of How a Parallel Stream Works
8
• The split-apply-combine phases are:
1. Split – Recursively partition a data source into independent“chunks”
• Spliterators are definedto partition collections in Java 8
CollectionData1.1 CollectionData1.2 CollectionData2.1 CollectionData2.2
CollectionData1 CollectionData2
CollectionData
See docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html
trySplit()
trySplit() trySplit()
Overview of How a Parallel Stream Works
9
• The split-apply-combine phases are:
1. Split – Recursively partition a data source into independent“chunks”
• Spliterators are definedto partition collections in Java 8
• You can also define custom spliterators
InputString1.1 InputString1.2 InputString2.1 InputString2.2
InputString1 InputString2
InputString
See github.com/douglascraigschmidt/LiveLessons/tree/master/SearchStreamSpliterator
45,000+ phrases
Search Phrases
Input Strings to Search
…
trySplit()
trySplit() trySplit()
Overview of How a Parallel Stream Works
10
• The split-apply-combine phases are:
1. Split – Recursively partition a data source into independent“chunks”
• Spliterators are definedto partition collections in Java 8
• You can also define custom spliterators
• Parallel streams perform better on data sources that can be split efficiently & evenly
InputString1.1 InputString1.2 InputString2.1 InputString2.2
InputString1 InputString2
InputString
See www.airpair.com/java/posts/parallel-processing-of-io-based-data-with-java-streams
trySplit()
trySplit() trySplit()
Overview of How a Parallel Stream Works
11
Processsequentially
Processsequentially
Processsequentially
Processsequentially
• The split-apply-combine phases are:
1. Split – Recursively partition a data source into independent“chunks”
2. Apply – Process chunksindependently in one(common) thread pool
InputString1.1 InputString1.2 InputString2.1 InputString2.2
InputString1 InputString2
InputString
Splitting & applying run simultaneously (after certain limit met), not sequentially
Overview of How a Parallel Stream Works
12
Processsequentially
Processsequentially
Processsequentially
Processsequentially
• The split-apply-combine phases are:
1. Split – Recursively partition a data source into independent“chunks”
2. Apply – Process chunksindependently in one(common) thread pool
• Programmers have somecontrol over how manythreads are in the pool
InputString1.1 InputString1.2 InputString2.1 InputString2.2
InputString1 InputString2
InputString
Overview of How a Parallel Stream Works
13
• The split-apply-combine phases are:
1. Split – Recursively partition a data source into independent“chunks”
2. Apply – Process chunksindependently in one(common) thread pool
3. Combine – Join partial results into a single result
join join
join
Processsequentially
Processsequentially
Processsequentially
Processsequentially
InputString1.1 InputString1.2 InputString2.1 InputString2.2
InputString1 InputString2
InputString
Overview of How a Parallel Stream Works
14
• The split-apply-combine phases are:
1. Split – Recursively partition a data source into independent“chunks”
2. Apply – Process chunksindependently in one(common) thread pool
3. Combine – Join partial results into a single result
• Performed by terminal operations like collect() & reduce()
join join
join
Processsequentially
Processsequentially
Processsequentially
Processsequentially
See www.codejava.net/java-core/collections/java-8-stream-terminal-operations-examples
InputString1.1 InputString1.2 InputString2.1 InputString2.2
InputString1 InputString2
InputString
Overview of How a Parallel Stream Works
15
Avoiding Concurrency Hazards in Java 8 Parallel Streams
16
• The Java 8 parallel streams framework assumes behaviors don’t incur race conditions
See en.wikipedia.org/wiki/Race_condition#Software
Avoiding Concurrency Hazards in Java 8 Parallel Streams
Race conditions arise when an app depends on the sequence or timing of threads for it to operate properly
Aggregate operation (Function f)
Input x
Output f(x)
Output g(f(x))
Output h(g(f(x)))
Aggregate operation (Function g)
Aggregate operation (Function h)
17See docs.oracle.com/javase/tutorial/collections/streams/parallelism.html#side_effects
Avoiding Concurrency Hazards in Java 8 Parallel Streams
map(phrase -> searchForPhrase(…))
filter(not(SearchResults::isEmpty))
collect(toList())
Input Strings to Search
…
45,000+ phrases
Search Phrases
parallelStream()
• Parallel streams should therefore avoid behaviors with side-effects
18
• Parallel streams should therefore avoid behaviors with side-effects, e.g.
• Stateful lambda expressions
• Where results depend on sharedmutable state
Avoiding Concurrency Hazards in Java 8 Parallel Streamsclass BuggyFactorial {
static class Total {
long mTotal = 1;
void multiply(long n)
{ mTotal *= n; }
}
static long factorial(long n){
Total t = new Total();
LongStream
.rangeClosed(1, n)
.parallel()
.forEach(t::multiply);
return t.mTotal;
} ...
See docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html#Statelessness
19
• Parallel streams should therefore avoid behaviors with side-effects, e.g.
• Stateful lambda expressions
• Where results depend on sharedmutable state
• i.e., state that may change inparallel execution of a pipeline
Avoiding Concurrency Hazards in Java 8 Parallel Streamsclass BuggyFactorial {
static class Total {
long mTotal = 1;
void multiply(long n)
{ mTotal *= n; }
}
static long factorial(long n){
Total t = new Total();
LongStream
.rangeClosed(1, n)
.parallel()
.forEach(t::multiply);
return t.mTotal;
} ...
20
• Parallel streams should therefore avoid behaviors with side-effects, e.g.
• Stateful lambda expressions
• Where results depend on sharedmutable state
• i.e., state that may change inparallel execution of a pipeline
Avoiding Concurrency Hazards in Java 8 Parallel Streams
Race conditions can arise due to the unsynchronized access to mTotal field
See github.com/douglascraigschmidt/LiveLessons/tree/master/Java8/ex16
class BuggyFactorial {
static class Total {
long mTotal = 1;
void multiply(long n)
{ mTotal *= n; }
}
static long factorial(long n){
Total t = new Total();
LongStream
.rangeClosed(1, n)
.parallel()
.forEach(t::multiply);
return t.mTotal;
} ...
21
• Parallel streams should therefore avoid behaviors with side-effects, e.g.
• Stateful lambda expressions
• Interference w/the data source
• Occurs when source of stream is modified within the pipeline
See docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html#NonInterference
Avoiding Concurrency Hazards in Java 8 Parallel StreamsList<Integer> list = IntStream
.range(0, 10)
.boxed()
.collect(toList());
list
.parallelStream()
.peek(list::remove)
.forEach(System.out::println);
22
• Parallel streams should therefore avoid behaviors with side-effects, e.g.
• Stateful lambda expressions
• Interference w/the data source
• Occurs when source of stream is modified within the pipeline
Avoiding Concurrency Hazards in Java 8 Parallel StreamsList<Integer> list = IntStream
.range(0, 10)
.boxed()
.collect(toList());
list
.parallelStream()
.peek(list::remove)
.forEach(System.out::println);
Aggregate operations enable parallelism with non-thread-safe collections provided the collection is not modified while it’s being operated on..
See github.com/douglascraigschmidt/LiveLessons/tree/master/Java8/ex11
23
• Java 8 lambda expressions & method references containing no shared state are useful for parallel streams since they needn’t be explicitly synchronized
See henrikeichenhardt.blogspot.com/2013/06/why-shared-mutable-state-is-root-of-all.html
return mList.size() == 0;
return new SearchResults
(Thread.currentThread().getId(),
currentCycle(), phrase, title,
StreamSupport
.stream(new PhraseMatchSpliterator
(input, phrase),
parallel)
.collect(toList()));
Avoiding Concurrency Hazards in Java 8 Parallel Streams
map(phrase -> searchForPhrase(…))
filter(not(SearchResults::isEmpty))
collect(toList())
45,000+ phrases
Search Phrases
parallelStream()
24
End of Overview of Java 8 Parallel Streams
(Part 2)