Upload
ferdaous-hd
View
234
Download
0
Embed Size (px)
Citation preview
8/12/2019 Fork Join Framework 428206
1/54
8/12/2019 Fork Join Framework 428206
2/54
Divide and Conquer Parallelismwith the Fork/Join Framework
Mark Reinhold (@mreinhold)Chief Architect, Java Platform Group
2011/7/7
8/12/2019 Fork Join Framework 428206
3/54
1970 1975 1980 1985 1990 1995 2000 2005
1
10
100
1,000
10,000
1,000,000
100,000
10,000000
8/12/2019 Fork Join Framework 428206
4/54
1970 1975 1980 1985 1990 1995 2000 2005
1
10
100
1,000
10,000
1,000,000
100,000
10,000000
Clock
8/12/2019 Fork Join Framework 428206
5/54
1970 1975 1980 1985 1990 1995 2000 2005
1
10
100
1,000
10,000
1,000,000
100,000
10,000000
Clock
Transistors (1,000s)
8/12/2019 Fork Join Framework 428206
6/54
8/12/2019 Fork Join Framework 428206
7/54
Niagara 1 (2
8 x 4 = 32
8/12/2019 Fork Join Framework 428206
8/54
Niagara 2 (2
8 x 8 = 64
8/12/2019 Fork Join Framework 428206
9/54
Rainbow Fa16 x 8 = 128
8/12/2019 Fork Join Framework 428206
10/54
Trends
8/12/2019 Fork Join Framework 428206
11/54
Trends
One core Threads were for asynchrony, not paralle
8/12/2019 Fork Join Framework 428206
12/54
Trends
One core Threads were for asynchrony, not paralle
Some cores Coarse-grained parallelism usually enoug Application-level requests were good task bo Thread pools were a reasonable scheduling
8/12/2019 Fork Join Framework 428206
13/54
Trends
One core Threads were for asynchrony, not paralle
Some cores Coarse-grained parallelism usually enoug Application-level requests were good task bo Thread pools were a reasonable scheduling
Many cores Coarse-grained parallelism insufficient Application-level requests wont keep cores Shared work queues become a bottleneck
8/12/2019 Fork Join Framework 428206
14/54
Trends
One core Threads were for asynchrony, not paralle
Some cores Coarse-grained parallelism usually enoug Application-level requests were good task bo Thread pools were a reasonable scheduling
Many cores Coarse-grained parallelism insufficient Application-level requests wont keep cores Shared work queues become a bottleneck
Need to find finer-grained, CPU-intensive paral
8/12/2019 Fork Join Framework 428206
15/54
The key challenges for multicore c
8/12/2019 Fork Join Framework 428206
16/54
The key challenges for multicore c
(1) Decompose problems into parallelizable work un
8/12/2019 Fork Join Framework 428206
17/54
The key challenges for multicore c
(1) Decompose problems into parallelizable work un
(2) Continue to meet (1) as the number of cores inc
8/12/2019 Fork Join Framework 428206
18/54
No silver bullet
8/12/2019 Fork Join Framework 428206
19/54
No silver bullet
Many point solutions:
8/12/2019 Fork Join Framework 428206
20/54
No silver bullet
Many point solutions:
Work queues + thread pools
8/12/2019 Fork Join Framework 428206
21/54
No silver bullet
Many point solutions:
Work queues + thread pools Divide & conquer (fork/join)
8/12/2019 Fork Join Framework 428206
22/54
No silver bullet
Many point solutions:
Work queues + thread pools Divide & conquer (fork/join) Bulk data operations (select/map/reduce)
8/12/2019 Fork Join Framework 428206
23/54
No silver bullet
Many point solutions:
Work queues + thread pools Divide & conquer (fork/join) Bulk data operations (select/map/reduce) Actors
8/12/2019 Fork Join Framework 428206
24/54
No silver bullet
Many point solutions:
Work queues + thread pools Divide & conquer (fork/join) Bulk data operations (select/map/reduce) Actors
Software transactional memory (STM)
8/12/2019 Fork Join Framework 428206
25/54
No silver bullet
Many point solutions:
Work queues + thread pools Divide & conquer (fork/join) Bulk data operations (select/map/reduce) Actors
Software transactional memory (STM) GPU-based SIMD-style computation
8/12/2019 Fork Join Framework 428206
26/54
Divide & conquer
8/12/2019 Fork Join Framework 428206
27/54
Divide & conquer
8/12/2019 Fork Join Framework 428206
28/54
Divide & conquer
8/12/2019 Fork Join Framework 428206
29/54
Divide & conquer
8/12/2019 Fork Join Framework 428206
30/54
Divide & conquer
8/12/2019 Fork Join Framework 428206
31/54
Divide & conquer
8/12/2019 Fork Join Framework 428206
32/54
Result solve(Problem p) { if (p.size() < SEQUENTIAL_THRESHOLD) {
return p.solveSequentially(); } else { int m = n / 2; Result left, right; INVOKE-IN-PARALLEL{ left = solve(p.leftHalf()); right = solve(p.rightHalf());
} return combine(left, right); }}
8/12/2019 Fork Join Framework 428206
33/54
8/12/2019 Fork Join Framework 428206
34/54
class Student { String name; int gradYear; double score;}
8/12/2019 Fork Join Framework 428206
35/54
class Student { String name; int gradYear; double score;}
List students = ...;
8/12/2019 Fork Join Framework 428206
36/54
double max = Double.MIN_VALUE;for (Student s : students) { if (s.gradYear == 2010)
max = Math.max(max, s.score);}
class Student { String name; int gradYear; double score;}
List students = ...;
8/12/2019 Fork Join Framework 428206
37/54
double max = Double.MIN_VALUE;for (Student s : students) { if (s.gradYear == 2010) max = Math.max(max, s.score);
}return max; }
class MaxFinder {
final List students;
MaxFinder(List ls) { students = ls; }
double find() {
8/12/2019 Fork Join Framework 428206
38/54
double max = Double.MIN_VALUE;for (Student s : students) { if (s.gradYear == 2010) max = Math.max(max, s.score);
}return max; }
MaxFinder subFinder(int s, int e) { return new MaxFinder(students.subList(s, e) }
class MaxFinder {
final List students;
MaxFinder(List ls) { students = ls; }
double find() {
8/12/2019 Fork Join Framework 428206
39/54
8/12/2019 Fork Join Framework 428206
40/54
// Fork/join frameworkimport java.util.concurrent.*;
8/12/2019 Fork Join Framework 428206
41/54
// Fork/join frameworkimport java.util.concurrent.*;
class MaxFinderTask extends RecursiveAction
{
final MaxFinder maxf; double result;
MaxFinderTask(MaxFinder mf) { maxf = mf; }
8/12/2019 Fork Join Framework 428206
42/54
class MaxFinderTask extends RecursiveAction{
protected void compute() {
int n = maxf.students.size(); if (n < SEQUENTIAL_THRESHOLD) { result = maxf.find(); } else { int m = n / 2; MaxFinderTask left = new MaxFinderTask(maxf.subFinder( MaxFinderTask right = new MaxFinderTask(maxf.subFinder( invokeAll(left, right); result = Math.max(left.result, right.re } }
8/12/2019 Fork Join Framework 428206
43/54
class MaxFinder {
double find() { double max = Double.MIN_VALUE; for (Student s : students) {
if (s.gradYear == 2010) max = Math.max(max, s.score); } return max; }
MaxFinder subFinder(int s, int e) { return new MaxFinder(students.subList(s, e) }
8/12/2019 Fork Join Framework 428206
44/54
class MaxFinder {
double find() {
double parallelFind() { MaxFinderTask mft = new MaxFinderTask(this) ForkJoinPoolpool = new ForkJoinPool(); pool.invoke(mft); return mft.result; }
MaxFinder subFinder(int s, int e) {
return new MaxFinder(students.subList(s, e) }
... }
l i d k
8/12/2019 Fork Join Framework 428206
45/54
protected void compute() {
int n = maxf.students.size(); if (n < SEQUENTIAL_THRESHOLD) { result = maxf.find(); } else { int m = n / 2; MaxFinderTask left = new MaxFinderTask(maxf.subFinder( MaxFinderTask right = new MaxFinderTask(maxf.subFinder( invokeAll(left, right); result = Math.max(left.result, right.re } }
class MaxFinderTask extends RecursiveAction{
l M Fi d T k
8/12/2019 Fork Join Framework 428206
46/54
protected void compute() {
int n = maxf.students.size(); if (n < SEQUENTIAL_THRESHOLD) { result = maxf.find(); } else { int m = n / 2; MaxFinderTask left = new MaxFinderTask(maxf.subFinder( MaxFinderTask right = new MaxFinderTask(maxf.subFinder( invokeAll(left, right); result = Math.max(left.result, right.re } }
// ???
class MaxFinderTask extends RecursiveAction{
Performance considerations
8/12/2019 Fork Join Framework 428206
47/54
Performance considerations
Performance considerations
8/12/2019 Fork Join Framework 428206
48/54
Performance considerations
Choosing the sequential threshold Smaller tasks increase parallelism
Larger tasks reduce coordination overhead
Ultimately you must profile your code
Performance considerations
8/12/2019 Fork Join Framework 428206
49/54
Performance considerations
Choosing the sequential threshold Smaller tasks increase parallelism
Larger tasks reduce coordination overhead
Ultimately you must profile your code
Sequential threshold 500K 50K 5K 500
Dual Xeon HT (4) 0.88 3.02 3.20 2.22 08-way Opteron (8) 1.00 5.29 5.73 4.53 2
8-core Niagara (32) 0.98 10.46 17.21 15.34 6
Performance considerations
8/12/2019 Fork Join Framework 428206
50/54
Performance considerations
Performance considerations
8/12/2019 Fork Join Framework 428206
51/54
Performance considerations
The fork/join framework minimizes per-task overheadforcompute-intensivetasks Not recommended for tasks that mix CPU and I/O activity
Performance considerations
8/12/2019 Fork Join Framework 428206
52/54
Performance considerations
The fork/join framework minimizes per-task overheadforcompute-intensivetasks Not recommended for tasks that mix CPU and I/O activity
A portable way to express many parallel algorithms Code is independent of execution topology
Reasonably efficient for a wide range of core counts
Library-managed parallelism
No silver bulletbut many useful
8/12/2019 Fork Join Framework 428206
53/54
No silver bullet but many useful
Many point solutions:
Work queues + thread pools Divide & conquer (fork/join) Bulk data operations (select/map/reduce) Actors
Software transactional memory (STM) GPU-based SIMD-style computation
8/12/2019 Fork Join Framework 428206
54/54