Fork Join Framework 428206

Embed Size (px)

Citation preview

  • 8/12/2019 Fork Join Framework 428206

    1/54

  • 8/12/2019 Fork Join Framework 428206

    2/54

    Divide and Conquer Parallelismwith the Fork/Join Framework

    Mark Reinhold (@mreinhold)Chief Architect, Java Platform Group

    2011/7/7

  • 8/12/2019 Fork Join Framework 428206

    3/54

    1970 1975 1980 1985 1990 1995 2000 2005

    1

    10

    100

    1,000

    10,000

    1,000,000

    100,000

    10,000000

  • 8/12/2019 Fork Join Framework 428206

    4/54

    1970 1975 1980 1985 1990 1995 2000 2005

    1

    10

    100

    1,000

    10,000

    1,000,000

    100,000

    10,000000

    Clock

  • 8/12/2019 Fork Join Framework 428206

    5/54

    1970 1975 1980 1985 1990 1995 2000 2005

    1

    10

    100

    1,000

    10,000

    1,000,000

    100,000

    10,000000

    Clock

    Transistors (1,000s)

  • 8/12/2019 Fork Join Framework 428206

    6/54

  • 8/12/2019 Fork Join Framework 428206

    7/54

    Niagara 1 (2

    8 x 4 = 32

  • 8/12/2019 Fork Join Framework 428206

    8/54

    Niagara 2 (2

    8 x 8 = 64

  • 8/12/2019 Fork Join Framework 428206

    9/54

    Rainbow Fa16 x 8 = 128

  • 8/12/2019 Fork Join Framework 428206

    10/54

    Trends

  • 8/12/2019 Fork Join Framework 428206

    11/54

    Trends

    One core Threads were for asynchrony, not paralle

  • 8/12/2019 Fork Join Framework 428206

    12/54

    Trends

    One core Threads were for asynchrony, not paralle

    Some cores Coarse-grained parallelism usually enoug Application-level requests were good task bo Thread pools were a reasonable scheduling

  • 8/12/2019 Fork Join Framework 428206

    13/54

    Trends

    One core Threads were for asynchrony, not paralle

    Some cores Coarse-grained parallelism usually enoug Application-level requests were good task bo Thread pools were a reasonable scheduling

    Many cores Coarse-grained parallelism insufficient Application-level requests wont keep cores Shared work queues become a bottleneck

  • 8/12/2019 Fork Join Framework 428206

    14/54

    Trends

    One core Threads were for asynchrony, not paralle

    Some cores Coarse-grained parallelism usually enoug Application-level requests were good task bo Thread pools were a reasonable scheduling

    Many cores Coarse-grained parallelism insufficient Application-level requests wont keep cores Shared work queues become a bottleneck

    Need to find finer-grained, CPU-intensive paral

  • 8/12/2019 Fork Join Framework 428206

    15/54

    The key challenges for multicore c

  • 8/12/2019 Fork Join Framework 428206

    16/54

    The key challenges for multicore c

    (1) Decompose problems into parallelizable work un

  • 8/12/2019 Fork Join Framework 428206

    17/54

    The key challenges for multicore c

    (1) Decompose problems into parallelizable work un

    (2) Continue to meet (1) as the number of cores inc

  • 8/12/2019 Fork Join Framework 428206

    18/54

    No silver bullet

  • 8/12/2019 Fork Join Framework 428206

    19/54

    No silver bullet

    Many point solutions:

  • 8/12/2019 Fork Join Framework 428206

    20/54

    No silver bullet

    Many point solutions:

    Work queues + thread pools

  • 8/12/2019 Fork Join Framework 428206

    21/54

    No silver bullet

    Many point solutions:

    Work queues + thread pools Divide & conquer (fork/join)

  • 8/12/2019 Fork Join Framework 428206

    22/54

    No silver bullet

    Many point solutions:

    Work queues + thread pools Divide & conquer (fork/join) Bulk data operations (select/map/reduce)

  • 8/12/2019 Fork Join Framework 428206

    23/54

    No silver bullet

    Many point solutions:

    Work queues + thread pools Divide & conquer (fork/join) Bulk data operations (select/map/reduce) Actors

  • 8/12/2019 Fork Join Framework 428206

    24/54

    No silver bullet

    Many point solutions:

    Work queues + thread pools Divide & conquer (fork/join) Bulk data operations (select/map/reduce) Actors

    Software transactional memory (STM)

  • 8/12/2019 Fork Join Framework 428206

    25/54

    No silver bullet

    Many point solutions:

    Work queues + thread pools Divide & conquer (fork/join) Bulk data operations (select/map/reduce) Actors

    Software transactional memory (STM) GPU-based SIMD-style computation

  • 8/12/2019 Fork Join Framework 428206

    26/54

    Divide & conquer

  • 8/12/2019 Fork Join Framework 428206

    27/54

    Divide & conquer

  • 8/12/2019 Fork Join Framework 428206

    28/54

    Divide & conquer

  • 8/12/2019 Fork Join Framework 428206

    29/54

    Divide & conquer

  • 8/12/2019 Fork Join Framework 428206

    30/54

    Divide & conquer

  • 8/12/2019 Fork Join Framework 428206

    31/54

    Divide & conquer

  • 8/12/2019 Fork Join Framework 428206

    32/54

    Result solve(Problem p) { if (p.size() < SEQUENTIAL_THRESHOLD) {

    return p.solveSequentially(); } else { int m = n / 2; Result left, right; INVOKE-IN-PARALLEL{ left = solve(p.leftHalf()); right = solve(p.rightHalf());

    } return combine(left, right); }}

  • 8/12/2019 Fork Join Framework 428206

    33/54

  • 8/12/2019 Fork Join Framework 428206

    34/54

    class Student { String name; int gradYear; double score;}

  • 8/12/2019 Fork Join Framework 428206

    35/54

    class Student { String name; int gradYear; double score;}

    List students = ...;

  • 8/12/2019 Fork Join Framework 428206

    36/54

    double max = Double.MIN_VALUE;for (Student s : students) { if (s.gradYear == 2010)

    max = Math.max(max, s.score);}

    class Student { String name; int gradYear; double score;}

    List students = ...;

  • 8/12/2019 Fork Join Framework 428206

    37/54

    double max = Double.MIN_VALUE;for (Student s : students) { if (s.gradYear == 2010) max = Math.max(max, s.score);

    }return max; }

    class MaxFinder {

    final List students;

    MaxFinder(List ls) { students = ls; }

    double find() {

  • 8/12/2019 Fork Join Framework 428206

    38/54

    double max = Double.MIN_VALUE;for (Student s : students) { if (s.gradYear == 2010) max = Math.max(max, s.score);

    }return max; }

    MaxFinder subFinder(int s, int e) { return new MaxFinder(students.subList(s, e) }

    class MaxFinder {

    final List students;

    MaxFinder(List ls) { students = ls; }

    double find() {

  • 8/12/2019 Fork Join Framework 428206

    39/54

  • 8/12/2019 Fork Join Framework 428206

    40/54

    // Fork/join frameworkimport java.util.concurrent.*;

  • 8/12/2019 Fork Join Framework 428206

    41/54

    // Fork/join frameworkimport java.util.concurrent.*;

    class MaxFinderTask extends RecursiveAction

    {

    final MaxFinder maxf; double result;

    MaxFinderTask(MaxFinder mf) { maxf = mf; }

  • 8/12/2019 Fork Join Framework 428206

    42/54

    class MaxFinderTask extends RecursiveAction{

    protected void compute() {

    int n = maxf.students.size(); if (n < SEQUENTIAL_THRESHOLD) { result = maxf.find(); } else { int m = n / 2; MaxFinderTask left = new MaxFinderTask(maxf.subFinder( MaxFinderTask right = new MaxFinderTask(maxf.subFinder( invokeAll(left, right); result = Math.max(left.result, right.re } }

  • 8/12/2019 Fork Join Framework 428206

    43/54

    class MaxFinder {

    double find() { double max = Double.MIN_VALUE; for (Student s : students) {

    if (s.gradYear == 2010) max = Math.max(max, s.score); } return max; }

    MaxFinder subFinder(int s, int e) { return new MaxFinder(students.subList(s, e) }

  • 8/12/2019 Fork Join Framework 428206

    44/54

    class MaxFinder {

    double find() {

    double parallelFind() { MaxFinderTask mft = new MaxFinderTask(this) ForkJoinPoolpool = new ForkJoinPool(); pool.invoke(mft); return mft.result; }

    MaxFinder subFinder(int s, int e) {

    return new MaxFinder(students.subList(s, e) }

    ... }

    l i d k

  • 8/12/2019 Fork Join Framework 428206

    45/54

    protected void compute() {

    int n = maxf.students.size(); if (n < SEQUENTIAL_THRESHOLD) { result = maxf.find(); } else { int m = n / 2; MaxFinderTask left = new MaxFinderTask(maxf.subFinder( MaxFinderTask right = new MaxFinderTask(maxf.subFinder( invokeAll(left, right); result = Math.max(left.result, right.re } }

    class MaxFinderTask extends RecursiveAction{

    l M Fi d T k

  • 8/12/2019 Fork Join Framework 428206

    46/54

    protected void compute() {

    int n = maxf.students.size(); if (n < SEQUENTIAL_THRESHOLD) { result = maxf.find(); } else { int m = n / 2; MaxFinderTask left = new MaxFinderTask(maxf.subFinder( MaxFinderTask right = new MaxFinderTask(maxf.subFinder( invokeAll(left, right); result = Math.max(left.result, right.re } }

    // ???

    class MaxFinderTask extends RecursiveAction{

    Performance considerations

  • 8/12/2019 Fork Join Framework 428206

    47/54

    Performance considerations

    Performance considerations

  • 8/12/2019 Fork Join Framework 428206

    48/54

    Performance considerations

    Choosing the sequential threshold Smaller tasks increase parallelism

    Larger tasks reduce coordination overhead

    Ultimately you must profile your code

    Performance considerations

  • 8/12/2019 Fork Join Framework 428206

    49/54

    Performance considerations

    Choosing the sequential threshold Smaller tasks increase parallelism

    Larger tasks reduce coordination overhead

    Ultimately you must profile your code

    Sequential threshold 500K 50K 5K 500

    Dual Xeon HT (4) 0.88 3.02 3.20 2.22 08-way Opteron (8) 1.00 5.29 5.73 4.53 2

    8-core Niagara (32) 0.98 10.46 17.21 15.34 6

    Performance considerations

  • 8/12/2019 Fork Join Framework 428206

    50/54

    Performance considerations

    Performance considerations

  • 8/12/2019 Fork Join Framework 428206

    51/54

    Performance considerations

    The fork/join framework minimizes per-task overheadforcompute-intensivetasks Not recommended for tasks that mix CPU and I/O activity

    Performance considerations

  • 8/12/2019 Fork Join Framework 428206

    52/54

    Performance considerations

    The fork/join framework minimizes per-task overheadforcompute-intensivetasks Not recommended for tasks that mix CPU and I/O activity

    A portable way to express many parallel algorithms Code is independent of execution topology

    Reasonably efficient for a wide range of core counts

    Library-managed parallelism

    No silver bulletbut many useful

  • 8/12/2019 Fork Join Framework 428206

    53/54

    No silver bullet but many useful

    Many point solutions:

    Work queues + thread pools Divide & conquer (fork/join) Bulk data operations (select/map/reduce) Actors

    Software transactional memory (STM) GPU-based SIMD-style computation

  • 8/12/2019 Fork Join Framework 428206

    54/54