Lecture 8 – Collective Pattern

Collectives Pattern

CS 472 Concurrent & Parallel Programming, University of Evansville

Selection of slides from CIS 410/510 Introduction to Parallel Computing, Department of Computer and Information Science, University of Oregon

http://ipcc.cs.uoregon.edu/curriculum.html


Announcements

• No class next Thursday, September 21. Instructor will be traveling to a conference. It will be the second day of working on the lab project associated with today’s topic.


Collectives

• Collective operations deal with a collection of data as a whole, rather than as separate elements

• Collective patterns include:

  • Reduce

  • Scan

  • Partition

  • Scatter

  • Gather


Reduce and Scan will be covered in this lecture


Reduce

• Reduce is used to combine a collection of elements into one summary value

• A combiner function combines elements pairwise

• A combiner function only needs to be associative to be parallelizable

• Example combiner functions:

  • Addition

  • Multiplication

  • Maximum / Minimum
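The bullets above can be sketched in a few lines of Python. This is only an illustration: the helper name `reduce_with` and the sample data are assumptions, not part of the course code. It applies each example combiner and then checks the associativity property that makes regrouping (and therefore parallel evaluation) safe.

```python
from functools import reduce
import operator


def reduce_with(combiner, items):
    """Combine a non-empty collection pairwise into one summary value."""
    return reduce(combiner, items)


data = [3, 1, 4, 1, 5]  # hypothetical sample input

total = reduce_with(operator.add, data)    # addition
product = reduce_with(operator.mul, data)  # multiplication
largest = reduce_with(max, data)           # maximum
smallest = reduce_with(min, data)          # minimum

# Associative combiner: regrouping the operands leaves the result unchanged.
add_left_first = (3 + 1) + 4
add_right_first = 3 + (1 + 4)

# Subtraction is not associative, so it cannot be regrouped safely.
sub_left_first = (8 - 4) - 2
sub_right_first = 8 - (4 - 2)
```

Subtraction is included only as a counterexample: the two groupings disagree, which is why it is not a valid combiner for a parallel reduce.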

Reduce

[Figure: serial reduction compared with parallel reduction]


Reduce

• Vectorization


Reduce

• Tiling breaks the work into chunks (tiles); each worker reduces its tile serially, and the per-tile results are then combined
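A minimal sketch of a tiled reduction, assuming a thread pool stands in for the workers. The function name `tiled_reduce` and its parameters are hypothetical. In standard CPython, threads will not speed up CPU-bound work, so this shows the structure of the pattern rather than a performance gain.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
import operator


def tiled_reduce(combiner, items, tile_size, workers=4):
    """Reduce a non-empty list by reducing fixed-size tiles, then combining."""
    # Split the input into contiguous tiles.
    tiles = [items[i:i + tile_size] for i in range(0, len(items), tile_size)]
    # Each worker reduces one tile serially; pool.map keeps the tile order,
    # so the combiner only needs to be associative, not commutative.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda tile: reduce(combiner, tile), tiles))
    # Combine the per-tile results into the final value.
    return reduce(combiner, partials)


numbers = list(range(1, 101))
tiled_sum = tiled_reduce(operator.add, numbers, tile_size=8)
tiled_max = tiled_reduce(max, numbers, tile_size=8)
```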


Reduce – Add Example

Input: 1 2 5 4 9 7 0 1

Serial reduction (running sums, left to right): 3, 8, 12, 21, 28, 28, 29

Result: 29


Reduce – Add Example

Input: 1 2 5 4 9 7 0 1

Parallel (tree) reduction:

Level 1 (pairwise sums): 3 9 16 1

Level 2: 12 17

Level 3: 29

Result: 29
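The tree-shaped version of this example can be traced with a short sketch. It uses the eight inputs 1, 2, 5, 4, 9, 7, 0, 1, which is an assumption about the slide's values (they sum to 29, matching the result shown). The function name `tree_reduce_levels` is hypothetical.

```python
import operator


def tree_reduce_levels(combiner, items):
    """Return every level of a pairwise (tree) reduction, input level first."""
    current = list(items)
    levels = [current]
    while len(current) > 1:
        # All pairs on one level are independent, so they could run in parallel.
        nxt = [combiner(current[i], current[i + 1])
               for i in range(0, len(current) - 1, 2)]
        if len(current) % 2 == 1:
            nxt.append(current[-1])  # an odd element is carried up unchanged
        levels.append(nxt)
        current = nxt
    return levels


levels = tree_reduce_levels(operator.add, [1, 2, 5, 4, 9, 7, 0, 1])
# levels == [[1, 2, 5, 4, 9, 7, 0, 1], [3, 9, 16, 1], [12, 17], [29]]
```

Eight inputs need three levels of combining, compared with seven sequential additions in the serial version.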


Reduce

• We can “fuse” the map and reduce patterns


Reduce

• Precision can become a problem with reductions on floating point data

• Different orderings of floating point data can change the reduction value
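A small illustration of the ordering problem, using hypothetical values chosen so the effect is easy to see with IEEE 754 double precision (Python's `float`). Near 1e16 the spacing between adjacent doubles is 2, so adding 1.0 is lost to rounding.

```python
values = [1e16, 1.0, -1e16]  # hypothetical data, chosen to expose rounding

# Left-to-right order: 1e16 + 1.0 rounds back to 1e16, so the 1.0 is lost.
left_to_right = (values[0] + values[1]) + values[2]   # 0.0

# A different order: the large values cancel first, so the 1.0 survives.
reordered = (values[0] + values[2]) + values[1]       # 1.0
```

A parallel reduction is free to regroup operands, so it may produce either answer. Floating point addition is only approximately associative.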


Reduce Example: Dot Product

• 2 vectors of same length

• Map (*) to multiply the components

• Then reduce with (+) to get the final answer
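As a sketch of the two steps above, the dot product can be written with a separate map and reduce, or with the two fused so that no intermediate list of products is stored. The function names are hypothetical and are not taken from the course's example code.

```python
from functools import reduce
import operator


def dot_unfused(u, v):
    """Map (*) into a temporary list, then reduce it with (+)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    products = [a * b for a, b in zip(u, v)]   # map
    return reduce(operator.add, products, 0)   # reduce


def dot_fused(u, v):
    """Fused map and reduce: each product is added as soon as it is computed."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


u, v = [1, 2, 3], [4, 5, 6]
unfused_result = dot_unfused(u, v)   # 1*4 + 2*5 + 3*6 = 32
fused_result = dot_fused(u, v)       # same value, no temporary list
```

The fused form avoids writing the products to memory and reading them back, which is the usual motivation for fusing map with reduce.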


Dot Product – Example Uses

• Essential operation in physics, graphics, video games, …

• Gaming analogy: in Mario Kart, there are “boost pads” on the ground that increase your speed

  • red vector is your speed (x and y direction)

  • blue vector is the orientation of the boost pad (x and y direction). Larger numbers are more power.

How much boost will you get? For the analogy, imagine the pad multiplies your speed:

• If you come in going 0, you’ll get nothing

• If you cross the pad perpendicularly, you’ll get 0 [just like the banana obliteration, it will give you 0x boost in the perpendicular direction]

Ref: http://betterexplained.com/articles/vector-calculus-understanding-the-dot-product/


Dot Product Code Examples

• Dot product code examples available on csserver in directory /home/hwang/cs472/dotproduct


Scan

• The scan pattern produces all partial reductions of an input sequence, generating a new sequence of the same length

• Trickier to parallelize than reduce

• Inclusive scan vs. exclusive scan

  • Inclusive scan: includes the current element in the partial reduction

  • Exclusive scan: excludes the current element; the partial reduction covers only the elements before the current one
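The difference between the two variants is easiest to see side by side. The sketch below is a serial reference version built on `itertools.accumulate` (the `initial` keyword needs Python 3.8 or later). The function names and sample data are assumptions.

```python
from itertools import accumulate
import operator


def inclusive_scan(combiner, items):
    """Output i combines items[0..i], including the current element."""
    return list(accumulate(items, combiner))


def exclusive_scan(combiner, items, identity):
    """Output i combines items[0..i-1]; output 0 is the combiner's identity."""
    # accumulate with an initial value yields one extra trailing element
    # (the full reduction), which an exclusive scan drops.
    return list(accumulate(items, combiner, initial=identity))[:-1]


data = [1, 2, 5, 4]  # hypothetical input
inclusive = inclusive_scan(operator.add, data)      # [1, 3, 8, 12]
exclusive = exclusive_scan(operator.add, data, 0)   # [0, 1, 3, 8]
```

Note that the exclusive form needs an identity value for the combiner (0 for addition), while the inclusive form does not.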


Scan – Example Uses

• Lexical comparison of strings – e.g., determine that “strategy” should appear before “stratification” in a dictionary

• Add multi-precision numbers (those that cannot be represented in a single machine word)

• Evaluate polynomials

• Implement radix sort or quicksort

• Delete marked elements in an array

• Dynamically allocate processors

• Lexical analysis – parsing programs into tokens

• Searching for regular expressions

• Labeling components in 2-D images

• Some tree algorithms – e.g., finding the depth of every vertex in a tree
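To make one of these uses concrete, here is a sketch of deleting marked elements from an array with a scan. The function name `delete_marked` is hypothetical. An exclusive sum scan over the "keep" flags gives each surviving element its position in the output, so every element can then be placed independently.

```python
from itertools import accumulate


def delete_marked(items, marked):
    """Return items whose mark is False, using a scan to compute positions."""
    keep = [0 if m else 1 for m in marked]
    # positions[i] is the exclusive sum of keep[0..i-1], i.e. the output index
    # of element i if it is kept; the final entry is the number of survivors.
    positions = list(accumulate(keep, initial=0))
    out = [None] * positions[-1]
    # Each iteration writes a distinct slot, so this loop is a parallel map.
    for i, item in enumerate(items):
        if keep[i]:
            out[positions[i]] = item
    return out


letters = list("abcdef")
marks = [False, True, False, False, True, False]
survivors = delete_marked(letters, marks)   # ['a', 'c', 'd', 'f']
```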


Scan

[Figure: serial scan compared with parallel scan]


Scan

• One algorithm for parallelizing scan is to perform an “up sweep” and a “down sweep”

• Reduce the input on the up sweep

• The down sweep produces the intermediate results

[Figure: up sweep (compute reduction) followed by down sweep (compute intermediate values)]
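One common formulation of the up sweep / down sweep idea is sketched below. It may differ in detail from the diagram on the slide. It assumes the input length is a power of two and produces an exclusive scan. The loops run serially here, but the iterations of each inner loop are independent and could run in parallel.

```python
import operator


def sweep_exclusive_scan(combiner, items, identity):
    """Exclusive scan by up sweep then down sweep (length must be a power of 2)."""
    n = len(items)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("this sketch assumes a power-of-two length")
    a = list(items)

    # Up sweep: build partial reductions in place; a[n-1] ends as the total.
    d = 1
    while d < n:
        for i in range(2 * d - 1, n, 2 * d):
            a[i] = combiner(a[i - d], a[i])
        d *= 2

    # Down sweep: push prefixes back down the tree, starting from the identity.
    a[n - 1] = identity
    d = n // 2
    while d >= 1:
        for i in range(2 * d - 1, n, 2 * d):
            left_total = a[i - d]
            a[i - d] = a[i]                      # left child inherits the prefix
            a[i] = combiner(a[i], left_total)    # right child adds the left total
        d //= 2
    return a


scanned = sweep_exclusive_scan(operator.add, [1, 2, 5, 4, 9, 7, 0, 1], 0)
# scanned == [0, 1, 3, 8, 12, 21, 28, 28]
```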


Scan – Maximum Example

[Figure: step-by-step scan of an input sequence using maximum as the combiner]


Scan

• Three phase scan with tiling
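A sketch of one way the three phases can be organized, assuming the usual structure: reduce each tile, scan the tile totals, then scan each tile seeded with the total of everything before it. The function name `tiled_inclusive_scan` and the sample input are assumptions. Phases 1 and 3 are independent per tile and are the parts that would run in parallel.

```python
from functools import reduce
from itertools import accumulate
import operator


def tiled_inclusive_scan(combiner, items, tile_size):
    """Inclusive scan of a list using the three-phase tiled structure."""
    tiles = [items[i:i + tile_size] for i in range(0, len(items), tile_size)]

    # Phase 1: reduce every tile to a single total (independent per tile).
    tile_totals = [reduce(combiner, tile) for tile in tiles]

    # Phase 2: scan the (short) list of tile totals; carries[k - 1] is the
    # reduction of everything that comes before tile k.
    carries = list(accumulate(tile_totals, combiner))

    # Phase 3: scan every tile, seeded with its carry-in (independent per tile).
    out = []
    for k, tile in enumerate(tiles):
        if k == 0:
            out.extend(accumulate(tile, combiner))
        else:
            seeded = accumulate(tile, combiner, initial=carries[k - 1])
            out.extend(list(seeded)[1:])  # drop the seed itself
    return out


data = [1, 4, 0, 2, 7, 2, 4, 3]  # hypothetical input
running_max = tiled_inclusive_scan(max, data, tile_size=3)
running_sum = tiled_inclusive_scan(operator.add, data, tile_size=3)
```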


Scan

• Just like reduce, we can also fuse the map pattern with the scan pattern


Merge Sort as a reduction

• We can sort an array via a pair of a map and a reduce

• Map each element into a vector containing just that element

• <> is the merge operation: [1,3,5,7] <> [2,6,15] = [1,2,3,5,6,7,15]

• [] is the empty list

• How fast is this?


Right Biased Sort

Start with [14,3,4,8,7,52,1]

Map to [[14],[3],[4],[8],[7],[52],[1]]

Reduce:

[14] <> ([3] <> ([4] <> ([8] <> ([7] <> ([52] <> [1])))))

= [14] <> ([3] <> ([4] <> ([8] <> ([7] <> [1,52]))))

= [14] <> ([3] <> ([4] <> ([8] <> [1,7,52])))

= [14] <> ([3] <> ([4] <> [1,7,8,52]))

= [14] <> ([3] <> [1,4,7,8,52])

= [14] <> [1,3,4,7,8,52]

= [1,3,4,7,8,14,52]


Right Biased Sort Continued

• How long did that take?

• We did O(n) merges…but each one took O(n) time

• O(n^2)

• We wanted merge sort, but instead we got insertion sort!


Tree Shape Sort

Start with [14,3,4,8,7,52,1]

Map to [[14],[3],[4],[8],[7],[52],[1]]

Reduce:

(([14] <> [3]) <> ([4] <> [8])) <> (([7] <> [52]) <> [1])

= ([3,14] <> [4,8]) <> ([7,52] <> [1])

= [3,4,8,14] <> [1,7,52]

= [1,3,4,7,8,14,52]
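Both reduction shapes can be sketched directly, with `merge` playing the role of `<>`. The function names are hypothetical. The right-biased version folds from the right starting at the empty list, and the tree version merges neighbours level by level, carrying an odd list up unchanged.

```python
from functools import reduce


def merge(left, right):
    """The <> operation: merge two sorted lists into one sorted list."""
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def right_biased_sort(items):
    """Map to singletons, then reduce as x0 <> (x1 <> (... <> []))."""
    singletons = [[x] for x in items]
    return reduce(lambda acc, s: merge(s, acc), reversed(singletons), [])


def tree_sort(items):
    """Map to singletons, then reduce by merging neighbours level by level."""
    level = [[x] for x in items]
    if not level:
        return []
    while len(level) > 1:
        nxt = [merge(level[i], level[i + 1])
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]


data = [14, 3, 4, 8, 7, 52, 1]
right_result = right_biased_sort(data)
tree_result = tree_sort(data)
```

Both shapes give the same sorted list because merging is associative. They differ only in how much work each merge does.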


Tree Shaped Sort Performance

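• Even if we only had a single processor, this is better

  • The merges are arranged in O(log n) levels

  • Each level costs O(n) in total

  • So O(n*log(n)) work

• But the opportunity for parallelism is not so great

  • The critical path is O(n), assuming a sequential merge, because the final merge alone takes O(n)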

• Takeaway: the shape of reduction matters!
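A rough cost model makes the comparison concrete. As a simplifying assumption, the cost of a merge is counted as the number of elements it writes, and n is taken to be a power of two for the tree shape. The function names are hypothetical.

```python
def right_biased_cost(n):
    """Elements written when a singleton is merged into a growing list n-1 times."""
    # The k-th merge combines one element with a list of length k.
    return sum(k + 1 for k in range(1, n))


def tree_work(n):
    """Total elements written by a tree reduction; n must be a power of two."""
    levels = n.bit_length() - 1   # log2(n) levels
    return n * levels             # every level rewrites all n elements


def tree_span(n):
    """Elements written along the critical path when each merge is sequential."""
    total, size = 0, n
    while size > 1:
        total += size   # the largest merge on this level writes `size` elements
        size //= 2
    return total


n = 1024
quadratic = right_biased_cost(n)        # grows like n^2 / 2
work = tree_work(n)                     # grows like n * log2(n)
span = tree_span(n)                     # grows like 2n
available_parallelism = work / span     # grows only like log2(n) / 2
```

Under this model the tree shape does far less total work, but its span stays linear in n, so the achievable speedup grows only logarithmically unless the merge itself is parallelized.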
