Parallel Algorithms


An overview

• A simple parallel algorithm for computing

parallel prefix.

• A parallel merging algorithm


• We are given an ordered set A of n elements

A = (a0, a1, a2, ..., an-1)

and a binary associative operator ⊕.

• We have to compute the ordered set of prefixes

(a0, a0 ⊕ a1, a0 ⊕ a1 ⊕ a2, ..., a0 ⊕ a1 ⊕ ... ⊕ an-1)

Definition of prefix computation


• For example, if ⊕ is + and the input is the

ordered set

{5, 3, -6, 2, 7, 10, -2, 8}

then the output is

{5, 8, 2, 4, 11, 21, 19, 27}

• Prefix sum can be computed in O (n) time

sequentially.

An example of prefix computation
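The sequential O(n) computation can be sketched as follows (a minimal Python illustration; the function name prefix_sums is our own):

```python
def prefix_sums(a):
    """Inclusive prefix sums: out[i] = a[0] + a[1] + ... + a[i]."""
    out = []
    running = 0
    for x in a:           # a single pass -> O(n) time
        running += x
        out.append(running)
    return out

# The example from the slide:
print(prefix_sums([5, 3, -6, 2, 7, 10, -2, 8]))
# [5, 8, 2, 4, 11, 21, 19, 27]
```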


First Pass

• For every internal node of the tree, compute

the sum of all the leaves in its subtree in a

bottom-up fashion.

sum[v] := sum[L[v]] + sum[R[v]]

Using a binary tree


for d = 0 to log n – 1 do

for i = 0 to n – 1 by 2^(d+1) do in parallel

a[i + 2^(d+1) - 1] := a[i + 2^d - 1] + a[i + 2^(d+1) - 1]

• In our example, n = 8, hence the outer loop

iterates 3 times, d = 0, 1, 2.

Parallel prefix computation
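The first pass can be simulated in Python as below. The inner loop's iterations are independent, so on a PRAM they all run in one parallel step; here they run serially. The helper name up_sweep is our own:

```python
import math

def up_sweep(a):
    """First pass: each position i + 2^(d+1) - 1 accumulates the sum
    of the leaves in its subtree; a[n-1] ends up with the total sum."""
    n = len(a)  # assumed to be a power of two
    for d in range(int(math.log2(n))):
        # these iterations are independent -> one parallel step on a PRAM
        for i in range(0, n, 2 ** (d + 1)):
            a[i + 2 ** (d + 1) - 1] += a[i + 2 ** d - 1]
    return a

a = [5, 3, -6, 2, 7, 10, -2, 8]
up_sweep(a)
print(a)  # [5, 8, -6, 4, 7, 17, -2, 27]; a[7] = 27 is the total sum
```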


• d = 0: In this case, the increments of 2^(d+1) will be

in terms of 2 elements.

• for i = 0,

a[0 + 2^(0+1) - 1] := a[0 + 2^0 - 1] + a[0 + 2^(0+1) - 1]

or, a[1] := a[0] + a[1]

When d = 0


• d = 1: In this case, the increments of 2^(d+1) will be

in terms of 4 elements.

• for i = 0,

a[0 + 2^(1+1) - 1] := a[0 + 2^1 - 1] + a[0 + 2^(1+1) - 1]

or, a[3] := a[1] + a[3]

• for i = 4,

a[4 + 2^(1+1) - 1] := a[4 + 2^1 - 1] + a[4 + 2^(1+1) - 1]

or, a[7] := a[5] + a[7]

When d = 1


• blue: no change from last iteration.

• magenta: changed in the current iteration.

The First Pass


Second Pass

• The idea in the second pass is to do a

top-down computation to generate all the prefix

sums.

• We use the notation pre[v] to denote the prefix

sum at every node.

The Second Pass


• pre[root] := 0, since 0 is the identity element for

the + operation we are considering.

• If the operation is max, the identity element will

be –∞.

Computation in the second phase


pre[L[v]] := pre[v]

pre[R[v]] := sum[L[v]] + pre[v]

Second phase (continued)
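The two rules above can be written as a tiny helper (a sketch; the name push_down is our own):

```python
def push_down(pre_v, sum_left):
    """Second-phase rules at a node v:
    pre[L[v]] := pre[v];  pre[R[v]] := sum[L[v]] + pre[v]."""
    return pre_v, sum_left + pre_v

# At the root of the running example: pre[root] = 0, sum[L[root]] = 4
print(push_down(0, 4))  # (0, 4)
```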


Example of second phase

pre[L[v]] := pre[v]

pre[R[v]] := sum[L[v]] + pre[v]


for d = (log n – 1) downto 0 do

for i = 0 to n – 1 by 2^(d+1) do in parallel

temp := a[i + 2^d - 1]

a[i + 2^d - 1] := a[i + 2^(d+1) - 1] (left child)

a[i + 2^(d+1) - 1] := temp + a[i + 2^(d+1) - 1] (right child)

• Before the loop starts, a[n - 1] (a[7] in our example) is set to 0.

Parallel prefix computation
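A serial simulation of this second pass, assuming the array already holds the first-pass subtree sums (the name down_sweep is our own):

```python
import math

def down_sweep(a):
    """Second pass over an up-swept array: produces the exclusive
    prefix sums (each leaf gets the sum of the leaves strictly to
    its left)."""
    n = len(a)    # assumed to be a power of two
    a[n - 1] = 0  # identity element for +
    for d in range(int(math.log2(n)) - 1, -1, -1):
        # independent iterations -> one parallel step on a PRAM
        for i in range(0, n, 2 ** (d + 1)):
            temp = a[i + 2 ** d - 1]
            a[i + 2 ** d - 1] = a[i + 2 ** (d + 1) - 1]               # left child
            a[i + 2 ** (d + 1) - 1] = temp + a[i + 2 ** (d + 1) - 1]  # right child
    return a

# Array after the first pass on [5, 3, -6, 2, 7, 10, -2, 8]:
print(down_sweep([5, 8, -6, 4, 7, 17, -2, 27]))
# [0, 5, 8, 2, 4, 11, 21, 19]
```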


• We consider the case d = 2 and i = 0

temp := a[0 + 2^2 - 1], i.e., temp := a[3]

a[0 + 2^2 - 1] := a[0 + 2^(2+1) - 1] or, a[3] := a[7]

a[0 + 2^(2+1) - 1] := temp + a[0 + 2^(2+1) - 1] or,

a[7] := a[3] + a[7]

Parallel prefix computation


• blue: no change from last iteration.

• magenta: left child.

• brown: right child.

Parallel prefix computation


• All the prefix sums except the last one are now

in the leaves of the tree from left to right.

• The prefix sums have to be shifted one position

to the left. Also, the last prefix sum (the sum of

all the elements) should be inserted at the last

leaf.

• The complexity is O (log n) time and O (n)

processors.

Exercise: Reduce the processor complexity to

O (n / log n).

Parallel prefix computation
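The final shift described above is straightforward (the name exclusive_to_inclusive is our own):

```python
def exclusive_to_inclusive(pre, total):
    """Shift the exclusive prefix sums one position to the left and
    insert the total sum at the last leaf."""
    return pre[1:] + [total]

print(exclusive_to_inclusive([0, 5, 8, 2, 4, 11, 21, 19], 27))
# [5, 8, 2, 4, 11, 21, 19, 27]
```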


Parallel merging through

partitioning

The partitioning strategy consists of:

• Breaking up the given problem into many

independent subproblems of equal size

• Solving the subproblems in parallel

This is similar to the divide-and-conquer

strategy in sequential computing.


Partitioning and Merging

Given a set S with a relation ≤, S is linearly

ordered, if for every pair a, b ∈ S,

• either a ≤ b or b ≤ a.

The merging problem is the following:


Partitioning and Merging

Input: Two sorted arrays A = (a1, a2,..., am) and

B = (b1, b2,..., bn) whose elements are drawn

from a linearly ordered set.

Output: A merged sorted sequence

C = (c1, c2,..., cm+n).


Merging

For example, if A = (2,8,11,13,17,20) and B =

(3,6,10,15,16,73), the merged sequence

C = (2,3,6,8,10,11,13,15,16,17,20,73).


Merging

A sequential algorithm

• Simultaneously move two pointers along the

two arrays

• Write the items in sorted order in another

array
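The two-pointer procedure can be sketched as follows (the name merge is our own):

```python
def merge(A, B):
    """Sequential merge of two sorted arrays in O(m + n) time."""
    i = j = 0
    C = []
    while i < len(A) and j < len(B):
        if A[i] <= B[j]:            # advance whichever pointer
            C.append(A[i]); i += 1  # holds the smaller item
        else:
            C.append(B[j]); j += 1
    C.extend(A[i:])  # at most one of these
    C.extend(B[j:])  # two tails is non-empty
    return C

print(merge([2, 8, 11, 13, 17, 20], [3, 6, 10, 15, 16, 73]))
# [2, 3, 6, 8, 10, 11, 13, 15, 16, 17, 20, 73]
```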


Partitioning and Merging

• The complexity of the sequential algorithm is O(m + n).

• We will use the partitioning strategy for

solving this problem in parallel.


Partitioning and Merging

Definitions:

rank(ai : A) is the number of elements in A less

than or equal to ai ∈ A.

rank(bi : A) is the number of elements in A less

than or equal to bi ∈ B.


Merging

For example, consider the arrays:

A = (2,8,11,13,17,20)

B = (3,6,10,15,16,73)

rank(11 : A) = 3 and rank(11 : B) = 3.


Merging

• The position of an element ai ∈ A in the sorted

array C is:

rank(ai : A) + rank(ai : B).

For example, the position of 11 in the sorted array C is:

rank(11 : A) + rank(11 : B) = 3 + 3 = 6.
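For a sorted array, rank can be computed with a binary search; in Python the standard bisect module does exactly this (a sketch, with the helper name rank our own):

```python
import bisect

def rank(x, arr):
    """Number of elements of the sorted array arr that are <= x."""
    return bisect.bisect_right(arr, x)

A = [2, 8, 11, 13, 17, 20]
B = [3, 6, 10, 15, 16, 73]
# 1-based position of 11 in the merged array C:
print(rank(11, A) + rank(11, B))  # 3 + 3 = 6
```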


Parallel Merging

• The idea is to decompose the overall merging

problem into many smaller merging

problems.

• When the problem size is sufficiently small,

we will use the sequential algorithm.


Merging

• The main task is to generate smaller merging

problems such that:

• Each sequence in such a smaller problem has O(log m) or O(log n) elements.

• Then we can use the sequential algorithm since the time complexity will be O(log m + log n).


Parallel Merging

Step 1. Divide the array B into blocks such that each block has log m elements. Hence there are m/log m blocks.

For each block, the last elements are the elements

at positions i log m, 1 ≤ i ≤ m/log m.


Parallel Merging

Step 2. We allocate one processor for each last element in B.

• For a last element b at position i log m, this processor does

a binary search in the array A to determine two

consecutive elements ak, ak+1 such that ak ≤ b ≤ ak+1.

•All the m/log m binary searches are done in

parallel and take O(log m) time each.
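Steps 1 and 2 together can be sketched as follows (a serial stand-in for the parallel binary searches; the name block_ranks is our own):

```python
import bisect
import math

def block_ranks(A, B):
    """Rank the last element of each log(m)-sized block of B in A.
    These ranks split A into the matching blocks."""
    m = len(A)
    blk = max(1, int(math.log2(m)))  # block size ~ log m
    boundaries = B[blk - 1::blk]     # last element of each block of B
    # each binary search is independent -> done in parallel, O(log m) each
    return [bisect.bisect_right(A, b) for b in boundaries]

A = [2, 8, 11, 13, 17, 20]
B = [3, 6, 10, 15, 16, 73]
print(block_ranks(A, B))  # [1, 4, 6]
```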


Parallel Merging

• After the binary searches are over, the array A is divided into m/log m blocks.

• There is a one-to-one correspondence between the blocks in A and B. We call a pair

of such blocks as matching blocks.


Parallel Merging

• Each block in A is determined in the following

way.

• Consider the two elements at positions i log m and

(i + 1) log m. These are the boundary elements of

the (i + 1)-th block of B.

• The two ranks rank(i log m : A) and rank((i + 1) log m : A) define the

matching block in A.


Parallel Merging

• These two matching blocks determine a smaller merging problem.

• Every element inside a matching block has to be ranked inside the other matching block.

• Hence, the problem of merging a pair of matching blocks is an independent subproblem which does not affect any other block.


Parallel Merging

• If the size of each block in A is O(log m), we can directly run the sequential algorithm on every pair of matching blocks from A and B.

• Some blocks in A may be larger than O(log m) and hence we have to do some more work to break them into smaller blocks.


Parallel Merging

If a block Ai in A is larger than O(log m) and the

matching block of Ai is Bj, we do the following:

•We divide Ai into blocks of size O(log m).

•Then we apply the same algorithm to rank the boundary elements of each block in Ai in Bj.

•Now each block in A is of size O(log m)

•This takes O(log log m) time.


Parallel Merging

Step 3.

• We now take every pair of matching blocks from A and B and run the sequential merging algorithm.

• One processor is allocated for every matching pair and this processor merges the pair in O(log m) time.

We have to analyse the time and processor complexities of each of the steps to get the overall complexities.


Parallel Merging

Complexity of Step 1

• The task in Step 1 is to partition B into blocks of size log m.

• We allocate m/log m processors.

• Since B is an array, processor Pi, 1 ≤ i ≤ m/log m,

can find the element at position i log m in O(1) time.


Parallel Merging

Complexity of Step 2

• In Step 2, m/log m processors do binary

search in array A in O(log n) time each.

• Hence the time complexity is O(log n) and

the work done is

(m log n) / log m ≤ (m log(m + n)) / log m ≤ (m + n)

for n, m ≥ 4. Hence the total work is O(m + n).


Parallel Merging

Complexity of Step 3

• In Step 3, we use m/log m processors

• Each processor merges a pair Ai, Bi in O(log m)

time. Hence the total work done is O(m).

Theorem

Let A and B be two sorted sequences each of

length n. A and B can be merged in O(log n) time

using O(n) operations in the CREW PRAM.
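The whole scheme of the theorem can be sketched end-to-end (run serially here; the name parallel_merge is our own, and Python's sorted stands in for the sequential block merges that would run in parallel):

```python
import bisect
import math

def parallel_merge(A, B):
    """Partition-based merge: rank the block boundaries of B in A by
    binary search, then merge each pair of matching blocks; each pair
    is an independent subproblem."""
    if not A:
        return list(B)
    if not B:
        return list(A)
    m = len(A)
    blk = max(1, int(math.log2(m)))                # block size ~ log m
    cuts_B = list(range(blk, len(B), blk)) + [len(B)]
    # Step 2: rank each block's last element in A (independent searches)
    cuts_A = [bisect.bisect_right(A, B[c - 1]) for c in cuts_B]
    # Step 3: merge each pair of matching blocks independently
    C, a0, b0 = [], 0, 0
    for a1, b1 in zip(cuts_A, cuts_B):
        C.extend(sorted(A[a0:a1] + B[b0:b1]))
        a0, b0 = a1, b1
    C.extend(A[a0:])                               # leftover tail of A
    return C

print(parallel_merge([2, 8, 11, 13, 17, 20], [3, 6, 10, 15, 16, 73]))
# [2, 3, 6, 8, 10, 11, 13, 15, 16, 17, 20, 73]
```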