dr-sandeep-kumar-poonia
Algorithms
Parallel Algorithms
An overview
• A simple parallel algorithm for computing
parallel prefix.
• A parallel merging algorithm
Definition of prefix computation
• We are given an ordered set A of n elements,
$A = \{a_0, a_1, a_2, \ldots, a_{n-1}\}$,
and a binary associative operator $\oplus$.
• We have to compute the ordered set
$\{a_0,\; a_0 \oplus a_1,\; \ldots,\; a_0 \oplus a_1 \oplus \cdots \oplus a_{n-1}\}$
An example of prefix computation
• For example, if $\oplus$ is + and the input is the ordered set
{5, 3, -6, 2, 7, 10, -2, 8}
then the output is
{5, 8, 2, 4, 11, 21, 19, 27}
• Prefix sums can be computed sequentially in O(n) time.
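As a sequential baseline, here is a minimal Python sketch (the name `prefix_scan` is ours, not from the slides). It applies the operator n - 1 times, which is where the O(n) bound comes from. For + the standard library's `itertools.accumulate` gives the same result.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")

def prefix_scan(xs: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Sequential inclusive prefix computation with an associative op."""
    out: List[T] = []
    for x in xs:
        # Each output combines the previous prefix with the next input element.
        out.append(x if not out else op(out[-1], x))
    return out

print(prefix_scan([5, 3, -6, 2, 7, 10, -2, 8], lambda a, b: a + b))
# [5, 8, 2, 4, 11, 21, 19, 27]
```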
Using a binary tree
First Pass
• For every internal node v of the tree, compute the sum of all the leaves in its subtree in a bottom-up fashion:
sum[v] := sum[L[v]] + sum[R[v]]
where L[v] and R[v] are the left and right children of v.
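A sketch of the first pass on an explicit tree, assuming n is a power of two so the tree is complete. Each tree level is stored as a Python list (our representation, not the slides'): node j on one level has children 2j and 2j + 1 on the level below. All nodes on a level are independent, so a PRAM could compute a whole level in one parallel step.

```python
from typing import List

def tree_sums(leaves: List[int]) -> List[List[int]]:
    """sum[v] for every node, level by level, leaves first (len(leaves) a power of two)."""
    levels = [list(leaves)]
    while len(levels[-1]) > 1:
        below = levels[-1]
        # sum[v] := sum[L[v]] + sum[R[v]] for every node v on the next level up
        levels.append([below[2 * j] + below[2 * j + 1] for j in range(len(below) // 2)])
    return levels

for level in tree_sums([5, 3, -6, 2, 7, 10, -2, 8]):
    print(level)
# [5, 3, -6, 2, 7, 10, -2, 8]
# [8, -4, 17, 6]
# [4, 23]
# [27]
```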
Parallel prefix computation
for d = 0 to log n - 1 do
    for i = 0 to n - 1 by 2^(d+1) do in parallel
        a[i + 2^(d+1) - 1] := a[i + 2^d - 1] + a[i + 2^(d+1) - 1]
• In our example, n = 8, hence the outer loop iterates 3 times, d = 0, 1, 2.
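The loop above can be simulated sequentially as follows. This is a sketch assuming n is a power of two; the name `up_sweep` is ours. The inner loop stands in for the parallel step: different values of i read and write disjoint index ranges, so on a PRAM they could all run at once.

```python
from typing import List

def up_sweep(a: List[int]) -> List[int]:
    """First pass, in place; len(a) must be a power of two."""
    n = len(a)
    d = 0
    while (1 << d) < n:                  # d = 0 .. log n - 1
        step, half = 1 << (d + 1), 1 << d   # 2^(d+1) and 2^d
        for i in range(0, n, step):      # "do in parallel"
            a[i + step - 1] = a[i + half - 1] + a[i + step - 1]
        d += 1
    return a

print(up_sweep([5, 3, -6, 2, 7, 10, -2, 8]))  # [5, 8, -6, 4, 7, 17, -2, 27]
```

After the pass, a[n - 1] holds the sum of all the elements (27 here), i.e. sum[root].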
When d = 0
• d = 0: the stride 2^(d+1) is 2, so i takes the values 0, 2, 4, 6.
• For i = 0,
a[0 + 2^(0+1) - 1] := a[0 + 2^0 - 1] + a[0 + 2^(0+1) - 1]
or, a[1] := a[0] + a[1]
• The other values of i update a[3], a[5] and a[7] in the same way, in parallel.
When d = 1
• d = 1: the stride 2^(d+1) is 4, so i takes the values 0 and 4.
• For i = 0,
a[0 + 2^(1+1) - 1] := a[0 + 2^1 - 1] + a[0 + 2^(1+1) - 1]
or, a[3] := a[1] + a[3]
• For i = 4,
a[4 + 2^(1+1) - 1] := a[4 + 2^1 - 1] + a[4 + 2^(1+1) - 1]
or, a[7] := a[5] + a[7]
The First Pass
Figure legend:
• blue: no change from the last iteration.
• magenta: changed in the current iteration.
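The figure itself is not reproduced here. As a stand-in, this hypothetical helper `up_sweep_trace` records, for each d of the first pass on the running example, which indices change (the magenta cells) and the array afterwards. It assumes n is a power of two.

```python
from typing import List, Tuple

def up_sweep_trace(a: List[int]) -> List[Tuple[int, List[int], List[int]]]:
    """(d, indices changed at d, array after d) for each iteration of the first pass."""
    a = list(a)
    n, d, trace = len(a), 0, []
    while (1 << d) < n:
        step, half = 1 << (d + 1), 1 << d
        changed = []
        for i in range(0, n, step):
            a[i + step - 1] += a[i + half - 1]
            changed.append(i + step - 1)
        trace.append((d, changed, list(a)))
        d += 1
    return trace

for d, changed, state in up_sweep_trace([5, 3, -6, 2, 7, 10, -2, 8]):
    print(d, changed, state)
# 0 [1, 3, 5, 7] [5, 8, -6, -4, 7, 17, -2, 6]
# 1 [3, 7] [5, 8, -6, 4, 7, 17, -2, 23]
# 2 [7] [5, 8, -6, 4, 7, 17, -2, 27]
```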
The Second Pass
• The idea in the second pass is to do a top-down computation to generate all the prefix sums.
• We use the notation pre[v] to denote the prefix sum at node v, i.e. the sum of all the leaves that lie to the left of the subtree rooted at v.
Computation in the second phase
• pre[root] := 0, the identity element for the + operation, since we are considering the + operation.
• If the operation is max, the identity element will be $-\infty$.
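A sequential reference for what the second pass should leave at the leaves, sketched in Python (`exclusive_scan` is a name we introduce). Seeding the accumulator with the identity mirrors pre[root] := 0; passing `max` with `-math.inf` shows why max needs $-\infty$ as its identity.

```python
import math
from typing import Callable, List, TypeVar

T = TypeVar("T")

def exclusive_scan(xs: List[T], op: Callable[[T, T], T], identity: T) -> List[T]:
    """Each output is op applied to everything strictly to the left (identity if nothing)."""
    out: List[T] = []
    acc = identity                # the leftmost leaf receives the identity, like pre[root]
    for x in xs:
        out.append(acc)
        acc = op(acc, x)
    return out

xs = [5, 3, -6, 2, 7, 10, -2, 8]
print(exclusive_scan(xs, lambda a, b: a + b, 0))   # [0, 5, 8, 2, 4, 11, 21, 19]
print(exclusive_scan(xs, max, -math.inf))          # [-inf, 5, 5, 5, 5, 7, 10, 10]
```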
Second phase (continued)
pre[L[v]] := pre[v]
pre[R[v]] := sum[L[v]] + pre[v]
Example of second phase
pre[L[v]] := pre[v]
pre[R[v]] := sum[L[v]] + pre[v]
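The two rules can be run on the example with an explicit tree. This sketch assumes n is a power of two and stores each tree level as a list (our choice, as is the name `tree_prefix`); it performs the bottom-up pass for sum[] and then the top-down pass for pre[], returning pre[] at the leaves.

```python
from typing import List

def tree_prefix(leaves: List[int]) -> List[int]:
    """pre[] at the leaves: the sum of all leaves to the left of each leaf."""
    # First pass: sums[k][j] is sum[v] for node j on level k (level 0 = leaves).
    sums = [list(leaves)]
    while len(sums[-1]) > 1:
        s = sums[-1]
        sums.append([s[2 * j] + s[2 * j + 1] for j in range(len(s) // 2)])

    # Second pass, top-down, starting from pre[root] := 0 (the identity for +).
    pre = [0]
    for k in range(len(sums) - 2, -1, -1):   # level just below the root, down to the leaves
        nxt = []
        for j, p in enumerate(pre):          # node j on level k+1 has children 2j, 2j+1
            nxt.append(p)                    # pre[L[v]] := pre[v]
            nxt.append(sums[k][2 * j] + p)   # pre[R[v]] := sum[L[v]] + pre[v]
        pre = nxt
    return pre

print(tree_prefix([5, 3, -6, 2, 7, 10, -2, 8]))  # [0, 5, 8, 2, 4, 11, 21, 19]
```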
Parallel prefix computation
a[n - 1] := 0   (in our example, a[7] is set to 0)
for d = (log n - 1) downto 0 do
    for i = 0 to n - 1 by 2^(d+1) do in parallel
        temp := a[i + 2^d - 1]
        a[i + 2^d - 1] := a[i + 2^(d+1) - 1]   (left child)
        a[i + 2^(d+1) - 1] := temp + a[i + 2^(d+1) - 1]   (right child)
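The in-place second pass can be simulated as below. The input is the array left by the first pass on the running example; n is assumed to be a power of two, and `down_sweep` is our name. As in the first pass, the iterations over i touch disjoint index ranges, so they are independent.

```python
from typing import List

def down_sweep(a: List[int]) -> List[int]:
    """Second pass, in place, on the output of the first pass; len(a) a power of two."""
    n = len(a)
    a[n - 1] = 0                          # the root receives the identity for +
    d = n.bit_length() - 2                # log n - 1 when n is a power of two
    while d >= 0:
        step, half = 1 << (d + 1), 1 << d
        for i in range(0, n, step):       # "do in parallel"
            temp = a[i + half - 1]
            a[i + half - 1] = a[i + step - 1]          # left child
            a[i + step - 1] = temp + a[i + step - 1]   # right child
        d -= 1
    return a

print(down_sweep([5, 8, -6, 4, 7, 17, -2, 27]))  # [0, 5, 8, 2, 4, 11, 21, 19]
```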
Parallel prefix computation
• We consider the case d = 2 and i = 0:
temp := a[0 + 2^2 - 1], i.e. temp := a[3]
a[0 + 2^2 - 1] := a[0 + 2^(2+1) - 1], or a[3] := a[7]
a[0 + 2^(2+1) - 1] := temp + a[0 + 2^(2+1) - 1], or a[7] := temp + a[7] (the old value of a[3] plus a[7])
Parallel prefix computation
Figure legend:
• blue: no change from the last iteration.
• magenta: left child.
• brown: right child.
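Again standing in for the missing figure: this hypothetical `down_sweep_trace` helper lists, for each d of the second pass, the (left child, right child) index pairs that are updated and the array afterwards. The tuple assignment reads both old values before writing, which is what `temp` achieves in the pseudocode.

```python
from typing import List, Tuple

def down_sweep_trace(a: List[int]) -> List[Tuple[int, List[Tuple[int, int]], List[int]]]:
    """For each d of the second pass: (d, [(left, right) index pairs], array after d)."""
    a = list(a)
    n = len(a)
    a[n - 1] = 0
    trace = []
    for d in range(n.bit_length() - 2, -1, -1):
        step, half = 1 << (d + 1), 1 << d
        pairs = []
        for i in range(0, n, step):
            left, right = i + half - 1, i + step - 1
            a[left], a[right] = a[right], a[left] + a[right]
            pairs.append((left, right))
        trace.append((d, pairs, list(a)))
    return trace

for row in down_sweep_trace([5, 8, -6, 4, 7, 17, -2, 27]):
    print(row)
# (2, [(3, 7)], [5, 8, -6, 0, 7, 17, -2, 4])
# (1, [(1, 3), (5, 7)], [5, 0, -6, 8, 7, 4, -2, 21])
# (0, [(0, 1), (2, 3), (4, 5), (6, 7)], [0, 5, 8, 2, 4, 11, 21, 19])
```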
Parallel prefix computation
• The leaves of the tree now hold, from left to right, 0 followed by all the prefix sums except the last one.
• The prefix sums have to be shifted one position to the left. Also, the last prefix sum (the sum of all the elements) should be inserted at the last leaf.
• The complexity is O(log n) time using O(n) processors.
Exercise: Reduce the processor complexity to O(n / log n).
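Putting both passes and the final shift together gives the inclusive prefix sums. This is a sequential simulation under the same power-of-two assumption (`parallel_prefix` is our name). The total is saved before the root is overwritten with 0, so that it can be placed in the last leaf after the shift.

```python
from typing import List

def parallel_prefix(xs: List[int]) -> List[int]:
    """Inclusive prefix sums via the two passes plus the final shift (len(xs) a power of two)."""
    a, n = list(xs), len(xs)
    # First pass (bottom-up).
    half = 1
    while half < n:
        for i in range(0, n, 2 * half):
            a[i + 2 * half - 1] += a[i + half - 1]
        half *= 2
    total = a[n - 1]                  # sum of all the elements
    # Second pass (top-down).
    a[n - 1] = 0
    half = n // 2
    while half >= 1:
        for i in range(0, n, 2 * half):
            left, right = i + half - 1, i + 2 * half - 1
            a[left], a[right] = a[right], a[left] + a[right]
        half //= 2
    # Shift left by one position and put the total in the last leaf.
    return a[1:] + [total]

print(parallel_prefix([5, 3, -6, 2, 7, 10, -2, 8]))  # [5, 8, 2, 4, 11, 21, 19, 27]
```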
Parallel merging through partitioning
The partitioning strategy consists of:
• Breaking up the given problem into many
independent subproblems of equal size
• Solving the subproblems in parallel
This is similar to the divide-and-conquer
strategy in sequential computing.
Partitioning and Merging
Given a set S with a relation $\le$, S is linearly ordered if, for every pair $a, b \in S$,
• either $a \le b$ or $b \le a$.
The merging problem is the following:
Partitioning and Merging
Input: Two sorted arrays A = (a1, a2,..., am) and
B = (b1, b2,..., bn) whose elements are drawn
from a linearly ordered set.
Output: A merged sorted sequence
C = (c1, c2,..., cm+n).
Merging
For example, if A = (2,8,11,13,17,20) and B =
(3,6,10,15,16,73), the merged sequence
C = (2,3,6,8,10,11,13,15,16,17,20,73).
Merging
A sequential algorithm
• Simultaneously move two pointers along the
two arrays
• Write the items in sorted order in another
array
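A minimal Python sketch of the two-pointer merge (`merge` is our name). Each loop step writes one element of C, so it runs in O(m + n) time.

```python
from typing import List

def merge(A: List[int], B: List[int]) -> List[int]:
    """Two-pointer merge of sorted A and B."""
    C: List[int] = []
    i = j = 0
    while i < len(A) and j < len(B):
        if A[i] <= B[j]:
            C.append(A[i])
            i += 1
        else:
            C.append(B[j])
            j += 1
    # At most one of the two tails is non-empty; copy it over.
    C.extend(A[i:])
    C.extend(B[j:])
    return C

print(merge([2, 8, 11, 13, 17, 20], [3, 6, 10, 15, 16, 73]))
# [2, 3, 6, 8, 10, 11, 13, 15, 16, 17, 20, 73]
```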
Partitioning and Merging
• The complexity of the sequential algorithm is O(m + n).
• We will use the partitioning strategy for
solving this problem in parallel.
Partitioning and Merging
Definitions:
rank(ai : A) is the number of elements in A less than or equal to $a_i \in A$.
rank(bi : A) is the number of elements in A less than or equal to $b_i \in B$.
Merging
For example, consider the arrays:
A = (2,8,11,13,17,20)
B = (3,6,10,15,16,73)
rank(11 : A) = 3 and rank(11 : B) = 3.
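Since both arrays are sorted, rank(x : X) is a single binary search: Python's `bisect.bisect_right` returns the number of elements less than or equal to x. A small sketch (the wrapper name `rank` is ours) checking the example:

```python
from bisect import bisect_right
from typing import List

def rank(x: int, X: List[int]) -> int:
    """rank(x : X): number of elements of sorted X that are <= x, in O(log |X|) time."""
    return bisect_right(X, x)

A = [2, 8, 11, 13, 17, 20]
B = [3, 6, 10, 15, 16, 73]
print(rank(11, A), rank(11, B))  # 3 3
```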
Merging
• The position of an element $a_i \in A$ in the sorted array C (counting from 1) is
rank(ai : A) + rank(ai : B),
assuming all the elements of A and B are distinct.
For example, the position of 11 in the sorted array C is:
rank(11 : A) + rank(11 : B) = 3 + 3 = 6.
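Because each position depends only on two ranks, every element can be placed independently of the others. This sketch (`merge_by_rank` is our name) assumes all elements are distinct, as the position formula requires; the loop stands in for one processor per element.

```python
from bisect import bisect_right
from typing import List

def merge_by_rank(A: List[int], B: List[int]) -> List[int]:
    """Write each element straight to position rank(x : A) + rank(x : B) of C.

    Assumes all elements of A and B are distinct; otherwise two elements could
    compute the same position.
    """
    C: List[int] = [0] * (len(A) + len(B))
    for x in A + B:                      # each iteration is independent of the others
        pos = bisect_right(A, x) + bisect_right(B, x)   # 1-based position in C
        C[pos - 1] = x
    return C

A = [2, 8, 11, 13, 17, 20]
B = [3, 6, 10, 15, 16, 73]
print(merge_by_rank(A, B))  # [2, 3, 6, 8, 10, 11, 13, 15, 16, 17, 20, 73]
```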
Parallel Merging
• The idea is to decompose the overall merging
problem into many smaller merging
problems.
• When the problem size is sufficiently small,
we will use the sequential algorithm.
Merging
• The main task is to generate smaller merging
problems such that:
• Each sequence in such a smaller problem has O(log m) or O(log n) elements.
• Then we can use the sequential algorithm since the time complexity will be O(log m + log n).
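Parallel Merging
Step 1. Divide the array B into blocks such that each block has log m elements. Hence there are m/log m blocks. (Here, as in the complexity analysis below, B is taken to have m elements and A to have n elements, the reverse of the input convention above; the two agree when m = n, as in the theorem at the end.)
For each block, the last element is $b_{i \log m}$, $1 \le i \le m/\log m$.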
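A sketch of Step 1's indexing for |B| = m. The slides treat log m as if it were an integer; here we assume it is rounded down to an integer block size (at least 1) and that a shorter final block is kept when m is not a multiple of it. `block_last_indices` is a hypothetical helper.

```python
import math
from typing import List

def block_last_indices(m: int) -> List[int]:
    """1-based positions i*log m of the last elements of B's blocks, |B| = m."""
    k = max(1, int(math.log2(m))) if m > 0 else 1   # block size: log m rounded down (assumption)
    ends = list(range(k, m + 1, k))
    if m % k:
        ends.append(m)                                # a shorter final block, if any
    return ends

print(block_last_indices(16))  # [4, 8, 12, 16]
print(block_last_indices(10))  # [3, 6, 9, 10]
```

Each entry can be computed independently from i alone, which is why Step 1 takes O(1) time per processor.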
Parallel Merging
Step 2. We allocate one processor for each last element in B.
• For a last element $b_{i \log m}$, this processor does a binary search in the array A to determine two elements $a_k, a_{k+1}$ such that $a_k \le b_{i \log m} < a_{k+1}$.
• All the m/log m binary searches are done in parallel and take O(log n) time each.
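Step 2 in sketch form: one `bisect_right` call per block end plays the role of one processor's binary search, and the returned value is k = rank($b_{i \log m}$ : A). The block size (log2 |B| rounded down) and the helper name `rank_block_ends` are our assumptions.

```python
import math
from bisect import bisect_right
from typing import List

def rank_block_ends(A: List[int], B: List[int]) -> List[int]:
    """rank(b_{i log m} : A) for the last element of every block of B (|B| = m >= 1)."""
    m = len(B)
    k = max(1, int(math.log2(m)))                  # block size: log m rounded down (assumption)
    ends = list(range(k - 1, m, k))                # 0-based indices of the block ends
    if m % k:
        ends.append(m - 1)                         # shorter final block
    # One binary search per block end; on a CREW PRAM each gets its own processor.
    return [bisect_right(A, B[e]) for e in ends]

print(rank_block_ends([2, 8, 11, 13, 17, 20], [3, 6, 10, 15, 16, 73]))  # [1, 4, 6]
```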
Parallel Merging
• After the binary searches are over, the array A is divided into m/log m blocks.
• There is a one-to-one correspondence between the blocks in A and B. We call such a pair of blocks matching blocks.
Parallel Merging
• Each block in A is determined in the following way.
• Consider the two elements $b_{i \log m}$ and $b_{(i+1) \log m}$. The (i + 1)-th block of B consists of the elements after $b_{i \log m}$ up to and including $b_{(i+1) \log m}$.
• The ranks of these two elements in A define the matching block in A: it consists of the elements $a_k$ with $\mathrm{rank}(b_{i \log m} : A) < k \le \mathrm{rank}(b_{(i+1) \log m} : A)$.
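A sketch of how the ranks cut A into matching blocks, with an explicit block size k (an assumed parameter; the helper name `matching_blocks` is ours). Elements of A larger than every element of B end up in a final pair with an empty block of B.

```python
from bisect import bisect_right
from typing import List, Tuple

def matching_blocks(A: List[int], B: List[int], k: int) -> List[Tuple[List[int], List[int]]]:
    """Pair each block of B (k elements) with the slice of A between consecutive ranks."""
    if not B:
        return [(list(A), [])]
    pairs, lo_a, lo_b = [], 0, 0
    for hi_b in list(range(k, len(B), k)) + [len(B)]:
        hi_a = bisect_right(A, B[hi_b - 1])       # rank of this block's last element in A
        pairs.append((A[lo_a:hi_a], B[lo_b:hi_b]))
        lo_a, lo_b = hi_a, hi_b
    if lo_a < len(A):
        pairs.append((A[lo_a:], []))              # a's above the last element of B
    return pairs

for pair in matching_blocks([2, 8, 11, 13, 17, 20], [3, 6, 10, 15, 16, 73], 2):
    print(pair)
# ([2], [3, 6])
# ([8, 11, 13], [10, 15])
# ([17, 20], [16, 73])
```

Merging each pair separately and concatenating the results gives the full merged sequence, which is why the subproblems are independent.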
Parallel Merging
• These two matching blocks determine a smaller merging problem.
• Every element inside a matching block has to be ranked inside the other matching block.
• Hence, the problem of merging a pair of matching blocks is an independent subproblem which does not affect any other block.
Parallel Merging
• If the size of each block in A is O(log m), we can directly run the sequential algorithm on every pair of matching blocks from A and B.
• Some blocks in A may be larger than O(log m) and hence we have to do some more work to break them into smaller blocks.
Parallel Merging
If a block Ai has more than O(log m) elements and the matching block of Ai is Bj, we do the following:
• We divide Ai into blocks of size O(log m).
• Then we apply the same algorithm to rank the boundary elements of each block of Ai in Bj.
• Now each block in A is of size O(log m).
• Since Bj has O(log m) elements, each of these binary searches takes O(log log m) time.
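A sketch of the refinement for one oversized pair; the helper name `split_large_pair` and the piece size k are assumptions. Each piece of Ai is ranked in Bj with one binary search over at most about log m elements, hence the O(log log m) time; the b's above every element of Ai form a last pair with an empty A-piece.

```python
from bisect import bisect_right
from typing import List, Tuple

def split_large_pair(Ai: List[int], Bj: List[int], k: int) -> List[Tuple[List[int], List[int]]]:
    """Cut Ai into pieces of at most k elements, each paired with its slice of Bj."""
    pieces, lo_b = [], 0
    for lo_a in range(0, len(Ai), k):
        piece = Ai[lo_a:lo_a + k]
        hi_b = bisect_right(Bj, piece[-1])     # rank of the piece's last element in Bj
        pieces.append((piece, Bj[lo_b:hi_b]))
        lo_b = hi_b
    if lo_b < len(Bj):
        pieces.append(([], Bj[lo_b:]))         # b's larger than every element of Ai
    return pieces

print(split_large_pair([1, 3, 5, 7, 9, 11], [4, 10], 2))
# [([1, 3], []), ([5, 7], [4]), ([9, 11], [10])]
```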
Parallel Merging
Step 3.
• We now take every pair of matching blocks from A and B and run the sequential merging algorithm.
• One processor is allocated for every matching pair and this processor merges the pair in O(log m) time.
We have to analyse the time and processor complexities of each of the steps to get the overall complexities.
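Steps 1 to 3 can be sketched end to end as a sequential simulation. Assumptions (ours): the block size is log2 |B| rounded down, loops stand in for processors, and, following the analysis, B plays the role of the array of length m. `seq_merge` and `partition_merge` are hypothetical names.

```python
import math
from bisect import bisect_right
from typing import List, Tuple

Pair = Tuple[List[int], List[int]]

def seq_merge(X: List[int], Y: List[int]) -> List[int]:
    """Sequential two-pointer merge, used on each small subproblem."""
    out: List[int] = []
    i = j = 0
    while i < len(X) and j < len(Y):
        if X[i] <= Y[j]:
            out.append(X[i])
            i += 1
        else:
            out.append(Y[j])
            j += 1
    return out + X[i:] + Y[j:]

def partition_merge(A: List[int], B: List[int]) -> List[int]:
    if not B:
        return list(A)
    k = max(1, int(math.log2(len(B))))          # block size ~ log m (assumption)
    # Steps 1 and 2: cut B into blocks of k and rank each block's last element in A.
    pairs: List[Pair] = []
    lo_a = lo_b = 0
    for hi_b in list(range(k, len(B), k)) + [len(B)]:
        hi_a = bisect_right(A, B[hi_b - 1])
        pairs.append((A[lo_a:hi_a], B[lo_b:hi_b]))
        lo_a, lo_b = hi_a, hi_b
    pairs.append((A[lo_a:], []))                 # a's above the last b (may be empty)
    # Refinement: split A-blocks into pieces of k, ranking piece ends in the B-block.
    small: List[Pair] = []
    for Ai, Bj in pairs:
        lo_b = 0
        for s in range(0, len(Ai), k):
            piece = Ai[s:s + k]
            hi_b = bisect_right(Bj, piece[-1])
            small.append((piece, Bj[lo_b:hi_b]))
            lo_b = hi_b
        small.append(([], Bj[lo_b:]))            # b's above every a in this block
    # Step 3: every pair is independent; a PRAM would give each pair its own processor.
    return [x for Ai, Bj in small for x in seq_merge(Ai, Bj)]

print(partition_merge([2, 8, 11, 13, 17, 20], [3, 6, 10, 15, 16, 73]))
# [2, 3, 6, 8, 10, 11, 13, 15, 16, 17, 20, 73]
```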
Parallel Merging
Complexity of Step 1
• The task in Step 1 is to partition B into blocks of size log m.
• We allocate m/log m processors.
• Since B is an array, processor $P_i$, $1 \le i \le m/\log m$, can find the element $b_{i \log m}$ in O(1) time.
Parallel Merging
Complexity of Step 2
• In Step 2, m/log m processors each do a binary search in array A in O(log n) time.
• Hence the time complexity is O(log n) and the work done is
$\frac{m \log n}{\log m} \le \frac{m \log(m + n)}{\log m} \le m + n$
for $n, m \ge 4$. Hence the total work is O(m + n).
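A quick numeric spot check of this chain of inequalities over a grid of m, n ≥ 4 (a sanity check, not a proof; `work_bound_holds` is our name). The base of the logarithm cancels in each ratio, so natural logarithms are used.

```python
import math

def work_bound_holds(m: int, n: int) -> bool:
    """Check (m log n)/log m <= (m log(m+n))/log m <= m + n for one pair (m, n)."""
    lhs = m * math.log(n) / math.log(m)
    mid = m * math.log(m + n) / math.log(m)
    return lhs <= mid <= m + n

print(all(work_bound_holds(m, n) for m in range(4, 200) for n in range(4, 200)))  # True
```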
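Parallel Merging
Complexity of Step 3
• In Step 3, we use m/log m processors.
• Each processor merges a pair Ai, Bi in O(log m) time. Hence the total work done is O(m). Splitting the large blocks of A adds O(n/log m) further pairs of O(log m) work each, so the total over all pairs is O(m + n).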
Theorem
Let A and B be two sorted sequences each of
length n. A and B can be merged in O(log n) time
using O(n) operations in the CREW PRAM.