Chapter 11: Broadcasting with Selective Reduction (BSR)
Serpil Tokdemir, GSU, Department of Computer Science


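What is Broadcasting with Selective Reduction?

BSR is an extension of the PRAM. It consists of:
  • N processors,
  • M shared-memory locations, and
  • a memory access unit (MAU).

It allows the same forms of memory access as the PRAM: exclusive read (ER), exclusive write (EW), concurrent read (CR), and concurrent write (CW). BSR requires asymptotically no more resources than the PRAM for its implementation.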

The BSR Model of Parallel Computation

[Figure: processors P1, P2, …, PN access the shared memory locations through the memory access unit (MAU).]

Broadcasting with Selective Reduction

During the execution of an algorithm on BSR:
  • several processors may read from or write to the same memory location;
  • all processors may gain access to all memory locations at the same time for the purpose of writing;
  • at each memory location, a subset of the incoming broadcast data is selected and reduced to one value, according to an appropriate selection rule and reduction operator;
  • this value is finally stored in the memory location.

Thus BSR accommodates all forms of memory access allowed by the PRAM, plus broadcasting with selective reduction.

BSR Continued

The resulting MAU has width O(M), depth O(log M), and size O(M log M).

How long does a step take in BSR? A memory access should require a(N, M) = O(log M) time, the depth of the MAU. Here, however, we assume that a(N, M) = O(1). Similarly, a computational operation takes constant time: c(N, M) = O(1).

THE BSR MODEL

BSR adds one form of concurrent access to shared memory, the BROADCAST instruction, which allows all processors to write to all shared memory locations simultaneously. It has three phases.

Broadcasting phase: each processor $P_i$ broadcasts a datum $d_i$ and a tag $g_i$, $1 \le i \le N$, destined to all memory locations.

Selection phase: each memory location $U_j$ uses a limit $l_j$, $1 \le j \le M$, and a selection rule $\sigma$ to test the condition $g_i \,\sigma\, l_j$, where $\sigma$ is selected from the set $\{<, \le, =, \ge, >, \ne\}$.

The BSR Model (Continued)

Reduction phase: all data $d_i$ selected by $U_j$ during the selection phase are combined into one datum, which is finally stored in $U_j$. The reduction operator $\mathcal{R}$ is selected from the set {SUM, PRODUCT, AND, OR, EXCLUSIVE-OR, MAXIMUM, MINIMUM}.

All three phases are performed simultaneously for all processors $P_i$ and all memory locations $U_j$.
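To make the three phases concrete, here is a minimal sequential Python sketch of one BROADCAST instruction. The function name `broadcast`, the string names chosen for the selection rules and reduction operators, and the use of Python's bitwise `&`, `|`, `^` for AND, OR, and EXCLUSIVE-OR are illustrative assumptions. A real BSR machine performs all phases for all $P_i$ and $U_j$ at once in O(1) time, whereas this loop simulation takes O(NM) time.

```python
import operator
from functools import reduce

# Selection rules sigma for the test "g_i sigma l_j".
SELECTION_RULES = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    ">=": operator.ge, ">": operator.gt, "!=": operator.ne,
}

# Reduction operators R, each given as a binary combining function
# (AND, OR, EXCLUSIVE-OR are bitwise here: an assumption of this sketch).
REDUCTION_OPERATORS = {
    "SUM": operator.add, "PRODUCT": operator.mul,
    "AND": operator.and_, "OR": operator.or_, "EXCLUSIVE-OR": operator.xor,
    "MAXIMUM": max, "MINIMUM": min,
}


def broadcast(tags, data, limits, sigma, reduction, memory):
    """Simulate one BSR BROADCAST instruction sequentially.

    tags[i], data[i]: the pair (g_i, d_i) broadcast by processor P_i.
    limits[j]: the limit l_j used by memory location U_j.
    memory: current contents of U_1..U_M; a new list is returned.
    """
    test = SELECTION_RULES[sigma]
    combine = REDUCTION_OPERATORS[reduction]
    new_memory = list(memory)
    for j, limit in enumerate(limits):
        # Selection phase: U_j keeps every d_i whose tag satisfies g_i sigma l_j.
        selected = [d for g, d in zip(tags, data) if test(g, limit)]
        # Reduction phase: combine the selected data into one value.
        # A single selected datum is stored as is; if nothing is
        # selected, U_j is left unchanged.
        if selected:
            new_memory[j] = reduce(combine, selected)
    return new_memory
```

For example, `broadcast([1, 2, 3], [1, 2, 3], [1, 2, 3], "<=", "SUM", [0, 0, 0])` returns `[1, 3, 6]`.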

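The three phases of the BROADCAST instruction

[Figure: every processor broadcasts its pair $(g_i, d_i)$, $1 \le i \le N$, to all memory locations; each location $U_j$ tests $g_1 \,\sigma\, l_j, \ldots, g_N \,\sigma\, l_j$ and reduces the selected data to one value.]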

The BSR Model

If a datum or a tag is not in a processor's local register, the processor obtains it from the shared memory by an ER or a CR instruction. The limits, the selection rule, and the reduction operator are assumed to be known by the memory locations. If not, they can be stored in memory by an EW or a CW instruction.

Notation for the BROADCAST instruction: a BROADCAST instruction of BSR is written as follows:

$$U_j \leftarrow \mathop{\mathcal{R}}_{\substack{1 \le i \le N \\ g_i \,\sigma\, l_j}} d_i, \qquad 1 \le j \le M$$

THE BSR MODEL

If no data are accepted by a given memory location, its value is not affected by the BROADCAST instruction. If only one datum is accepted, $U_j$ is assigned the value of that datum.

Comparing BSR to the PRAM: in BSR, the BROADCAST instruction requires O(1) time. On a PRAM with the same number of processors and memory locations, it requires O(M) time, since a BROADCAST is equivalent to M CW instructions. Because BSR allows every form of memory access available on the PRAM, it is at least as powerful as the PRAM; the BROADCAST instruction makes BSR strictly more powerful than the PRAM.
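The O(M) bound on the PRAM can be illustrated by a hypothetical sketch (not taken from the slides) in which a PRAM whose concurrent writes are combined by SUM simulates a BROADCAST with SUM reduction. It issues one CW instruction per memory location, so M locations cost M write steps. The function name `broadcast_sum_on_pram` and the step counter are assumptions made for illustration.

```python
def broadcast_sum_on_pram(tags, data, limits, test, memory):
    """Simulate a SUM-reducing BROADCAST with one SUM-CW step per location.

    test(g, l) plays the role of the selection rule sigma.
    Returns the new memory contents and the number of PRAM write steps.
    """
    new_memory = list(memory)
    steps = 0
    for j, limit in enumerate(limits):
        # One CW instruction: every P_i whose tag passes the test writes d_i
        # to U_j, and the conflicting writes are combined by SUM.
        writers = [d for g, d in zip(tags, data) if test(g, limit)]
        if writers:
            new_memory[j] = sum(writers)
        steps += 1  # one PRAM step per memory location, M steps in total
    return new_memory, steps
```

The memory contents match those of a single BSR BROADCAST; only the step count differs (M on the PRAM versus one on BSR).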

THE BSR MODEL

Example: let $X = \{x_1, x_2, \ldots, x_n\}$ be a sequence in nondecreasing order and $L = \{l_1, l_2, \ldots, l_n\}$ a sequence of distinct numbers in increasing order. It is required to compute, for $1 \le i \le n$, the sum $s_i$ of all those elements of X not equal to $l_i$.

On the PRAM, the problem can be solved in O(n) time, which is obviously optimal: the sum S of all the elements of X is first computed; X is merged with L to give a sequence Y sorted in increasing order; Y is then scanned, and each $s_i$, $1 \le i \le n$, is computed by subtracting from S all the elements of X equal to $l_i$. Moreover, n processors can compute any one of the $s_i$ in O(1) time.
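The merge-and-scan idea can be sketched in Python as follows. Instead of building the merged sequence Y explicitly, this sketch walks through X and L with two pointers, which visits the elements in the same merged order; that substitution and the function name `sums_not_equal_sequential` are assumptions of the sketch.

```python
def sums_not_equal_sequential(xs, ls):
    """For each l_i, return the sum of the elements of xs not equal to l_i.

    xs must be in nondecreasing order and ls in increasing order.
    A single merge-style scan over both sequences gives O(n) total time.
    """
    total = sum(xs)  # S, the sum of all elements of X
    result = []
    k = 0  # position in xs reached by the scan
    for limit in ls:
        # Skip the elements of X smaller than the current l_i.
        while k < len(xs) and xs[k] < limit:
            k += 1
        # Add up the elements of X equal to l_i.
        equal_sum = 0
        while k < len(xs) and xs[k] == limit:
            equal_sum += xs[k]
            k += 1
        result.append(total - equal_sum)  # s_i = S - (elements equal to l_i)
    return result
```

For X = {1, 2, 2, 5} and L = {1, 2, 3, 5}, S = 10 and the sketch returns `[9, 6, 10, 5]`.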

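THE BSR MODEL

On BSR, all the sums are computed using one BROADCAST instruction:
  • processor $P_i$, $1 \le i \le n$, broadcasts $(x_i, x_i)$ as its tag and datum pair;
  • memory location $U_j$, $1 \le j \le n$, uses $l_j$ as its limit and $\ne$ as its selection rule, so it selects those $x_i$ not equal to $l_j$;
  • the $x_i$ selected by $U_j$ are added up (reduction operator SUM) to obtain $s_j$, $1 \le j \le n$.

This requires O(1) time and does not depend on X and L being sorted.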
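A minimal sequential simulation of this single BROADCAST might look like the sketch below. The function name is hypothetical, and the memory locations are assumed to start at 0, so that a location that selects nothing (every element of X equals $l_j$) correctly holds a sum of 0.

```python
def sums_not_equal_bsr(xs, ls):
    """Simulate the one BROADCAST: tag x_i, datum x_i, limit l_j, sigma '!=', SUM."""
    tags, data = xs, xs  # P_i broadcasts (x_i, x_i)
    memory = [0] * len(ls)  # assumed initial contents of U_1..U_n
    for j, limit in enumerate(ls):  # on BSR all U_j work simultaneously
        selected = [d for g, d in zip(tags, data) if g != limit]
        if selected:  # U_j is unchanged if nothing is selected
            memory[j] = sum(selected)
    return memory
```

Because each $U_j$ looks at all broadcast pairs independently, the inputs need not be sorted: `sums_not_equal_bsr([5, 2, 1, 2], [3, 5, 1, 2])` gives `[10, 5, 9, 6]`.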

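BSR ALGORITHMS

Prefix sums: given n numbers $x_1, x_2, \ldots, x_n$, compute the prefix sums $s_j = x_1 + x_2 + \cdots + x_j$, $1 \le j \le n$.

BSR PREFIX SUMS uses n processors and n memory locations. Processor $P_i$ broadcasts its index i as the tag and $x_i$ as the datum. Memory location $U_j$ uses its index j as the limit, the relation $\le$ for selection, and SUM as the reduction operator. When the instruction terminates, $U_j$ holds $s_j$, $1 \le j \le n$.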

BSR Algorithms – Prefix Sums

Algorithm BSR PREFIX SUMS

for j = 1 to n do in parallel
    for i = 1 to n do in parallel
        $s_j \leftarrow \sum_{i \le j} x_i$
    end for
end for.

The algorithm consists of one BROADCAST instruction, so p(n) = n, t(n) = O(1), and c(n) = p(n) × t(n) = O(n), which is optimal.
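The single BROADCAST of BSR PREFIX SUMS can be simulated sequentially as in this sketch; the function name is hypothetical, and the outer loop stands in for the n memory locations that work simultaneously on BSR.

```python
def bsr_prefix_sums(xs):
    """Simulate BSR PREFIX SUMS: one BROADCAST with tag i, datum x_i,
    limit j at U_j, selection rule '<=', and reduction operator SUM."""
    n = len(xs)
    tags = range(1, n + 1)  # P_i broadcasts its index i as the tag
    s = [0] * n
    for j in range(1, n + 1):  # U_j uses its index j as the limit
        # U_j selects x_i whenever i <= j and adds the selected data up.
        s[j - 1] = sum(x for i, x in zip(tags, xs) if i <= j)
    return s
```

For example, `bsr_prefix_sums([1, 2, 3])` returns `[1, 3, 6]`. Each $U_j$ always selects at least $x_j$ itself, so no location is left unchanged.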

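BSR Algorithms – Prefix Sums Example: X = {1, 2, 3}. Processors $P_1, P_2, P_3$ broadcast the (tag, datum) pairs (1, 1), (2, 2), (3, 3); with limits 1, 2, 3 and the relation $\le$, the memory locations obtain $s_1 = 1$, $s_2 = 1 + 2 = 3$, and $s_3 = 1 + 2 + 3 = 6$.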

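BSR Algorithms – Sorting

Given a sequence $X = \{x_1, x_2, \ldots, x_n\}$, rearrange the elements of X into a sequence $S = \{s_1, s_2, \ldots, s_n\}$ in nondecreasing order.

The algorithm requires n processors and n memory locations and consists of two steps.

Step 1: the rank $r_j$ of each element $x_j$, that is, the number of elements of X smaller than $x_j$, is computed. Processor $P_i$ broadcasts $(x_i, 1)$ as its tag and datum; $U_j$ uses $x_j$ as its limit, $<$ as the selection relation, and SUM as the reduction operator. When this step terminates, $U_j$ holds $r_j$, for $1 \le j \le n$.

Step 2: $x_j$ is placed in position $1 + r_j$ of the sorted sequence S. If $x_j$, $x_k$, and $x_m$ are equal, they have the same rank, $r_j = r_k = r_m$.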

BSR Algorithms - Sorting

Second step, continued: when $x_j$, $x_k$, and $x_m$ are equal, $x_j$ goes to position $1 + r_j$, $x_k$ to position $2 + r_j$, and $x_m$ to position $3 + r_j$. The next element with the next higher rank is placed in position $4 + r_j$ of S.

To do this, each rank is first incremented by 1. Then $P_i$ broadcasts the pair $(r_i, x_i)$, and $U_j$ uses its index j as the limit, $\le$ for selection, and MAX as the reduction operator. When this step terminates, $U_j$ holds $s_j$, that is, the jth element of the sorted sequence.

BSR Algorithms - Sorting

Algorithm BSR SORT

Step 1: for j = 1 to n do in parallel
            $r_j \leftarrow 0$
            for i = 1 to n do in parallel
                $r_j \leftarrow \sum_{x_i < x_j} 1$
            end for
            $r_j \leftarrow r_j + 1$
        end for

Step 2: for j = 1 to n do in parallel
            for i = 1 to n do in parallel
                $s_j \leftarrow \mathop{\mathrm{MAX}}_{r_i \le j} x_i$
            end for
        end for.

BSR Algorithms - Sorting

Example: X = {8, 5, 2, 5}. In step 1, the processors broadcast the (tag, datum) pairs (8, 1), (5, 1), (2, 1), (5, 1) to all memory locations. The limits are 8, 5, 2, and 5.
  • Since 5 < 8, 2 < 8, and 5 < 8, $r_1 = 3$.
  • Only 2 < 5, so $r_2 = 1$.
  • No tag is smaller than 2, so $r_3 = 0$.
  • Only 2 < 5, so $r_4 = 1$.

BSR Algorithms - Sorting

Example continued: step 2 of the algorithm. After each rank is incremented by 1, the processors broadcast the pairs (4, 8), (2, 5), (1, 2), (2, 5). The limits at the memory locations are 1, 2, 3, 4. This gives the sorted sequence {2, 5, 5, 8}.
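Both steps of BSR SORT can be simulated sequentially with the sketch below. The function names `bsr_ranks` and `bsr_sort` are hypothetical; `bsr_ranks` returns the ranks before the increment, matching the values $r_1, \ldots, r_4$ in the example.

```python
def bsr_ranks(xs):
    """Step 1 of BSR SORT: r_j = number of x_i with x_i < x_j.

    BROADCAST: tag x_i, datum 1, limit x_j, selection '<', reduction SUM.
    r_j starts at 0, so a location that selects nothing holds rank 0.
    """
    ranks = [0] * len(xs)
    for j, limit in enumerate(xs):
        selected = [1 for x in xs if x < limit]
        if selected:
            ranks[j] = sum(selected)
    return ranks


def bsr_sort(xs):
    """Step 2 of BSR SORT: s_j = MAX of x_i over all i with r_i <= j,
    where each r_i has first been incremented to 1 + r_i."""
    positions = [r + 1 for r in bsr_ranks(xs)]  # r_j <- r_j + 1
    n = len(xs)
    s = [None] * n
    for j in range(1, n + 1):  # U_j uses its index j as the limit
        selected = [x for r, x in zip(positions, xs) if r <= j]
        if selected:
            s[j - 1] = max(selected)
    return s
```

On the example, `bsr_ranks([8, 5, 2, 5])` gives `[3, 1, 0, 1]` and `bsr_sort([8, 5, 2, 5])` gives `[2, 5, 5, 8]`. Equal elements share a rank, and MAX fills the run of positions they occupy with their common value.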

BSR Algorithms - Sorting

Analysis of BSR SORT: the algorithm uses p(n) = n processors and runs in t(n) = O(1) time, so c(n) = O(n). This is a uniform analysis: the time required for memory access, a(N, M), was taken to be O(1).

Discriminating analysis: a(N, M) is taken to be O(log M), for both BSR and the PRAM.

BSR SORT: N = M = O(n), so each memory access takes O(log n) time. Each step is executed once and contains a constant number of computations and memory accesses, so

$$t(n) = O(1) \times O\big(a(N,M) + c(N,M)\big) = O(\log n), \qquad c(n) = n \times O(\log n) = O(n \log n),$$

which is optimal.

PRAM SORT: N = M = O(n), so each memory access also takes O(log n) time. The algorithm executes O(log n) computational and memory access steps; therefore

$$t(n) = O(\log n) \times O\big(a(N,M) + c(N,M)\big) = O(\log^2 n), \qquad c(n) = p(n) \times t(n) = n \times O(\log^2 n) = O(n \log^2 n),$$

and the cost is NOT optimal.

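BSR Algorithms – Computing Maximal Points

Let $S = \{q_1, q_2, \ldots, q_n\}$ be a set of n points in the plane, where $q_i = (x_i, y_i)$, $1 \le i \le n$. A point $q_j = (x_j, y_j)$ dominates a point $q_i = (x_i, y_i)$, for $i \ne j$, if $x_j > x_i$ and $y_j > y_i$. A point of S is said to be maximal with respect to S if and only if it is not dominated by any other point of S.

The algorithm uses n processors and n memory locations and consists of three steps.

Step 1: an auxiliary sequence $\{m_1, m_2, \ldots, m_n\}$ is created; $m_i$, associated with point $q_i$, is initially set equal to $y_i$, $1 \le i \le n$.

Step 2: for each point $q_j$, the largest y-coordinate among the points with a larger x-coordinate is found, and $m_j$ is assigned the value of that coordinate. Processor $P_i$ broadcasts $(x_i, y_i)$, with $x_i$ as the tag and $y_i$ as the datum.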

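BSR Algorithms – Computing Maximal Points

In step 2, $U_j$ uses $x_j$ as its limit, the relation $>$ for selection, and MAX for reduction, to compute $m_j$. That is, $U_j$ accepts the y-coordinate of every point $q_a$, $1 \le a \le n$, with $x_a > x_j$, and assigns the maximum of these to $m_j$. If no point is accepted, $m_j$ keeps its initial value $y_j$.

Step 3: a decision is made as to whether $q_i$ is a maximal point. If $m_i$ was assigned the y-coordinate of some point $q_k$ and $y_k > y_i$, then $q_k$ dominates $q_i$, and $m_i \leftarrow 0$. Otherwise, neither $q_k$ nor any other point dominates $q_i$, and $m_i \leftarrow 1$.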

BSR Algorithms – Computing Maximal Points

Algorithm BSR MAXIMAL POINTS

Step 1: for i = 1 to n do in parallel
            $m_i \leftarrow y_i$
        end for

Step 2: for j = 1 to n do in parallel
            for i = 1 to n do in parallel
                $m_j \leftarrow \mathop{\mathrm{MAX}}_{x_i > x_j} y_i$
            end for
        end for

Step 3: for i = 1 to n do in parallel
            if $m_i > y_i$
            then $m_i \leftarrow 0$
            else $m_i \leftarrow 1$
            end if
        end for.

BSR Algorithms – Computing Maximal Points

Analysis: each step uses n processors and runs in O(1) time, so p(n) = n, t(n) = O(1), and c(n) = O(n). Taking the memory access time to be O(log n), the cost becomes O(n log n). On the other hand, the cost on the PRAM is O(n log² n), which is not optimal.

Example: $q_1, q_2, q_3$ are three points in the plane.

[Figure: the points $q_1$, $q_2$, $q_3$ plotted in the x–y plane.]

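BSR Algorithms – Computing Maximal Points

After step 1 of the algorithm, $m_1 = y_1$, $m_2 = y_2$, $m_3 = y_3$.

After step 2, $m_1 = y_3$, $m_2 = y_3$, $m_3 = y_3$.

Since $m_1 < y_1$, $m_2 > y_2$, and $m_3 = y_3$, step 3 sets $m_1 = 1$, $m_2 = 0$, and $m_3 = 1$: both $q_1$ and $q_3$ are maximal, while $q_2$ is dominated by $q_3$.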
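The three steps can be simulated sequentially with the sketch below. The function name is hypothetical, and so are the coordinates used in the usage line: the slides give no numeric values, so $q_1 = (1, 5)$, $q_2 = (2, 1)$, $q_3 = (3, 3)$ were chosen only because they reproduce the values of $m_1, m_2, m_3$ in the example.

```python
def bsr_maximal_points(points):
    """Return flags m_i (1 = q_i is maximal, 0 = q_i is dominated)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Step 1: m_i <- y_i.
    m = list(ys)
    # Step 2: BROADCAST with tag x_i, datum y_i, limit x_j, selection '>',
    # reduction MAX. If nothing is selected, m_j keeps its value y_j.
    step2 = list(m)
    for j, limit in enumerate(xs):
        selected = [y for x, y in zip(xs, ys) if x > limit]
        if selected:
            step2[j] = max(selected)
    m = step2
    # Step 3: q_i is dominated exactly when m_i > y_i.
    return [0 if m[i] > ys[i] else 1 for i in range(len(points))]
```

With the hypothetical points above, `bsr_maximal_points([(1, 5), (2, 1), (3, 3)])` returns `[1, 0, 1]`: $q_1$ and $q_3$ are maximal.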

BSR Algorithms – Maximum Sum Subsequence

Given a sequence $X = \{x_1, x_2, \ldots, x_n\}$, it is required to find indices u and v, $u \le v$, such that the subsequence $\{x_u, x_{u+1}, \ldots, x_v\}$ has the largest possible sum, $x_u + x_{u+1} + \cdots + x_v$, among all subsequences of X.

Algorithm BSR MAXIMUM SUM SUBSEQUENCE

Step 1: for j = 1 to n do in parallel
            for i = 1 to n do in parallel
                $s_j \leftarrow \sum_{i \le j} x_i$
            end for
        end for

BSR Algorithms – Maximum Sum Subsequence

Step 2:
(2.1) for j = 1 to n do in parallel
          for i = 1 to n do in parallel
              $m_j \leftarrow \mathop{\mathrm{MAX}}_{i \ge j} s_i$
          end for
      end for
(2.2) for j = 1 to n do in parallel
          for i = 1 to n do in parallel
              $a_j \leftarrow \mathop{\mathrm{MAX}}_{s_i = m_j} i$
          end for
      end for

BSR Algorithms – Maximum Sum Subsequence

Step 3: for i = 1 to n do in parallel
            $b_i \leftarrow m_i - s_i + x_i$
        end for

Step 4:
(4.1) for i = 1 to n do in parallel
          (i) $L \leftarrow b_i$  (MAX CW)
          (ii) if $b_i = L$ then $u \leftarrow i$ end if  (ARBITRARY CW)
      end for
(4.2) $v \leftarrow a_u$

BSR Algorithms – Maximum Sum Subsequence

Steps of the algorithm:

Step 1: the prefix sums $s_j$ are computed using BSR PREFIX SUMS.

Step 2: for each j, the maximum prefix sum at or to the right of $s_j$ is found, together with its value $m_j$ and its index $a_j$.
  • To compute $m_j$, $P_i$ broadcasts $(i, s_i)$ as its tag and datum; $U_j$ uses j as its limit, $\ge$ for selection, and MAX for reduction.
  • To compute $a_j$, $P_i$ broadcasts $(s_i, i)$ as its tag and datum; $U_j$ uses $m_j$ as its limit, $=$ for selection, and MAX for reduction.

Step 3: for each i, the sum $b_i$ of the maximum sum subsequence beginning at $x_i$ is computed, using an EW instruction.

BSR Algorithms – Maximum Sum Subsequence

Steps of the algorithm, continued:

Step 4: the sum L and the starting index u of the overall maximum sum subsequence are found; this requires a MAX CW instruction and an ARBITRARY CW instruction. The ending index is then $v = a_u$.

Analysis: each step of the algorithm runs in O(1) time and uses n processors. Thus p(n) = n, t(n) = O(1), and c(n) = O(n), which is optimal.

BSR Algorithms – Maximum Sum Subsequence

Example: X = {-1, 1, 2, -2}.

After step 1, the prefix sums $s_j$ are -1, 0, 2, 0.

The second BROADCAST instruction computes $m_j$: 2, 2, 2, 0.

BSR Algorithms – Maximum Sum Subsequence

Example continued:

The third BROADCAST instruction computes $a_j$: 3, 3, 3, 4.

Step 3 computes $b_i$: 2, 3, 2, -2.

Finally, L = 3, u = 2, and $v = a_2 = 3$, so the maximum sum subsequence is $\{x_2, x_3\} = \{1, 2\}$.
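The whole algorithm can be simulated sequentially with the sketch below, using 1-based indices as on the slides. The function name is hypothetical. MAX is used as the reduction operator for $a_j$ because it reproduces $a_4 = 4$ in the example. The ARBITRARY CW of step 4.1 is modeled by taking the smallest qualifying index, which is only one of the choices that instruction allows.

```python
def bsr_max_sum_subsequence(xs):
    """Return (L, u, v) with 1-based indices u <= v such that
    x_u + ... + x_v = L is the largest possible sum. Assumes xs is non-empty."""
    n = len(xs)
    idx = range(1, n + 1)
    # Step 1 (BSR PREFIX SUMS): tag i, datum x_i, limit j, '<=', SUM.
    s = [sum(x for i, x in zip(idx, xs) if i <= j) for j in idx]
    # Step 2.1: tag i, datum s_i, limit j, '>=', MAX.
    m = [max(si for i, si in zip(idx, s) if i >= j) for j in idx]
    # Step 2.2: tag s_i, datum i, limit m_j, '=', MAX.
    a = [max(i for i, si in zip(idx, s) if si == m[j - 1]) for j in idx]
    # Step 3 (EW): b_i = m_i - s_i + x_i, the best sum starting at x_i.
    b = [m[i] - s[i] + xs[i] for i in range(n)]
    # Step 4.1: L <- MAX of the b_i (MAX CW); u <- some i with b_i = L
    # (ARBITRARY CW; this sketch simply takes the smallest such i).
    L = max(b)
    u = next(i for i in idx if b[i - 1] == L)
    # Step 4.2: v <- a_u.
    v = a[u - 1]
    return L, u, v
```

On the example, `bsr_max_sum_subsequence([-1, 1, 2, -2])` returns `(3, 2, 3)`. The identity $b_i = m_i - s_i + x_i = s_{a_i} - s_{i-1}$ is what lets step 3 obtain, for every starting index i, the best sum ending at or after i.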