Parallel Programming: Techniques and Applications using Networked Workstations and Parallel Computers
Barry Wilkinson and Michael Allen, Prentice Hall, 1998

Parallel Programming - Slides


Page 1: Parallel Programming - Slides

Figure 1.1 Astrophysical N-body simulation by Scott Linssen (undergraduate University of North Carolina at Charlotte [UNCC] student).

Page 2: Parallel Programming - Slides

Figure 1.2 Conventional computer having a single processor and memory. Main memory supplies instructions (to processor) and data (to or from processor).

Page 3: Parallel Programming - Slides

Figure 1.3 Traditional shared memory multiprocessor model: processors connected through an interconnection network to memory modules forming one address space.

Page 4: Parallel Programming - Slides

Figure 1.4 Message-passing multiprocessor model (multicomputer): computers, each with a processor and local memory, exchange messages over an interconnection network.

Page 5: Parallel Programming - Slides

Figure 1.5 Shared memory multiprocessor implementation: computers, each with a processor and part of the shared memory, exchange messages over an interconnection network.

Page 6: Parallel Programming - Slides

Figure 1.6 MPMD structure: each processor executes its own program on its own data, with instructions flowing from program to processor.

Page 7: Parallel Programming - Slides

Figure 1.7 Static link multicomputer: computers (processor P, memory M, communication interface C) connected by a network with direct links between computers.

Page 8: Parallel Programming - Slides

Figure 1.8 Node with a switch for internode message transfers: each computer (node) contains a processor, memory, and a switch with links to other nodes.

Page 9: Parallel Programming - Slides

Figure 1.9 A link between two nodes with separate wires in each direction.

Page 10: Parallel Programming - Slides


Figure 1.10 Ring.

Page 11: Parallel Programming - Slides

Figure 1.11 Two-dimensional array (mesh): computers/processors connected by links.

Page 12: Parallel Programming - Slides

Figure 1.12 Tree structure: processing elements connected by links, descending from a root.

Page 13: Parallel Programming - Slides

Figure 1.13 Three-dimensional hypercube with nodes numbered 000 to 111.

Page 14: Parallel Programming - Slides

Figure 1.14 Four-dimensional hypercube with nodes numbered 0000 to 1111.

Page 15: Parallel Programming - Slides

Figure 1.15 Embedding a ring onto a torus.

Page 16: Parallel Programming - Slides

Figure 1.16 Embedding a mesh into a hypercube: mesh coordinates x and y each run in Gray-code order 00, 01, 11, 10, giving nodal addresses such as 1011.

Page 17: Parallel Programming - Slides

Figure 1.17 Embedding a tree into a mesh, with the root and nodes (labeled A) placed on mesh positions.

Page 18: Parallel Programming - Slides

Figure 1.18 Distribution of flits: a packet with a head flit advances through flit buffers under control of request/acknowledge signal(s).

Page 19: Parallel Programming - Slides

Figure 1.19 A signaling method between processors for wormhole routing (Ni and McKinley, 1993): data and R/A (request/acknowledge) lines between source and destination processors.

Page 20: Parallel Programming - Slides

Figure 1.20 Network delay characteristics: network latency against distance (number of nodes between source and destination) for packet switching, circuit switching, and wormhole routing.

Page 21: Parallel Programming - Slides

Figure 1.21 Deadlock in store-and-forward networks: messages among nodes 1, 2, 3, and 4 block in a cycle.

Page 22: Parallel Programming - Slides

Figure 1.22 Multiple virtual channels, each with a route buffer, mapped onto a single physical link between nodes.

Page 23: Parallel Programming - Slides

Figure 1.23 Ethernet-type single wire network: workstations and a workstation/file server on one Ethernet.

Page 24: Parallel Programming - Slides

Figure 1.24 Ethernet frame format (in transmission order): preamble (64 bits), destination address (48 bits), source address (48 bits), type (16 bits), data (variable), frame check sequence (32 bits).

Page 25: Parallel Programming - Slides

Figure 1.25 Network of workstations and a workstation/file server connected via a ring.

Page 26: Parallel Programming - Slides

Figure 1.26 Star connected network: workstations connected to a central workstation/file server.

Page 27: Parallel Programming - Slides

Figure 1.27 Overlapping connectivity Ethernets forming a parallel programming cluster: (a) using specially designed adaptors; (b) using separate Ethernet interfaces.

Page 28: Parallel Programming - Slides

Figure 1.28 Space-time diagram of a message-passing program: four processes over time, showing computing, waiting to send a message, and messages whose slope indicates the time to send a message.

Page 29: Parallel Programming - Slides

Figure 1.29 Parallelizing a sequential problem — Amdahl's law. (a) One processor: execution time ts, split into a serial section f·ts and parallelizable sections (1 − f)·ts. (b) Multiple processors (n): parallel time tp = f·ts + (1 − f)·ts/n.
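The times in Figure 1.29 give the usual speedup formula directly: S(n) = ts/tp = n/(1 + (n − 1)f). A minimal sketch in C (the function name amdahl_speedup is ours, not the book's):

```c
#include <assert.h>

/* Speedup predicted by Amdahl's law: with serial fraction f and n
   processors, tp = f*ts + (1 - f)*ts/n, so
   S(n) = ts / tp = n / (1 + (n - 1)*f). */
double amdahl_speedup(int n, double f) {
    return (double)n / (1.0 + (n - 1) * f);
}
```

With f = 0 the speedup is the ideal n; with f = 1 it is 1 regardless of n, matching the curves in Figure 1.30.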

Page 30: Parallel Programming - Slides

Figure 1.30 (a) Speedup factor S(n) against number of processors n for serial fractions f = 0%, 5%, 10%, and 20%. (b) Speedup factor S(n) against serial fraction f for n = 16 and n = 256.

Page 31: Parallel Programming - Slides

Figure 2.1 Single program, multiple data operation: one source file is compiled to suit each processor, producing executables for processor 0 through processor n − 1.

Page 32: Parallel Programming - Slides

Figure 2.2 Spawning a process: process 1 calls spawn() to start execution of process 2.

Page 33: Parallel Programming - Slides

Figure 2.3 Passing a message between processes using send() and recv() library calls: process 1 executes send(&x, 2) and process 2 executes recv(&y, 1); the data moves from x to y.

Page 34: Parallel Programming - Slides

Figure 2.4 Synchronous send() and recv() library calls using a three-way protocol (request to send, acknowledgment, message). (a) When send() occurs before recv(), process 1 suspends until the request is acknowledged. (b) When recv() occurs before send(), process 2 suspends until the message arrives. In both cases, both processes then continue.

Page 35: Parallel Programming - Slides

Figure 2.5 Using a message buffer: process 1's send() deposits the message in a message buffer and the process continues; process 2's recv() later reads the message buffer.

Page 36: Parallel Programming - Slides

Figure 2.6 Broadcast operation: processes 0 through n − 1 each call bcast(); the data in the root's buf is delivered to every process.

Page 37: Parallel Programming - Slides

Figure 2.7 Scatter operation: processes 0 through n − 1 each call scatter(); successive elements of the root's buf are distributed, one to each process.

Page 38: Parallel Programming - Slides

Figure 2.8 Gather operation: processes 0 through n − 1 each call gather(); data from every process is collected into the root's buf.

Page 39: Parallel Programming - Slides

Figure 2.9 Reduce operation (addition): processes 0 through n − 1 each call reduce(); the data items are combined with + into the root's buf.

Page 40: Parallel Programming - Slides

Figure 2.10 Message passing between workstations using PVM: each workstation runs a PVM daemon and an application program (executable); messages are sent through the network between daemons.

Page 41: Parallel Programming - Slides

Figure 2.11 Multiple processes allocated to each processor (workstation): several application programs (executables) per workstation communicate through the PVM daemons and the network.

Page 42: Parallel Programming - Slides

Figure 2.12 pvm_psend() and pvm_precv() system calls: process 1 packs an array holding data into a send buffer and calls pvm_psend(), then continues; process 2 waits for the message in pvm_precv() and receives the data into an array.

Page 43: Parallel Programming - Slides

Figure 2.13 PVM packing messages, sending, and unpacking.

Process_1:
  pvm_initsend();
  pvm_pkint( … &x …);
  pvm_pkstr( … &s …);
  pvm_pkfloat( … &y …);
  pvm_send(process_2 … );

Process_2:
  pvm_recv(process_1 …);
  pvm_upkint( … &x …);
  pvm_upkstr( … &s …);
  pvm_upkfloat( … &y … );

The values x, s, and y pass from the send buffer, through the message, to the receive buffer.

Page 44: Parallel Programming - Slides

Figure 2.14 Sample PVM program. The master broadcasts the data; the slaves return partial results.

Master:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pvm3.h>
#define SLAVE "spsum"
#define PROC 10
#define NELEM 1000

main() {
  int mytid, tids[PROC];
  int n = NELEM, nproc = PROC;
  int no, i, who, msgtype;
  int data[NELEM], result[PROC], tot = 0;
  char fn[255];
  FILE *fp;

  mytid = pvm_mytid();                       /* Enroll in PVM */

  /* Start slave tasks */
  no = pvm_spawn(SLAVE, (char **)0, 0, "", nproc, tids);
  if (no < nproc) {
    printf("Trouble spawning slaves\n");
    for (i = 0; i < no; i++) pvm_kill(tids[i]);
    pvm_exit(); exit(1);
  }

  /* Open input file and initialize data */
  strcpy(fn, getenv("HOME"));
  strcat(fn, "/pvm3/src/rand_data.txt");
  if ((fp = fopen(fn, "r")) == NULL) {
    printf("Can't open input file %s\n", fn);
    exit(1);
  }
  for (i = 0; i < n; i++) fscanf(fp, "%d", &data[i]);

  /* Broadcast data to slaves */
  msgtype = 0;
  pvm_initsend(PvmDataDefault);
  pvm_pkint(&nproc, 1, 1);
  pvm_pkint(tids, nproc, 1);
  pvm_pkint(&n, 1, 1);
  pvm_pkint(data, n, 1);
  pvm_mcast(tids, nproc, msgtype);

  /* Get results from slaves */
  msgtype = 5;
  for (i = 0; i < nproc; i++) {
    pvm_recv(-1, msgtype);
    pvm_upkint(&who, 1, 1);
    pvm_upkint(&result[who], 1, 1);
    printf("%d from %d\n", result[who], who);
  }

  /* Compute global sum */
  for (i = 0; i < nproc; i++) tot += result[i];
  printf("The total is %d.\n\n", tot);

  pvm_exit();                                /* Program finished; exit PVM */
  return(0);
}

Slave:

#include <stdio.h>
#include "pvm3.h"
#define PROC 10
#define NELEM 1000

main() {
  int mytid, tids[PROC];
  int n, me, i, msgtype;
  int x, low, high, nproc, master;
  int data[NELEM], sum = 0;

  mytid = pvm_mytid();

  /* Receive data from master */
  msgtype = 0;
  pvm_recv(-1, msgtype);
  pvm_upkint(&nproc, 1, 1);
  pvm_upkint(tids, nproc, 1);
  pvm_upkint(&n, 1, 1);
  pvm_upkint(data, n, 1);

  /* Determine my index */
  for (i = 0; i < nproc; i++)
    if (mytid == tids[i]) { me = i; break; }

  /* Add my portion of data */
  x = n / nproc;
  low = me * x;
  high = low + x;
  for (i = low; i < high; i++)
    sum += data[i];

  /* Send result to master */
  pvm_initsend(PvmDataDefault);
  pvm_pkint(&me, 1, 1);
  pvm_pkint(&sum, 1, 1);
  msgtype = 5;
  master = pvm_parent();
  pvm_send(master, msgtype);

  pvm_exit();                                /* Exit PVM */
  return(0);
}

Page 45: Parallel Programming - Slides

Figure 2.15 Unsafe message passing with libraries. (a) Intended behavior: the application-level send(…,1,…) in process 0 matches the application-level recv(…,0,…) in process 1, and the sends and receives inside the library routines lib() match each other. (b) Possible behavior: a recv() inside lib() in process 1 accepts the message from the application-level send() in process 0, so the message goes to the wrong destination.

Page 46: Parallel Programming - Slides

Figure 2.16 Sample MPI program.

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#define MAXSIZE 1000

void main(int argc, char *argv[]) {
  int myid, numprocs;
  int data[MAXSIZE], i, x, low, high, myresult = 0, result;
  char fn[255];
  FILE *fp;

  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  MPI_Comm_rank(MPI_COMM_WORLD, &myid);

  if (myid == 0) {  /* Open input file and initialize data */
    strcpy(fn, getenv("HOME"));
    strcat(fn, "/MPI/rand_data.txt");
    if ((fp = fopen(fn, "r")) == NULL) {
      printf("Can't open the input file: %s\n\n", fn);
      exit(1);
    }
    for (i = 0; i < MAXSIZE; i++) fscanf(fp, "%d", &data[i]);
  }

  /* Broadcast data */
  MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);

  /* Add my portion of data */
  x = MAXSIZE / numprocs;
  low = myid * x;
  high = low + x;
  for (i = low; i < high; i++)
    myresult += data[i];
  printf("I got %d from %d\n", myresult, myid);

  /* Compute global sum */
  MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
  if (myid == 0) printf("The sum is %d.\n", result);

  MPI_Finalize();
}

Page 47: Parallel Programming - Slides

Figure 2.17 Theoretical communication time: time against number of data items (n), offset by the startup time.
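The straight line in Figure 2.17 is commonly modeled as tcomm = tstartup + n·tdata, with the startup time as the intercept. A small sketch (the function and parameter names are ours):

```c
#include <assert.h>

/* Theoretical communication time for sending n data items:
   t_comm = t_startup + n * t_data -- a straight line whose intercept
   is the startup (message latency) time and whose slope is the
   per-item transmission time. */
double comm_time(double t_startup, double t_data, int n) {
    return t_startup + n * t_data;
}
```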

Page 48: Parallel Programming - Slides

Figure 2.18 Growth of the function f(x) = 4x² + 2x + 12, bounded below by c₁g(x) = 2x² and above by c₂g(x) = 6x² for all x greater than some x₀.

Page 49: Parallel Programming - Slides

Figure 2.19 Broadcast in a three-dimensional hypercube: nodes 000–111 receive the message over three steps (1st, 2nd, and 3rd), one dimension per step.
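The step structure of Figures 2.19 and 2.20 can be sketched as follows, assuming broadcast from node 0 with one dimension flipped per step; the function names and the dimension ordering are ours (one of several equivalent choices):

```c
#include <assert.h>

/* One possible hypercube broadcast schedule from node 0 in a d-dimensional
   cube: at step s (1..d), every node whose id is below 2^(s-1) -- i.e. a
   node that already holds the message -- forwards it to the node obtained
   by flipping bit (s-1). After d steps all 2^d nodes hold the message. */
int broadcast_partner(int id, int step) {
    return id ^ (1 << (step - 1));   /* flip one dimension per step */
}

/* Number of nodes holding the message after a given number of steps:
   it doubles every step, so a d-dimensional cube needs only d steps. */
int nodes_reached(int steps) {
    return 1 << steps;
}
```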

Page 50: Parallel Programming - Slides

Figure 2.20 Broadcast as a tree construction: starting from P000, the message spreads in three steps — step 1: P000 → P001; step 2: P000 → P010 and P001 → P011; step 3: P000 → P100, P010 → P110, P001 → P101, and P011 → P111.

Page 51: Parallel Programming - Slides

Figure 2.21 Broadcast in a mesh: step numbers (1 through 6) mark when each node receives the message as it spreads from the source.

Page 52: Parallel Programming - Slides

Figure 2.22 Broadcast on an Ethernet network: a single message from the source reaches all destinations.

Page 53: Parallel Programming - Slides

Figure 2.23 1-to-N fan-out broadcast: the source sends sequentially to N destinations.

Page 54: Parallel Programming - Slides

Figure 2.24 1-to-N fan-out broadcast on a tree structure, with sequential message issue from the source and each intermediate node to its destinations.

Page 55: Parallel Programming - Slides

Figure 2.25 Space-time diagram of a parallel program: three processes over time, showing computing, waiting, message-passing system routines, and messages.

Page 56: Parallel Programming - Slides

Figure 2.26 Program profile: number of repetitions or time plotted against statement number or region of program (1 through 10).

Page 57: Parallel Programming - Slides

Figure 3.1 Disconnected computational graph (embarrassingly parallel problem): input data feeds independent processes that produce results.

Page 58: Parallel Programming - Slides

Figure 3.2 Practical embarrassingly parallel computational graph with dynamic process creation and the master-slave approach: the master spawn()s slaves, send()s initial data which each slave recv()s, and recv()s the results each slave send()s back.

Page 59: Parallel Programming - Slides

Figure 3.3 Partitioning a 640 × 480 display into regions for individual processes, with a map from each process to its region: (a) an 80 × 80 square region for each process; (b) a 640 × 10 row region for each process.

Page 60: Parallel Programming - Slides

Figure 3.4 Mandelbrot set, plotted in the complex plane with real and imaginary axes running from −2 to +2.

Page 61: Parallel Programming - Slides

Figure 3.5 Work pool approach: tasks such as (xa, ya), (xb, yb), (xc, yc), (xd, yd), and (xe, ye) wait in a work pool; a process takes a task and returns results/requests a new task.

Page 62: Parallel Programming - Slides

Figure 3.6 Counter termination: a count of rows outstanding in slaves is incremented when a row is sent and decremented when a row is returned; with rows 0 to disp_height, the computation terminates when all rows have been returned.

Page 63: Parallel Programming - Slides

Figure 3.7 Computing π by a Monte Carlo method: a circle of area π inscribed in a 2 × 2 square of total area 4.
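The construction in Figure 3.7 can be sketched sequentially: the fraction of random points in the square that land inside the circle approximates π/4. A sketch (the function name is ours; the C library rand() is used for brevity, not statistical quality):

```c
#include <stdlib.h>

/* Monte Carlo estimate of pi (Figure 3.7): throw points uniformly into
   the 2 x 2 square centered on the origin; the fraction landing inside
   the unit circle approximates pi / 4. Seeding is left to the caller. */
double estimate_pi(long samples) {
    long inside = 0;
    for (long i = 0; i < samples; i++) {
        double x = 2.0 * rand() / RAND_MAX - 1.0;  /* x in [-1, 1] */
        double y = 2.0 * rand() / RAND_MAX - 1.0;  /* y in [-1, 1] */
        if (x * x + y * y <= 1.0)
            inside++;                              /* point is in the circle */
    }
    return 4.0 * (double)inside / samples;
}
```

The estimate converges slowly (error shrinks as 1/√samples), which is why the figure's approach is usually parallelized, as in Figure 3.9.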

Page 64: Parallel Programming - Slides

Figure 3.8 Function being integrated in computing π by a Monte Carlo method: y = f(x) = √(1 − x²) for 0 ≤ x ≤ 1.

Page 65: Parallel Programming - Slides

Figure 3.9 Parallel Monte Carlo integration: a separate random number process supplies random numbers to the slaves on request; the slaves return partial sums to the master.

Page 66: Parallel Programming - Slides

Figure 3.10 Parallel computation of a sequence: terms x1 … xk−1, xk go to one process, xk+1, xk+2 … x2k−1, x2k to the next, and so on.

Page 67: Parallel Programming - Slides

Figure 4.1 Partitioning a sequence of numbers into m parts and adding the parts: partial sums of x0 … x(n/m)−1, xn/m … x(2n/m)−1, …, x(m−1)n/m … xn−1 are then added to give the sum.

Page 68: Parallel Programming - Slides

Figure 4.2 Tree construction: the initial problem is divided repeatedly until final tasks remain.

Page 69: Parallel Programming - Slides

Figure 4.3 Dividing a list into parts: the original list x0 … xn−1, held by P0, is split between P0 and P4, then among P0, P2, P4, and P6, until each of P0–P7 holds one part.

Page 70: Parallel Programming - Slides

Figure 4.4 Partial summation: partial sums from P0–P7 are combined pairwise (into P0, P2, P4, and P6, then into P0 and P4, then into P0) to give the final sum of x0 … xn−1.
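The pairwise combination in Figures 4.3 and 4.4 follows the divide-and-conquer pattern: split the list in half, sum each half, and add the two partial sums. A sequential sketch (the function name is ours):

```c
/* Divide-and-conquer summation (Figures 4.3/4.4): split the list in half,
   sum each half recursively, and add the two partial sums -- the same
   combining pattern the processes P0..P7 follow in the figures. */
int tree_sum(const int *x, int n) {
    if (n == 1)
        return x[0];                 /* a leaf holds one number */
    int half = n / 2;
    return tree_sum(x, half) + tree_sum(x + half, n - half);
}
```

Run in parallel, the additions at each level of the recursion proceed simultaneously, so n numbers are summed in about log n combining steps.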

Page 71: Parallel Programming - Slides

Figure 4.5 Part of a search tree: OR nodes combine found/not found results from the subtrees.

Page 72: Parallel Programming - Slides


Figure 4.6 Quadtree.

Page 73: Parallel Programming - Slides

Figure 4.7 Dividing an image: the image area is divided first into four parts, then each part is divided again.

Page 74: Parallel Programming - Slides

Figure 4.8 Bucket sort: unsorted numbers are placed into buckets, the contents of the buckets are sorted, and the lists are merged to give the sorted numbers.
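A sequential sketch of the bucket sort in Figure 4.8, assuming non-negative integers below a known maximum; the names, the number of buckets, and the fixed bucket capacity are ours (a real implementation would size buckets dynamically):

```c
#include <string.h>

#define NBUCKETS 4
#define BUCKET_CAP 64

/* Bucket sort (Figure 4.8) for n integers in [0, max): drop each number
   into one of NBUCKETS equal ranges, insertion-sort each bucket, and
   concatenate ("merge") the buckets back into a[]. Sketch only: no
   overflow checking on BUCKET_CAP. */
void bucket_sort(int *a, int n, int max) {
    int bucket[NBUCKETS][BUCKET_CAP];
    int count[NBUCKETS];
    memset(count, 0, sizeof(count));

    /* distribute into buckets by value range */
    for (int i = 0; i < n; i++) {
        int b = (long)a[i] * NBUCKETS / max;
        bucket[b][count[b]++] = a[i];
    }

    /* sort each bucket, then concatenate in bucket order */
    int k = 0;
    for (int b = 0; b < NBUCKETS; b++) {
        for (int i = 1; i < count[b]; i++) {     /* insertion sort */
            int v = bucket[b][i], j = i - 1;
            while (j >= 0 && bucket[b][j] > v) {
                bucket[b][j + 1] = bucket[b][j];
                j--;
            }
            bucket[b][j + 1] = v;
        }
        for (int i = 0; i < count[b]; i++)
            a[k++] = bucket[b][i];
    }
}
```

The parallel versions in Figures 4.9 and 4.10 assign one bucket (or one group of small buckets) per processor, so the per-bucket sorts proceed simultaneously.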

Page 75: Parallel Programming - Slides

Figure 4.9 One parallel version of bucket sort: p processors, one per bucket; the unsorted numbers are placed into the buckets, each processor sorts the contents of its bucket, and the lists are merged to give the sorted numbers.

Page 76: Parallel Programming - Slides

Figure 4.10 Parallel version of bucket sort: the unsorted numbers are divided among p processors (n/m numbers each); each processor separates its numbers into small buckets and empties the small buckets into the large buckets; the contents of the large buckets are sorted and the lists merged to give the sorted numbers.

Page 77: Parallel Programming - Slides

Figure 4.11 “All-to-all” broadcast: each of processes 0 through n − 1 sends the contents of its send buffer to every other process and receives from every other process into its buffer.

Page 78: Parallel Programming - Slides

Figure 4.12 Effect of “all-to-all” on an array: a 4 × 4 array A stored row-wise across P0–P3 (Pi holds Ai,0 Ai,1 Ai,2 Ai,3) becomes stored column-wise (Pi holds A0,i A1,i A2,i A3,i) — a transpose.

Page 79: Parallel Programming - Slides

Figure 4.13 Numerical integration of f(x) from a to b using rectangles: each strip of width δ is a rectangle with height taken from f, as at f(p) and f(q).

Page 80: Parallel Programming - Slides

Figure 4.14 More accurate numerical integration using rectangles: the height of each strip of width δ is taken at the midpoint of the interval between p and q.

Page 81: Parallel Programming - Slides

Figure 4.15 Numerical integration using the trapezoidal method: each strip of width δ between p and q is a trapezoid with parallel sides f(p) and f(q), over the interval [a, b].
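The trapezoidal method of Figure 4.15 sums strip areas of ½(f(p) + f(q))δ. A sequential sketch (the function names are ours):

```c
/* Trapezoidal rule (Figure 4.15): approximate the integral of f from a
   to b with n strips of width delta; each strip contributes the trapezoid
   area 0.5 * (f(p) + f(q)) * delta, where q = p + delta. */
double trapezoidal(double (*f)(double), double a, double b, int n) {
    double delta = (b - a) / n;
    double area = 0.0;
    for (int i = 0; i < n; i++) {
        double p = a + i * delta;
        double q = p + delta;
        area += 0.5 * (f(p) + f(q)) * delta;
    }
    return area;
}

/* Example integrand for testing: integral of x^2 over [0,1] is 1/3. */
static double square(double x) { return x * x; }
```

To parallelize, each process evaluates the strips in its own subinterval and the partial areas are combined with a reduce (addition), as in Figure 2.9.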

Page 82: Parallel Programming - Slides

Figure 4.16 Adaptive quadrature construction: the strip under f(x) is divided into regions A, B, and C, and subdivision continues until the smallest region's area is negligible.

Page 83: Parallel Programming - Slides

Figure 4.17 Adaptive quadrature with false termination: regions A and B of f(x) have equal and opposite areas, so C = 0 even though the computation has not actually converged.

Page 84: Parallel Programming - Slides

Figure 4.18 Clustering distant bodies: a distant cluster of bodies at distance r is replaced by its center of mass.

Page 85: Parallel Programming - Slides

Figure 4.19 Recursive division of two-dimensional space: particles are separated along alternating subdivision directions, producing a partial quadtree.

Page 86: Parallel Programming - Slides

Figure 4.20 Orthogonal recursive bisection method.

Page 87: Parallel Programming - Slides

Figure 4.21 Process diagram for Problem 4-12(b): a binary tree of + (addition) nodes combines the input numbers into the result in log n steps.

Page 88: Parallel Programming - Slides

88Parallel Programming: Techniques and Applications using Networked Workstations and Parallel Computers

Barry Wilkinson and Michael Allen Prentice Hall, 1998

Figure 4.22 Bisection method for finding the zero crossing location of a function: f(a) and f(b) have opposite signs, so f(x) crosses zero between a and b.
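The bisection method of Figure 4.22 repeatedly halves the interval [a, b], keeping whichever half still contains the sign change. A minimal C sketch (function names are ours, not from the book):

```c
#include <assert.h>

/* Bisection method of Figure 4.22: f(a) and f(b) must have opposite
   signs.  Each step keeps the half-interval containing the sign
   change, halving the search interval until it is narrower than tol. */
double bisect(double (*f)(double), double a, double b, double tol) {
    while (b - a > tol) {
        double mid = 0.5 * (a + b);
        if (f(a) * f(mid) <= 0.0)
            b = mid;                    /* zero crossing in lower half */
        else
            a = mid;                    /* zero crossing in upper half */
    }
    return 0.5 * (a + b);
}

/* Sample function with a zero crossing at sqrt(2). */
double shifted(double x) { return x * x - 2.0; }
```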

Figure 4.23 Convex hull (Problem 4-22).

Figure 5.1 Pipelined processes P0 to P5.

Figure 5.2 Pipeline for an unfolded loop: each stage holds one element a[i], takes sin from the previous stage, and passes on sout; the last stage produces the sum of a[0] to a[4].

Figure 5.3 Pipeline for a frequency filter: the stages successively remove frequencies f0, f1, f2, f3, and f4 from the signal f(t), producing the filtered signal.

Figure 5.4 Space-time diagram of a pipeline: instances 1, 2, 3, ... pass through processes P0-P5 over time; with p processes and m instances, execution takes p − 1 + m pipeline cycles.
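The space-time diagram gives the standard pipeline timing result: the first instance emerges after p cycles and one more completes every cycle thereafter, so m instances on p stages take p − 1 + m cycles. As a trivial helper (ours, purely illustrative):

```c
#include <assert.h>

/* Pipeline cycles for m instances on a p-stage pipeline (Figure 5.4):
   filling the pipeline costs p - 1 cycles, after which one instance
   completes per cycle, giving m + p - 1 cycles in total. */
int pipeline_cycles(int p, int m) {
    return m + p - 1;
}
```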

Figure 5.5 Alternative space-time diagram: instances 0 to 4 shown passing through processes P0-P5 over time.

Figure 5.6 Pipeline processing 10 data elements: (a) pipeline structure, with the input sequence d0 ... d9 entering P0 and passing through P1-P9; (b) timing diagram, taking p − 1 + n cycles.

Figure 5.7 Pipeline processing where information passes to the next stage before the end of the process: (a) processes with the same execution time; (b) processes not with the same execution time, where a transfer of sufficient information starts the next process.

Figure 5.8 Partitioning processes onto processors: processes P0-P11 mapped four at a time onto processors 0, 1, and 2.

Figure 5.9 Multiprocessor system with a line configuration, fed from a host computer.

Figure 5.10 Pipelined addition: each process Pi accumulates the partial sum of the numbers seen so far and passes it on, the last process producing the complete sum.

Figure 5.11 Pipelined addition of numbers with a master process and ring configuration: the master feeds d0, d1, ..., dn−1 into the slaves P0-Pn−1 and receives the sum back around the ring.

Figure 5.12 Pipelined addition of numbers with direct access to slave processes: the master distributes the numbers d0-dn−1 directly to the slaves P0-Pn−1 and collects the sum.

Figure 5.13 Steps in insertion sort with five numbers: the sequence 4, 3, 1, 2, 5 fed into processes P0-P4, sorted after 10 pipeline cycles.
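The pipelined insertion sort of Figures 5.13 and 5.14 can be simulated sequentially: each stage holds the largest number it has seen and passes anything smaller to the next stage, so the stages end up holding the numbers in descending order. A sketch under those assumptions (array and function names are ours):

```c
#include <assert.h>
#include <limits.h>

/* Sequential simulation of the pipelined insertion sort of
   Figures 5.13-5.14.  stage[i] plays the role of process Pi's held
   number (INT_MIN meaning "empty"); each incoming number ripples
   through the stages, every stage keeping the larger of the pair
   and passing the smaller one on. */
void pipeline_insertion_sort(int *stage, const int *x, int n) {
    for (int i = 0; i < n; i++)
        stage[i] = INT_MIN;
    for (int j = 0; j < n; j++) {
        int v = x[j];
        for (int i = 0; i < n && v != INT_MIN; i++) {
            if (v > stage[i]) {          /* keep the larger number */
                int t = stage[i];
                stage[i] = v;            /* pass the smaller one on */
                v = t;
            }
        }
    }
}
```

Feeding in 4, 3, 1, 2, 5 (the sequence of Figure 5.13) leaves stage[] holding 5, 4, 3, 2, 1.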

Figure 5.14 Pipeline for sorting using insertion sort: the series of numbers xn−1 ... x1 x0 enters the pipeline; each stage compares the incoming number with the largest it holds (xmax), keeps the larger, and passes the smaller numbers on.

Figure 5.15 Insertion sort with results returned to the master process using a bidirectional line configuration: the sequence d0, d1, ..., dn−1 is fed in and the sorted sequence returned.

Figure 5.16 Insertion sort with results returned (shown for n = 5): a sorting phase of 2n − 1 cycles followed by the return of the sorted numbers.

Figure 5.17 Pipeline for the sieve of Eratosthenes: the series xn−1 ... x1 x0 enters the pipeline; each stage holds a prime number (the 1st, 2nd, 3rd prime, ...) and passes on only the numbers that are not multiples of it.
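The sieve pipeline of Figure 5.17 can also be simulated sequentially: each stage latches the first number it receives (which is necessarily prime) and forwards only non-multiples of it. A sketch with our own function names:

```c
#include <assert.h>

/* Sequential simulation of the sieve pipeline of Figure 5.17.
   Feeding in 2, 3, ..., limit leaves the primes, in order, in
   prime[]; the return value is how many stages latched a number. */
int sieve_pipeline(int *prime, int nstages, int limit) {
    int used = 0;                        /* stages holding a prime so far */
    for (int x = 2; x <= limit; x++) {
        int v = x;
        for (int i = 0; i < used; i++)
            if (v % prime[i] == 0) {     /* multiple: not passed on */
                v = 0;
                break;
            }
        if (v != 0 && used < nstages)
            prime[used++] = v;           /* next stage latches a new prime */
    }
    return used;
}
```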

Figure 5.18 Solving an upper triangular set of linear equations using a pipeline: P0 computes x0, P1 computes x1, P2 computes x2, P3 computes x3, each passing the x values it has forward.

Figure 5.19 Pipeline processing using back substitution: processes P0-P5 over time, the first value computed being passed onward until each process holds its final computed value.

Figure 5.20 Operations in the back substitution pipeline over time: P0 divides and sends x0; each subsequent process Pi receives the earlier x values, performs multiply/add steps, a final divide/subtract, and sends xi onward.
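Sequentially, the back substitution of Figures 5.18-5.20 solves a triangular system in which equation i involves only x0..xi. A sketch (the fixed row width and names are our illustrative choices):

```c
#include <assert.h>

/* Sequential back substitution for the pipeline of Figures 5.18-5.20,
   for the triangular system  a[i][0]x0 + ... + a[i][i]xi = b[i].
   Process Pi would receive x0..x(i-1), do the multiply/add steps,
   then the divide/subtract step, and send xi on.  The row width 8
   is an illustrative fixed size. */
void back_substitute(int n, double a[][8], const double *b, double *x) {
    for (int i = 0; i < n; i++) {
        double sum = 0.0;
        for (int j = 0; j < i; j++)
            sum += a[i][j] * x[j];       /* multiply/add steps */
        x[i] = (b[i] - sum) / a[i][i];   /* divide/subtract step */
    }
}
```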

Figure 5.21 Pipeline for Problem 5-9: four stages, each with inputs x and yin, a coefficient a, and an output yout; the inputs x1-x4 and a1-a4 produce the outputs y1-y4.

Figure 5.22 Audio histogram display of digitized audio input: (a) pipeline solution; (b) direct decomposition.

Figure 6.1 Processes reaching the barrier at different times: each of P0-Pn−1 is active until it reaches the barrier, then waits until all arrive.

Figure 6.2 Library call barriers: each process P0 ... Pn−1 calls Barrier(); processes wait until all reach their barrier call.

Figure 6.3 Barrier using a centralized counter: each process's Barrier() call increments the counter C and checks for n.

Figure 6.4 Barrier implementation in a message-passing system. The master handles an arrival phase followed by a departure phase:

for (i = 0; i < n; i++) recv(Pany);    arrival phase
for (i = 0; i < n; i++) send(Pi);      departure phase

Each slave process, at its barrier, executes:

Barrier: send(Pmaster); recv(Pmaster);
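On a shared-memory machine, the counter barrier of Figure 6.3 is usually written with a mutex and condition variable, with a generation count so the barrier can be reused. This is our POSIX-threads sketch, not the book's code:

```c
#include <assert.h>
#include <pthread.h>

/* Counter barrier in the spirit of Figure 6.3: each arrival
   increments count; the last arrival resets it and advances the
   generation, releasing the waiters.  The generation number makes
   reuse of the barrier safe. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t all_here;
    int count, n, generation;
} barrier_t;

void barrier_init(barrier_t *b, int n) {
    pthread_mutex_init(&b->lock, NULL);
    pthread_cond_init(&b->all_here, NULL);
    b->count = 0;
    b->n = n;
    b->generation = 0;
}

void barrier_wait(barrier_t *b) {
    pthread_mutex_lock(&b->lock);
    int gen = b->generation;
    if (++b->count == b->n) {            /* last arrival */
        b->count = 0;
        b->generation++;
        pthread_cond_broadcast(&b->all_here);
    } else {
        while (gen == b->generation)     /* wait for the departure phase */
            pthread_cond_wait(&b->all_here, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
}
```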

Figure 6.5 Tree barrier: synchronizing messages from P0-P7 combine up the tree on arrival at the barrier and fan back down on departure.

Figure 6.6 Butterfly construction: three stages of pairwise synchronization among P0-P7 over time.

Figure 6.7 Data parallel computation: the single instruction a[] = a[] + k; causes every processor to execute a[i] = a[i] + k on its own element of a[0] ... a[n-1] simultaneously.

Figure 6.8 Data parallel prefix sum operation on the numbers x0-x15: at step j (j = 0, 1, 2, 3) each element i with i ≥ 2^j adds in the partial sum 2^j places to its left, so after the final step element i holds the sum of x0 through xi.
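The steps of Figure 6.8 can be rendered sequentially as follows (function name ours); iterating downwards within a step mimics the data parallel semantics, in which every addition of a step reads values from before the step:

```c
#include <assert.h>

/* Sequential rendering of the data parallel prefix sum of
   Figure 6.8: at the step with offset 2^j, every element
   i >= 2^j adds in the element 2^j places to its left. */
void prefix_sum(int *x, int n) {
    for (int shift = 1; shift < n; shift *= 2)
        for (int i = n - 1; i >= shift; i--)
            x[i] += x[i - shift];        /* x[i - shift] not yet updated */
}
```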

Figure 6.9 Convergence rate: the computed value at iterations t, t + 1, ... approaches the exact value, the error shrinking with each iteration.

Figure 6.10 Allgather operation: each of processes 0 to n − 1 calls Allgather(), contributing xi from its send buffer; every process ends with x0 ... xn−1 in its receive buffer.

Figure 6.11 Effects of computation and communication in Jacobi iteration: execution time (τ = 1, scale to 2 × 10^6) against the number of processors p (up to 32), showing the computation, communication, and overall curves.

Figure 6.12 Heat distribution problem: on the metal plate (enlarged view), point hi,j is updated from its four neighbours hi−1,j, hi+1,j, hi,j−1, and hi,j+1.
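The update of Figure 6.12 replaces each interior point by the average of its four neighbours. One Jacobi iteration as a C sketch (the fixed grid size N is our illustrative choice; boundary points are left to the caller):

```c
#include <assert.h>

/* One Jacobi iteration for the heat distribution problem of
   Figure 6.12: each interior point becomes the average of its four
   neighbours, written into a separate array hnew. */
#define N 8
void jacobi_step(double h[N][N], double hnew[N][N]) {
    for (int i = 1; i < N - 1; i++)
        for (int j = 1; j < N - 1; j++)
            hnew[i][j] = 0.25 * (h[i - 1][j] + h[i + 1][j] +
                                 h[i][j - 1] + h[i][j + 1]);
}
```

In the partitioned versions, each process applies this update to its own block or strip after exchanging edge points with its neighbours.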

Figure 6.13 Natural ordering of the heat distribution problem: points x1 ... xk2 are numbered row by row in rows of k, so xi has neighbours xi−1, xi+1, xi−k, and xi+k.

Figure 6.14 Message passing for the heat distribution problem. Each process at row i, column j executes:

send(g, Pi-1,j); send(g, Pi+1,j); send(g, Pi,j-1); send(g, Pi,j+1);
recv(w, Pi-1,j); recv(x, Pi+1,j); recv(y, Pi,j-1); recv(z, Pi,j+1);

Figure 6.15 Partitioning the heat distribution problem among processes P0-Pp−1: blocks versus strips (columns).

Figure 6.16 Communication consequences of partitioning an n × n array: square blocks versus strips of width n/p.

Figure 6.17 Startup times for block and strip partitions: plotted against the number of processors p (log scale), tstartup values around 1000-2000 separate the region where the strip partition is best from the region where the block partition is best.

Figure 6.18 Configuring the array into contiguous rows for each process, with ghost points: one row of points is copied between the array held by process i and the array held by process i+1.

Figure 6.19 Room for Problem 6-14: a 10 ft × 10 ft room with a 4 ft feature, and boundaries at 20°C and 100°C.

Figure 6.20 Road junction for Problem 6-16, with a vehicle shown.

Figure 6.21 Figure for Problem 6-23: airflow over an object, actual dimensions selected at will.

Figure 7.1 Load balancing across processors P0-P5: (a) imperfect load balancing leading to increased execution time t; (b) perfect load balancing.

Figure 7.2 Centralized work pool: the master process keeps a queue of tasks; slave "worker" processes request tasks and are sent them, possibly submitting new tasks.

Figure 7.3 A distributed work pool: the master Pmaster distributes initial tasks among slave processes M0 ... Mn−1, each of which holds its own pool.

Figure 7.4 Decentralized work pool: processes exchange requests and tasks directly with one another.

Figure 7.5 Decentralized selection algorithm requesting tasks between slaves: slaves Pi and Pj each use a local selection algorithm to decide where to direct their requests.

Figure 7.6 Load balancing using a pipeline structure: a master process feeds tasks to P0, which passes them along the line P1 ... Pn−1.

Figure 7.7 Using a communication process in line load balancing: each stage pairs a task process Ptask with a communication process Pcomm; Pcomm requests a task if its buffer is empty and passes a task on if its buffer is full, while Ptask requests a task from the buffer when free.

Figure 7.8 Load balancing using a tree: tasks pass from P0 down through P1-P6 when requested.

Figure 7.9 Termination using message acknowledgments: a process becomes active on receiving its first task from its parent, acknowledges tasks received from other processes, and sends its final acknowledgment to its parent when it becomes inactive.

Figure 7.10 Ring termination detection algorithm: a token is passed from each of P0 ... Pn−1 to the next processor when its local termination condition has been reached.

Figure 7.11 Process algorithm for local termination: the incoming token is ANDed with the process's own terminated condition before being forwarded.

Figure 7.12 Passing a task to previous processes: a task sent from process Pi back to an earlier process Pj in the line P0 ... Pn−1.

Figure 7.13 Tree termination: terminated conditions are ANDed together up the tree.

Figure 7.14 Climbing a mountain: from the base camp A to the summit F, with possible intermediate camps B, C, D, and E.

Figure 7.15 Graph of mountain climb: vertices A-F connected by weighted edges (weights 10, 13, 17, 51, 8, 24, 9, and 14).

Figure 7.16 Representing a graph: (a) an adjacency matrix with source rows and destination columns A-F, holding the edge weights (10, 13, 17, 51, 8, 24, 9, 14) and ∞ where there is no edge; (b) an adjacency list, each source vertex A-F heading a NULL-terminated list of (destination, weight) pairs.

Figure 7.17 Moore's shortest-path algorithm: vertex i, with distance di, updates vertex j across the edge of weight wi,j so that dj becomes min(dj, di + wi,j).
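The relaxation step of Figure 7.17, applied repeatedly until no distance changes, gives the shortest distances. A sequential C sketch over an adjacency matrix (our names and conventions; the parallel versions give each vertex or slave its own row):

```c
#include <assert.h>

/* Moore's relaxation step (Figure 7.17), repeated to a fixed point:
   dj = min(dj, di + w[i][j]).  A weight of 0 means "no edge";
   INF marks a vertex not yet reached.  The matrix width 6 is an
   illustrative fixed size. */
#define INF 1000000
void moore_shortest_paths(int n, int w[][6], int src, int *d) {
    for (int j = 0; j < n; j++)
        d[j] = INF;
    d[src] = 0;
    int changed = 1;
    while (changed) {
        changed = 0;
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                if (w[i][j] != 0 && d[i] + w[i][j] < d[j]) {
                    d[j] = d[i] + w[i][j];   /* shorter path through i */
                    changed = 1;
                }
    }
}
```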

Figure 7.18 Distributed graph search: the master process starts the search at the source vertex; processes A, B, C, ... each hold one vertex with its weights w[] and current distance dist, and send new distances to the other processes.

Figure 7.19 Sample maze for Problem 7-9, showing the entrance, exit, and a search path.

Figure 7.20 Plan of rooms for Problem 7-10, showing the entrance and the gold.

Figure 7.21 Graph representation for Problem 7-10: rooms A and B joined by a door.

Figure 8.1 Shared memory multiprocessor using a single bus: processors, each with a cache, and memory modules all attached to the bus.

a. Brinch Hansen, P. (1975), "The Programming Language Concurrent Pascal," IEEE Trans. Software Eng., Vol. 1, No. 2 (June), pp. 199–207.

b. U.S. Department of Defense (1981), "The Programming Language Ada Reference Manual," Lecture Notes in Computer Science, No. 106, Springer-Verlag, Berlin.

c. Bräunl, T., R. Norz (1992), Modula-P User Manual, Computer Science Report, No. 5/92 (August), Univ. Stuttgart, Germany.

d. Thinking Machines Corp. (1990), C* Programming Guide, Version 6, Thinking Machines System Documentation.

e. Gehani, N., and W. D. Roome (1989), The Concurrent C Programming Language, Silicon Press, New Jersey.

f. Fox, G., S. Hiranandani, K. Kennedy, C. Koelbel, U. Kremer, C. Tseng, and M. Wu (1990), Fortran D Language Specification, Technical Report TR90-141, Dept. of Computer Science, Rice University.

TABLE 8.1 SOME EARLY PARALLEL PROGRAMMING LANGUAGES

Language           Originator/date                Comments
Concurrent Pascal  Brinch Hansen, 1975a           Extension to Pascal
Ada                U.S. Dept. of Defense, 1979b   Completely new language
Modula-P           Bräunl, 1986c                  Extension to Modula 2
C*                 Thinking Machines, 1987d       Extension to C for SIMD systems
Concurrent C       Gehani and Roome, 1989e        Extension to C
Fortran D          Fox et al., 1990f              Extension to Fortran for data parallel programming

Figure 8.2 FORK-JOIN construct: the main program FORKs spawned processes, which later JOIN it.

Figure 8.3 Differences between a process and threads: (a) a process has code, heap, files, interrupt routines, and a single stack and instruction pointer (IP); (b) threads within a process share the code, heap, files, and interrupt routines, but each thread has its own stack and IP.

Figure 8.4 pthread_create() and pthread_join(). The main program executes

pthread_create(&thread1, NULL, proc1, &arg);
...
pthread_join(thread1, *status);

while thread1 runs

proc1(&arg)
{
    ...
    return(*status);
}
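A complete, compilable rendering of the Figure 8.4 pattern (the wrapper name run_thread_demo() is ours, and here the argument doubles as the returned status):

```c
#include <assert.h>
#include <pthread.h>

/* pthread_create()/pthread_join() pattern of Figure 8.4: the main
   thread spawns thread1 running proc1() on &arg, then joins and
   reads the pointer that proc1() returned. */
static void *proc1(void *arg) {
    int *x = (int *)arg;
    *x = *x + 1;                  /* the thread's work on its argument */
    return arg;                   /* becomes the joiner's status value */
}

int run_thread_demo(int start) {
    pthread_t thread1;
    void *status;
    int arg = start;
    pthread_create(&thread1, NULL, proc1, &arg);
    pthread_join(thread1, &status);
    return *(int *)status;        /* start + 1, computed by the thread */
}
```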

Figure 8.5 Detached threads: the main program issues pthread_create() calls; each detached thread runs to its own termination, with no join.

Figure 8.6 Conflict in accessing a shared variable: process 1 and process 2 each read shared variable x, add 1, and write the result back.

Figure 8.7 Control of critical sections through busy waiting. Process 1 and process 2 each execute

while (lock == 1) do_nothing;
lock = 1;
    critical section
lock = 0;

Figure 8.8 Deadlock (deadly embrace): (a) two-process deadlock between processes P1, P2 and resources R1, R2; (b) n-process deadlock among P1 ... Pn and R1 ... Rn.

Figure 8.9 False sharing in caches: processor 1 and processor 2 each cache the same eight-word block of main memory, selected by the address tag.

Figure 8.10 Shared memory locations for the Section 8.4.1 program example: array a[], sum, and addr.

Figure 8.11 Shared memory locations for the Section 8.4.2 program example: array a[], global_index, sum, and addr.

Figure 8.12 Sample logic circuit: inputs Test1, Test2, and Test3 feed gates 1, 2, and 3, producing Output1 and Output2.

TABLE 8.2 LOGIC CIRCUIT DESCRIPTION FOR FIGURE 8.12

Gate  Function  Input 1  Input 2  Output
1     AND       Test1    Test2    Gate1
2     NOT       Gate1    -        Output1
3     OR        Test3    Gate1    Output2

Figure 8.13 River and frog for Problem 8-23: logs moving on the river, with the frog.

Figure 8.14 Thread pool for Problem 8-24: the master keeps a pool of slave threads; a request signals a thread, which services the request.

Figure 9.1 Finding the rank in parallel: a[i] is compared with every element a[0] ... a[n-1], the counter x is incremented for each element found to be smaller, and finally b[x] = a[i].

Figure 9.2 Parallelizing the rank computation: the 0/1 results of comparing a[i] with a[0] ... a[3] are added in a tree, giving 0/1/2 partial counts and finally a rank in the range 0 to 4.
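Rank sort as in Figure 9.1 can be written sequentially as below (the tie-breaking rule for duplicates is our addition, so equal numbers get distinct ranks); each rank computation is independent, which is exactly what the compare/add tree of Figure 9.2 parallelizes:

```c
#include <assert.h>

/* Rank sort of Figure 9.1: the rank of a[i] is the number of
   elements smaller than it (counting earlier equal elements too),
   and b[rank] = a[i] places it directly into position. */
void rank_sort(const int *a, int *b, int n) {
    for (int i = 0; i < n; i++) {
        int x = 0;                       /* the counter x of Figure 9.1 */
        for (int j = 0; j < n; j++)
            if (a[j] < a[i] || (a[j] == a[i] && j < i))
                x++;
        b[x] = a[i];
    }
}
```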

Figure 9.3 Rank sort using a master andslaves.

a[] b[]

Slaves

Master

Readnumbers

Place selectednumber

Figure 9.4 Compare and exchange on a message-passing system — Version 1.

[Sequence of steps: (1) P1 sends A to P2; (2) P2 compares: if A > B it keeps A (load A) and sends B back, else it keeps B and sends A back; (3) P1 loads the returned, smaller value.]

Figure 9.5 Compare and exchange on a message-passing system — Version 2.

[Sequence of steps: (1) P1 sends A to P2 while (2) P2 sends B to P1; (3) both perform the same comparison: P1 keeps the smaller value (if A > B load B) and P2 keeps the larger (if A > B load A).]
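Version 2 can be captured in a few lines: both partners exchange values and then perform the same comparison, so no second message is needed. A minimal illustrative sketch (names are mine), modelling the two processes' decisions:

```python
def compare_exchange(A, B):
    """Version 2 compare-exchange: P1 holds A, P2 holds B.  After the
    simultaneous exchange, P1 keeps the smaller value and P2 the larger."""
    p1 = B if A > B else A        # P1: if A > B load B, else keep A
    p2 = A if A > B else B        # P2: if A > B load A, else keep B
    return p1, p2
```

In a real message-passing program each line after the exchange runs on a different process; the point is that both reach consistent decisions from the same comparison.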

Figure 9.6 Merging two sublists — Version 1.

[One process sends its sorted sublist (42, 43, 80, 98) to the other, which merges it with its own sublist (25, 28, 50, 88), keeps the higher numbers (50, 80, 88, 98), and returns the lower numbers (25, 28, 42, 43).]

Figure 9.7 Merging two sublists — Version 2.

[Both processes exchange their sorted sublists ((42, 43, 80, 98) and (25, 28, 50, 88)); each performs the same merge, P1 keeping the lower half (25, 28, 42, 43) and P2 the higher half (50, 80, 88, 98) as its final numbers.]

Figure 9.8 Steps in bubble sort.

[Original sequence: 4 2 7 8 5 1 3 6. Phase 1 places the largest number (8) at the end, phase 2 the next largest number, and so on.]

Figure 9.9 Overlapping bubble sort actions in a pipeline.

[Phase 2 can begin before phase 1 finishes: once a phase's leading comparison has passed a position, the next phase's comparisons can follow behind it, so phases 1, 2, 3, 4, ... overlap in time.]

Figure 9.10 Odd-even transposition sort sorting eight numbers.

[Eight processes P0-P7, one number each; starting sequence 4 2 7 8 5 1 3 6. Steps 0 to 7 alternate compare-exchanges between even-odd and odd-even neighbouring pairs, yielding 1 2 3 4 5 6 7 8.]
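A sequential model of odd-even transposition sort, with one list element standing in for each process (illustrative Python; whether the first phase pairs even-odd or odd-even indices is only a convention, since n phases suffice either way):

```python
def odd_even_transposition_sort(a):
    """n elements, n phases: even phases compare-exchange pairs
    (0,1), (2,3), ...; odd phases pairs (1,2), (3,4), ...."""
    a = list(a)
    n = len(a)
    for step in range(n):
        start = 0 if step % 2 == 0 else 1
        for i in range(start, n - 1, 2):
            if a[i] > a[i + 1]:                    # compare-exchange
                a[i], a[i + 1] = a[i + 1], a[i]
    return a
```

All pairs within one phase are disjoint, so on n processes each phase is a single parallel compare-exchange step.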

Figure 9.11 Snakelike sorted list.

[The smallest number at one end of the snake, the largest at the other.]

Figure 9.12 Shearsort.

(a) Original placement of numbers:
 4 14  8  2
10  3 13 16
 7 15  1  5
12  6 11  9

(b) Phase 1 — Row sort:
 2  4  8 14
16 13 10  3
 1  5  7 15
12 11  9  6

(c) Phase 2 — Column sort:
 1  4  7  3
 2  5  8  6
12 11  9 14
16 13 10 15

(d) Phase 3 — Row sort:
 1  3  4  7
 8  6  5  2
 9 11 12 14
16 15 13 10

(e) Phase 4 — Column sort:
 1  3  4  2
 8  6  5  7
 9 11 12 10
16 15 13 14

(f) Final phase — Row sort:
 1  2  3  4
 8  7  6  5
 9 10 11 12
16 15 14 13
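Shearsort can be modelled directly from the figure: alternate snakelike row sorts (even rows ascending, odd rows descending) with ascending column sorts, for ceil(log2 n) + 1 row phases on an n x n grid, ending with a row phase. An illustrative sketch (function name mine):

```python
import math

def shearsort(grid):
    """Shearsort on an n x n grid, ending in snakelike sorted order."""
    n = len(grid)
    grid = [list(row) for row in grid]
    row_phases = int(math.ceil(math.log2(n))) + 1
    for p in range(row_phases):
        for i, row in enumerate(grid):          # row phase, snakelike
            row.sort(reverse=(i % 2 == 1))
        if p < row_phases - 1:                  # column phase, ascending
            for j in range(n):
                col = sorted(grid[i][j] for i in range(n))
                for i in range(n):
                    grid[i][j] = col[i]
    return grid
```

Running it on the 4 x 4 grid of Figure 9.12(a) reproduces the snakelike result of panel (f).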

Figure 9.13 Using the transpose operation to maintain operations in rows.

(a) Operations between elements in rows. (b) Transpose operation. (c) Operations between elements in rows (originally columns).

Figure 9.14 Mergesort using tree allocation of processes.

[The unsorted list 4 2 7 8 5 1 3 6 is divided down a tree among processes P0-P7, then merged pairwise back up the tree (P0/P4, P0/P2, ...) until P0 holds the sorted list 1 2 3 4 5 6 7 8.]

Figure 9.15 Quicksort using tree allocation of processes.

[Unsorted list 4 2 7 8 5 1 3 6 with pivot 3: numbers up to the pivot stay with P0 and the rest pass to P4, the partitioning recursing down the tree (P0, P2, P4, P6, ...) until the sorted list 1 2 3 4 5 6 7 8 is obtained.]

Figure 9.16 Quicksort showing pivot withheld in processes.

[Unsorted list 4 2 7 8 5 1 3 6: the first pivot (4) is withheld by its process while the remainder is partitioned into 3 2 1 and 5 7 8 6; subsequent pivots are likewise withheld at each level, so the pivots themselves need not move again.]

Figure 9.17 Work pool implementation of quicksort.

[Slave processes request sublists from a work pool and return the sublists they generate.]

Figure 9.18 Hypercube quicksort algorithm when the numbers are originally in node 000.

(a) Phase 1: nodes 000-111. (b) Phase 2: the list is split by pivot p1 (≤ p1 / > p1). (c) Phase 3: further splits by pivots p2 and p3 (≤ p2 / > p2, ≤ p3 / > p3), and then by p4-p7.

Figure 9.19 Hypercube quicksort algorithm when numbers are distributed among nodes.

(a) Phase 1: pivot p1 is broadcast to all nodes 000-111. (b) Phase 2: pivots p2 and p3 are broadcast within the two halves (≤ p1 / > p1). (c) Phase 3: pivots p4-p7 are broadcast within the quarters (≤ p2 / > p2, ≤ p3 / > p3).

Figure 9.20 Hypercube quicksort communication.

(a) Phase 1 communication. (b) Phase 2 communication. (c) Phase 3 communication. [Each phase exchanges data across one dimension of the three-dimensional hypercube of nodes 000-111.]

Figure 9.21 Quicksort hypercube algorithm with Gray code ordering.

[As Figure 9.19, but the nodes are arranged in Gray code order 000, 001, 011, 010, 110, 111, 101, 100, so that communicating nodes are physically adjacent.]

Figure 9.22 Odd-even merging of two sorted lists.

[Sorted lists a[] = (2, 4, 5, 8) and b[] = (1, 3, 6, 7): the even-indexed elements are merged into c[] and the odd-indexed elements into d[]; a final row of compare-and-exchange operations produces the final sorted list e[] = 1 2 3 4 5 6 7 8.]
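Batcher's odd-even merge of Figures 9.22-9.23 can be sketched recursively for two sorted lists of equal power-of-two length (illustrative Python; names are mine):

```python
def odd_even_merge(a, b):
    """Merge two sorted lists of equal power-of-2 length: recursively
    merge the even-indexed and odd-indexed elements, then fix up the
    interleaving with one row of compare-exchanges."""
    if len(a) == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]
    c = odd_even_merge(a[0::2], b[0::2])   # merge even-indexed elements
    d = odd_even_merge(a[1::2], b[1::2])   # merge odd-indexed elements
    e = [c[0]]
    for i in range(len(d) - 1):            # compare-exchange d[i], c[i+1]
        e += [min(d[i], c[i + 1]), max(d[i], c[i + 1])]
    e.append(d[-1])
    return e
```

The final loop is the single row of compare-and-exchange operations shown in the figure; every comparison in it is independent, so it takes one parallel step.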

Figure 9.23 Odd-even mergesort.

[Inputs a1 ... an and b1 ... bn: the even-indexed elements feed an "even mergesort" and the odd-indexed elements an "odd mergesort"; a final row of compare-and-exchange operations produces c1, c2, ..., c2n.]

Figure 9.24 Bitonic sequences.

(a) Single maximum: the values a0, a1, a2, a3, ..., an−2, an−1 rise and then fall. (b) Single maximum and single minimum.

Figure 9.25 Creating two bitonic sequences from one bitonic sequence.

[Bitonic sequence 3 5 8 9 7 4 2 1: compare-and-exchange of a[i] with a[i+4] gives 3 4 2 1 and 7 5 8 9, each itself bitonic, with every element of the first no greater than every element of the second.]

Figure 9.26 Sorting a bitonic sequence.

[Unsorted numbers 3 5 8 9 7 4 2 1 → 3 4 2 1 7 5 8 9 → 2 1 3 4 7 5 8 9 → sorted list 1 2 3 4 5 7 8 9, halving the compare-and-exchange distance at each step.]

Figure 9.27 Bitonic mergesort.

[Unsorted numbers pass through bitonic sorting operations with alternating directions of increasing numbers, producing a sorted list.]

Figure 9.28 Bitonic mergesort on eight numbers.

Compare and exchange ai with ai+n/2 (n numbers); lower/higher = bitonic list [Fig. 9.24(a) or (b)].

Step 1 (sort pairs, n = 2, ai with ai+1): 8 3 4 7 9 2 1 5 → 3 8 7 4 2 9 5 1
Steps 2-3 (form bitonic lists of four numbers: split n = 4, ai with ai+2; sort n = 2, ai with ai+1): → 3 4 7 8 5 9 2 1 → 3 4 7 8 9 5 2 1
Steps 4-6 (form and sort bitonic list of eight numbers: split n = 8, ai with ai+4; split n = 4, ai with ai+2; sort n = 2, ai with ai+1): → 3 4 2 1 9 5 7 8 → 2 1 3 4 7 5 9 8 → 1 2 3 4 5 7 8 9
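The bitonic mergesort of Figures 9.26-9.28 in recursive form (illustrative Python, names mine; assumes a power-of-two length). Sorting the two halves in opposite directions forms a bitonic sequence, which the merge then sorts:

```python
def bitonic_sort(a, ascending=True):
    """Sort each half in opposite directions to form a bitonic
    sequence, then merge it into sorted order."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    first = bitonic_sort(a[:mid], True)
    second = bitonic_sort(a[mid:], False)
    return bitonic_merge(first + second, ascending)

def bitonic_merge(a, ascending):
    """Sort a bitonic sequence: compare-exchange a[i] with a[i+n/2],
    then recurse on the two (still bitonic) halves."""
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    a = list(a)
    for i in range(mid):
        if (a[i] > a[i + mid]) == ascending:
            a[i], a[i + mid] = a[i + mid], a[i]
    return (bitonic_merge(a[:mid], ascending) +
            bitonic_merge(a[mid:], ascending))
```

On the figure's input 8 3 4 7 9 2 1 5 this performs exactly the six compare-and-exchange steps shown.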

Figure 9.29 Compare-and-exchange algorithm for Problem 9-5.

[Lists at successive steps: (25, 28, 50, 88) with (42, 43, 80, 98); (25, 28, 42, 43) with (50, 80, 88, 98); (25, 28, 42, 50) with (43, 80, 88, 98). Terminates when insertions occur at the top/bottom of the lists.]

Figure 10.1 An n × m matrix.

[Elements a0,0 ... an−1,m−1, the first subscript giving the row and the second the column.]

Figure 10.2 Matrix multiplication, C = A × B.

[Element ci,j is formed by multiplying row i of A by column j of B and summing the results.]
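The row-by-column computation of Figure 10.2, written out sequentially (illustrative Python; a parallel version would assign elements or blocks of C to separate processes, since every ci,j is independent):

```python
def mat_mul(A, B):
    """C = A x B: c[i][j] is the inner product of row i of A
    with column j of B (multiply, then sum the results)."""
    n, p, m = len(A), len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(p))
             for j in range(m)]
            for i in range(n)]
```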

Figure 10.3 Matrix-vector multiplication, c = A × b.

[Element ci is the sum of the elementwise products of row i of A with the vector b.]

Figure 10.4 Block matrix multiplication.

[C = A × B computed block by block: block (p, q) of C is the sum over intermediate blocks of the products of the corresponding blocks of A and B.]

Figure 10.5 Submatrix multiplication.

(a) Matrices: 4 × 4 matrices A (elements a0,0 ... a3,3) and B (elements b0,0 ... b3,3), each partitioned into 2 × 2 submatrices.

(b) Multiplying A0,0 × B0,0 + A0,1 × B1,0 to obtain C0,0:

| a0,0b0,0+a0,1b1,0   a0,0b0,1+a0,1b1,1 |   | a0,2b2,0+a0,3b3,0   a0,2b2,1+a0,3b3,1 |
| a1,0b0,0+a1,1b1,0   a1,0b0,1+a1,1b1,1 | + | a1,2b2,0+a1,3b3,0   a1,2b2,1+a1,3b3,1 |

= | a0,0b0,0+a0,1b1,0+a0,2b2,0+a0,3b3,0   a0,0b0,1+a0,1b1,1+a0,2b2,1+a0,3b3,1 |
  | a1,0b0,0+a1,1b1,0+a1,2b2,0+a1,3b3,0   a1,0b0,1+a1,1b1,1+a1,2b2,1+a1,3b3,1 |

= C0,0

Figure 10.6 Direct implementation of matrix multiplication.

[Processor Pi,j holds row i of A (a[i][]) and column j of B (b[][j]) and computes c[i][j].]

Figure 10.7 Accumulation using a tree construction.

[c0,0 = a0,0b0,0 + a0,1b1,0 + a0,2b2,0 + a0,3b3,0: the four products are formed by P0-P3 and added pairwise up a tree (P0 and P2, then P0).]

Figure 10.8 Submatrix multiplication and summation.

[Submatrices App, Apq, Aqp, Aqq and Bpp, Bpq, Bqp, Bqq: the eight submatrix products are computed by processes P0-P7 and summed pairwise (P0 + P1, P2 + P3, P4 + P5, P6 + P7) to give the blocks Cpp, Cpq, Cqp, Cqq.]

Figure 10.9 Movement of A and B elements.

[Elements of A move along rows and elements of B along columns through processor Pi,j.]

Figure 10.10 Step 2 — Alignment of elements of A and B.

[Row i of A is shifted i places so that ai,j+i reaches Pi,j, and column j of B is shifted j places so that bi+j,j reaches Pi,j.]

Figure 10.11 Step 4 — One-place shift of elements of A and B.

[After each accumulation step, the elements of A shift one place along the rows and the elements of B one place along the columns of processors Pi,j.]

Figure 10.12 Matrix multiplication using a systolic array.

[Elements ai,j are pumped in along the rows, with a one-cycle delay between successive rows, and elements bi,j down the columns; each cell accumulates ci,j (c0,0 ... c3,3).]

Figure 10.13 Matrix-vector multiplication using a systolic array.

[Rows of A (ai,0 ... ai,3) are pumped through the cells while b0-b3 enter the array; the cells accumulate c0-c3.]

Figure 10.14 Gaussian elimination.

[Column i below row i is cleared to zero (columns to the left are already cleared); for each row j below row i, the algorithm steps through element aji.]
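The elimination pattern of Figure 10.14, followed by back substitution, as a sequential sketch (illustrative Python, names mine; no pivoting, so a nonzero diagonal is assumed, matching the simplified presentation here):

```python
def gaussian_elimination(a, b):
    """Solve a x = b by forward elimination and back substitution.
    For each row i, every row j > i subtracts the multiple of row i
    that clears a[j][i]."""
    n = len(a)
    a = [row[:] for row in a]
    b = b[:]
    for i in range(n):
        for j in range(i + 1, n):
            m = a[j][i] / a[i][i]          # assumes a[i][i] != 0
            for k in range(i, n):
                a[j][k] -= m * a[i][k]
            b[j] -= m * b[i]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):         # back substitution
        x[i] = (b[i] - sum(a[i][k] * x[k]
                           for k in range(i + 1, n))) / a[i][i]
    return x
```

The inner j-loop for a given i is what the parallel versions distribute: every row j > i can be updated independently once row i is known (hence the broadcast of Figure 10.15).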

Figure 10.15 Broadcast in parallel implementation of Gaussian elimination.

[The ith row (n − i + 1 elements, including b[i]) is broadcast to the rows below; the earlier columns are already cleared to zero.]

Figure 10.16 Pipeline implementation of Gaussian elimination.

[Rows are broadcast through a pipeline of processes P0, P1, P2, ..., Pn−1.]

Figure 10.17 Strip partitioning.

[Rows 0 to n/p − 1 go to P0, rows n/p to 2n/p − 1 to P1, and so on (boundaries at n/p, 2n/p, 3n/p) for p = 4 processes P0-P3.]

Figure 10.18 Cyclic partitioning to equalize workload.

[Rows are dealt out to P0, P1, ... in round-robin fashion, so each process holds rows scattered across the whole matrix and the shrinking active region is shared evenly.]

Figure 10.19 Finite difference method.

[The solution space for f(x, y) is discretized into points spaced Δ apart in x and y.]

Figure 10.20 Mesh of points numbered in natural order.

[Points x1 ... x100 in a 10 × 10 mesh, numbered row by row; the outermost points are boundary points (see text).]

Figure 10.21 Sparse matrix for Laplace's equation.

[A × x = 0, for x = (x1, x2, ..., xN−1, xN): the ith equation has −4 on the diagonal (ai,i) and 1s at ai,i−n, ai,i−1, ai,i+1, and ai,i+n; the right-hand side incorporates the boundary values, which also introduce some zero entries (see text). Equations with a boundary point on the diagonal are unnecessary for the solution.]

Figure 10.22 Gauss-Seidel relaxation with natural order, computed sequentially.

[Each point is computed from its neighbours in the sequential order of computation, so freshly updated values are used as soon as they are available.]

Figure 10.23 Red-black ordering.

[Mesh points are coloured alternately red and black, as on a checkerboard.]
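A sketch of red-black relaxation on a square grid with fixed boundary values (illustrative Python, names mine). All points of one colour depend only on points of the other colour, so within a sweep each colour class can be updated fully in parallel:

```python
def red_black_relax(u, iterations):
    """Red-black Gauss-Seidel for Laplace's equation: update all
    'red' points ((i + j) even) from black neighbours, then all
    'black' points from the freshly updated red ones.  The boundary
    rows and columns of u are held fixed."""
    n = len(u)
    for _ in range(iterations):
        for parity in (0, 1):                  # 0 = red, 1 = black
            for i in range(1, n - 1):
                for j in range(1, n - 1):
                    if (i + j) % 2 == parity:
                        u[i][j] = 0.25 * (u[i-1][j] + u[i+1][j] +
                                          u[i][j-1] + u[i][j+1])
    return u
```

With boundary values taken from a linear function, the interior converges to that same linear function, which gives a simple correctness check.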


Figure 10.24 Nine-point stencil.

Figure 10.25 Multigrid processor allocation.

[The coarsest grid points coincide with a subset of the finer grid points held by each processor.]

Figure 10.26 Printed circuit board for Problem 10-18.

[Heat sources at 40°C, 50°C, and 60°C on the board; ambient temperature at the edges of the board = 20°C.]

Figure 11.1 Pixmap.

[A grid of picture elements (pixels) p(i, j), indexed from the origin (0, 0).]

Figure 11.2 Image histogram.

[Number of pixels plotted against gray level, 0 to 255.]

Figure 11.3 Pixel values for a 3 × 3 group.

x0 x1 x2
x3 x4 x5
x6 x7 x8

[x4 is the centre pixel.]

Figure 11.4 Four-step data transfer for the computation of mean.

Step 1: each pixel adds the pixel from its left. Step 2: each pixel adds the pixel from its right. Step 3: each pixel adds the pixel from above. Step 4: each pixel adds the pixel from below.

Figure 11.5 Parallel mean data accumulation.

(a) Step 1: partial sums x0 + x1, x3 + x4, x6 + x7 are formed. (b) Step 2: the row sums x0 + x1 + x2, x3 + x4 + x5, x6 + x7 + x8 are completed. (c) Step 3 and (d) Step 4: the three row sums are accumulated vertically into the centre.

Figure 11.6 Approximate median algorithm requiring six steps.

[The largest in a row, the next largest in the row, and the next largest in a column are eliminated in successive steps.]

Figure 11.7 Using a 3 × 3 weighted mask.

[Mask weights w0 ... w8 are applied (⊗) to pixels x0 ... x8; the result x4′ replaces the centre pixel.]
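Applying a 3 × 3 weighted mask as in Figure 11.7 (illustrative sketch, names mine; border pixels are simply left unchanged here, one of several common boundary policies):

```python
def apply_mask(image, w, k):
    """Each interior pixel becomes k * sum(w[m][n] * neighbour) over
    its 3x3 neighbourhood; w is the mask, k the scale factor."""
    h, wid = len(image), len(image[0])
    out = [row[:] for row in image]
    for i in range(1, h - 1):
        for j in range(1, wid - 1):
            s = sum(w[m][n] * image[i + m - 1][j + n - 1]
                    for m in range(3) for n in range(3))
            out[i][j] = k * s
    return out
```

With all weights 1 and k = 1/9 this computes the mean mask of Figure 11.8; other choices of w and k give the noise-reduction and sharpening masks that follow.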

Figure 11.8 Mask to compute mean.

k = 1/9, with mask
1 1 1
1 1 1
1 1 1

Figure 11.9 A noise reduction mask.

k = 1/16, with mask
1 1 1
1 8 1
1 1 1

Figure 11.10 High-pass sharpening filter mask.

k = 1/9, with mask
−1 −1 −1
−1  8 −1
−1 −1 −1

Figure 11.11 Edge detection using differentiation.

[An intensity transition produces a peak in the first derivative and a zero crossing in the second derivative.]

Figure 11.12 Gray level gradient and direction.

[For image f(x, y), the gradient is normal to the lines of constant intensity, at angle φ.]

Figure 11.13 Prewitt operator.

−1 −1 −1        −1 0 1
 0  0  0        −1 0 1
 1  1  1        −1 0 1

Figure 11.14 Sobel operator.

−1 −2 −1        −1 0 1
 0  0  0        −2 0 2
 1  2  1        −1 0 1
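The two Sobel masks of Figure 11.14 applied to an image, using the common |Gx| + |Gy| approximation to the gradient magnitude (illustrative sketch, names mine; border pixels are skipped):

```python
def sobel_magnitude(image):
    """Approximate gradient magnitude |G| ~ |Gx| + |Gy| from the
    two 3x3 Sobel masks applied at each interior pixel."""
    gx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    gy = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            sx = sum(gx[m][n] * image[i + m - 1][j + n - 1]
                     for m in range(3) for n in range(3))
            sy = sum(gy[m][n] * image[i + m - 1][j + n - 1]
                     for m in range(3) for n in range(3))
            out[i][j] = abs(sx) + abs(sy)
    return out
```

A sharp vertical edge gives a strong Gx response and zero Gy, which is easy to check on a tiny synthetic image.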

Figure 11.15 Edge detection with Sobel operator.

(a) Original image (Annabel). (b) Effect of Sobel operator.

Figure 11.16 Laplace operator.

 0 −1  0
−1  4 −1
 0 −1  0

Figure 11.17 Pixels used in Laplace operator.

[The centre pixel x4 with its upper (x1), left (x3), right (x5), and lower (x7) neighbours.]


Figure 11.18 Effect of Laplace operator.

Figure 11.19 Mapping a line into (a, b) space.

(a) The (x, y) plane: a line y = ax + b through pixel (x1, y1). (b) Parameter space: the pixel maps to the line b = −x1a + y1; the common intersection point (a, b) of the pixels' lines identifies the image line.

Figure 11.20 Mapping a line into (r, θ) space.

(a) The (x, y) plane: the line y = ax + b has normal parameters r and θ, with r = x cos θ + y sin θ. (b) The (r, θ) plane: the line maps to the point (r, θ).

Figure 11.21 Normal representation using image coordinate system.

[r and θ measured in the image's x, y coordinate system.]

Figure 11.22 Accumulators, acc[r][θ], for the Hough transform.

[A two-dimensional array of accumulators indexed by r (0, 5, 10, 15, ...) and θ (0°, 10°, 20°, 30°, ...).]

Figure 11.23 Two-dimensional DFT.

[Pixels xjk: transform the rows to obtain Xjm, then transform the columns to obtain Xlm.]

Figure 11.24 Convolution using Fourier transforms.

(a) Direct convolution: image fj,k convolved with filter/image gj,k gives hj,k. (b) Using Fourier transform: transform f(j, k) and g(j, k) to F(j, k) and G(j, k), multiply to obtain H(j, k), then inverse transform to h(j, k).

Figure 11.25 Master-slave approach for implementing the DFT directly.

[A master process distributes work; slave processes compute X[0], X[1], ..., X[n−1] using the factors w0, w1, ..., wn−1.]

Figure 11.26 One stage of a pipeline implementation of DFT algorithm.

[Process j receives a running sum X[k] and a factor a: it adds a × x[j] to X[k] and multiplies a by wk, passing both on as the values for the next iteration.]

Figure 11.27 Discrete Fourier transform with a pipeline.

(a) Pipeline structure: stages P0, P1, P2, ..., PN−1 hold x[0], x[1], ..., x[N−1]; the running sum X[k] starts at 0 and the factor a at 1, with wk fed through the pipeline. (b) Timing diagram: the output sequence X[0], X[1], X[2], X[3], ... emerges from the final stage once the pipeline has filled.

Figure 11.28 Decomposition of N-point DFT into two N/2-point DFTs.

[The input sequence x0, x1, x2, ..., xN−1 is split into even- and odd-indexed elements, each transformed by an N/2-point DFT; then, for k = 0, 1, ..., N/2 − 1, Xk = Xeven + Xodd × wk and Xk+N/2 = Xeven − Xodd × wk.]
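The decomposition of Figure 11.28 leads directly to the recursive FFT (illustrative Python using complex arithmetic, names mine; assumes the length is a power of two):

```python
import cmath

def fft(x):
    """Radix-2 decimation-in-time FFT: split into even- and
    odd-indexed halves, transform each, then combine with
    X[k] = E[k] + w^k O[k] and X[k + N/2] = E[k] - w^k O[k]."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2])
    odd = fft(x[1::2])
    result = [0] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)   # twiddle factor w^k
        result[k] = even[k] + w * odd[k]
        result[k + n // 2] = even[k] - w * odd[k]
    return result
```

A constant input transforms to a single DC term, a quick sanity check; the repeated even/odd splitting is also what produces the bit-reversed input ordering of Figures 11.30-11.31.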

Figure 11.29 Four-point discrete Fourier transform.

[A butterfly network of additions and subtractions computing X0, X1, X2, X3 from x0, x1, x2, x3.]

Figure 11.30 Sixteen-point DFT decomposition.

Xk = Σ(0,2,4,6,8,10,12,14) + wkΣ(1,3,5,7,9,11,13,15)
   = {Σ(0,4,8,12) + wkΣ(2,6,10,14)} + wk{Σ(1,5,9,13) + wkΣ(3,7,11,15)}
   = {[Σ(0,8) + wkΣ(4,12)] + wk[Σ(2,10) + wkΣ(6,14)]} + wk{[Σ(1,9) + wkΣ(5,13)] + wk[Σ(3,11) + wkΣ(7,15)]}

Input order: x0 x8 x4 x12 x2 x10 x6 x14 x1 x9 x5 x13 x3 x11 x7 x15
(binary indices 0000 1000 0100 1100 0010 1010 0110 1110 0001 1001 0101 1101 0011 1011 0111 1111, i.e., bit-reversed order)

Figure 11.31 Sixteen-point FFT computational flow.

[Inputs x0, x8, x4, x12, x2, x10, x6, x14, x1, x9, x5, x13, x3, x11, x7, x15 in bit-reversed order; outputs X0 ... X15 in natural order after four butterfly stages.]

Figure 11.32 Mapping processors onto 16-point FFT computation.

[The sixteen rows 0000 ... 1111 of the computation are divided among four processes P0-P3, four contiguous rows per process (P/r = process/row); inputs are in bit-reversed order (x0, x8, x4, x12, ...), outputs X0 ... X15 in natural order.]

Figure 11.33 FFT using transpose algorithm — first two steps. The 16 points are held as a 4 × 4 array, one column per processor: P0 holds x0, x4, x8, x12; P1 holds x1, x5, x9, x13; P2 holds x2, x6, x10, x14; P3 holds x3, x7, x11, x15.

Figure 11.34 Transposing array for transpose algorithm. The same 4 × 4 array (P0: x0, x4, x8, x12; P1: x1, x5, x9, x13; P2: x2, x6, x10, x14; P3: x3, x7, x11, x15) is transposed so that each processor comes to hold what was previously a row.

Figure 11.35 FFT using transpose algorithm — last two steps. After the transpose each processor holds a contiguous block: P0 has x0–x3, P1 has x4–x7, P2 has x8–x11, P3 has x12–x15.
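Figures 11.33–11.35 are the data-movement view of the transpose (four-step) FFT: independent small DFTs, a twiddle-factor multiply, a transpose, then more independent DFTs. A sequential sketch of the arithmetic for N = n1 × n2 points (helper names are illustrative):

```python
import cmath

def dft(seq):
    """Direct DFT, used here as the small row/column transform."""
    n = len(seq)
    w = cmath.exp(-2j * cmath.pi / n)
    return [sum(seq[t] * w ** (t * k) for t in range(n)) for k in range(n)]

def transpose_fft(x, n1, n2):
    """Transpose (four-step) FFT sketch for N = n1*n2 points."""
    N = n1 * n2
    w = cmath.exp(-2j * cmath.pi / N)
    # Arrange x as an n1 x n2 matrix: A[j1][j2] = x[j1 + n1*j2].
    A = [[x[j1 + n1 * j2] for j2 in range(n2)] for j1 in range(n1)]
    # Step 1: n2-point DFT along each row (independent transforms).
    B = [dft(row) for row in A]
    # Step 2: multiply by twiddle factors w^(j1*k2).
    C = [[B[j1][k2] * w ** (j1 * k2) for k2 in range(n2)] for j1 in range(n1)]
    # Step 3: transpose, so the next transforms are again over local rows.
    Ct = [[C[j1][k2] for j1 in range(n1)] for k2 in range(n2)]
    # Step 4: n1-point DFT along each transposed row.
    D = [dft(row) for row in Ct]
    # Output: X[n2*k1 + k2] = D[k2][k1].
    return [D[k % n2][k // n2] for k in range(N)]
```

In the parallel version each processor owns a group of rows, so steps 1, 2, and 4 need no communication; all data movement is concentrated in the transpose.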

Figure 11.36 Image for Problem 11-3: a 7 × 7 pixel grid (rows and columns numbered 1–7) with a mask.

Figure 12.1 State space tree. Each level corresponds to one choice: the first choice includes or does not include C0, the second choice C1, and so on through the final choice Cn−1.

Figure 12.2 Single-point crossover. Parents A = A1A2 and B = B1B2, each an m-bit string, are cut between bit positions p and p+1; child 1 inherits A1 and B2, child 2 inherits B1 and A2.
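A sketch of the operation in Figure 12.2, treating chromosomes as Python sequences (the function name and the random-point default are illustrative):

```python
import random

def single_point_crossover(parent_a, parent_b, p=None):
    """Single-point crossover: cut both m-element parents between
    positions p and p+1 (1-indexed, as in Figure 12.2) and swap tails."""
    m = len(parent_a)
    if p is None:
        p = random.randint(1, m - 1)  # crossover point strictly inside the string
    child1 = parent_a[:p] + parent_b[p:]  # A1 + B2
    child2 = parent_b[:p] + parent_a[p:]  # B1 + A2
    return child1, child2
```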

Figure 12.3 Island model. Each subpopulation evolves on its own island; migration paths connect the islands, and every island sends migrants to every other island.

Figure 12.4 Stepping stone model. Island subpopulations connected by limited migration paths, so each island exchanges individuals only with its neighbors.

Figure D.1 PRAM model. Processors with local memory, driven by a common clock, execute instructions from a program and read and write data in one shared memory.

Figure D.2 List ranking by pointer jumping. Each element i holds a distance d[i] and a successor pointer s[i] (the last element's successor is null); in each step every element adds its successor's distance to its own and jumps its pointer to its successor's successor, so the distances double (1, 2, 4, …) until every element knows its distance to the end of the list.
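A sequential simulation of the pointer-jumping rounds in Figure D.2. Each PRAM round is simulated by computing all new values before writing any of them (function and variable names are illustrative):

```python
def list_rank(succ):
    """List ranking by pointer jumping. succ[i] is the index of i's
    successor, or None for the last element. Returns each element's
    distance to the end of the list in O(log n) rounds."""
    n = len(succ)
    d = [0 if succ[i] is None else 1 for i in range(n)]
    s = list(succ)
    for _ in range(n.bit_length()):  # enough rounds for every pointer to reach the end
        # Simulate one synchronous PRAM step: read all, then write all.
        nd = [d[i] + (d[s[i]] if s[i] is not None else 0) for i in range(n)]
        ns = [s[s[i]] if s[i] is not None else None for i in range(n)]
        d, s = nd, ns
    return d
```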

Figure D.3 A view of the bulk synchronous parallel model. In each superstep, threads or processes perform local computation (maximum time w), then communication (a maximum of h sends or receives per process), followed by barrier synchronization.
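The conventional BSP cost of one such superstep is w + h·g + l, where g is the per-message communication gap and l the barrier cost (machine parameters of the BSP model, not shown in the figure). As a trivial sketch:

```python
def superstep_cost(w, h, g, l):
    """Standard BSP superstep cost: local computation w, an h-relation
    charged at g time units per message, plus barrier synchronization l."""
    return w + h * g + l
```

Summing this over all supersteps of a program gives its predicted BSP running time.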

Figure D.4 LogP parameters. Sending a message from processor Pi to Pk costs overhead o at the sender, latency L across the network, and overhead o at the receiver; successive messages from the same processor must be separated by at least the gap g before the next message can be sent.
