Parallel Programming: Techniques and Applications using Networked Workstations and Parallel Computers
Barry Wilkinson and Michael Allen, Prentice Hall, 1998

Figure 1.1 Astrophysical N-body simulation by Scott Linssen (undergraduate University of North Carolina at Charlotte [UNCC] student).
Figure 1.2 Conventional computer having a single processor and memory: instructions and data pass between the processor and main memory.

Figure 1.3 Traditional shared memory multiprocessor model: processors connected through an interconnection network to memory modules forming one address space.

Figure 1.4 Message-passing multiprocessor model (multicomputer): computers, each with a processor and local memory, exchange messages through an interconnection network.

Figure 1.5 Shared memory multiprocessor implementation: processors access a shared memory across an interconnection network, with messages carrying the accesses.

Figure 1.6 MPMD structure: each processor executes its own program with its own data.
Figure 1.7 Static link multicomputer: computers, each a processor-memory pair (P, M) with a communication interface (C), in a network with direct links between computers.

Figure 1.8 Node with a switch for internode message transfers: each computer (node) contains a processor, memory, and a switch with links to other nodes.

Figure 1.9 A link between two nodes with separate wires in each direction.

Figure 1.10 Ring.

Figure 1.11 Two-dimensional array (mesh).

Figure 1.12 Tree structure (processing elements linked from a root).

Figure 1.13 Three-dimensional hypercube (nodes addressed 000 through 111).
Figure 1.14 Four-dimensional hypercube (nodes addressed 0000 through 1111).

Figure 1.15 Embedding a ring onto a torus.

Figure 1.16 Embedding a mesh into a hypercube (nodal addresses formed from Gray-coded x and y coordinates, 00, 01, 11, 10).

Figure 1.17 Embedding a tree into a mesh.

Figure 1.18 Distribution of flits: a packet's head and following flits held in flit buffers, with movement controlled by request/acknowledge signals.

Figure 1.19 A signaling method between processors for wormhole routing: data and request/acknowledge (R/A) lines between source and destination processors (Ni and McKinley, 1993).

Figure 1.20 Network delay characteristics: network latency against distance (number of nodes between source and destination) for packet switching, circuit switching, and wormhole routing.
Figure 1.21 Deadlock in store-and-forward networks: messages at four nodes each wait for buffer space held by the next.

Figure 1.22 Multiple virtual channels, each with its own route buffer, mapped onto a single physical link between nodes.

Figure 1.23 Ethernet-type single wire network connecting workstations and a workstation/file server.

Figure 1.24 Ethernet frame format: preamble (64 bits), destination address (48 bits), source address (48 bits), type (16 bits), data (variable), frame check sequence (32 bits).

Figure 1.25 Network of workstations connected via a ring (including a workstation/file server).

Figure 1.26 Star connected network (workstations around a workstation/file server).

Figure 1.27 Overlapping connectivity Ethernets forming a parallel programming cluster: (a) using specially designed adaptors, (b) using separate Ethernet interfaces.
Figure 1.28 Space-time diagram of a message-passing program: four processes alternately computing and waiting to send messages; the slope of each message line indicates the time to send the message.

Figure 1.29 Parallelizing sequential problem — Amdahl's law. With serial fraction f of sequential time ts: (a) one processor takes the serial section f·ts plus the parallelizable section (1 − f)ts; (b) n processors take tp = f·ts + (1 − f)ts/n.
Figure 1.30 (a) Speedup factor S(n) against number of processors n, for f = 0%, 5%, 10%, and 20%. (b) Speedup factor S(n) against serial fraction f, for n = 16 and n = 256.
Figure 2.1 Single program, multiple data operation: one source file compiled to suit each processor, producing executables on processors 0 through n − 1.

Figure 2.2 Spawning a process: process 1 calls spawn() to start execution of process 2.

Figure 2.3 Passing a message between processes using send() and recv() library calls: send(&x, 2) in process 1 moves the data in x into y of process 2's recv(&y, 1).

Figure 2.4 Synchronous send() and recv() library calls using a three-way protocol: (a) when send() occurs before recv(), (b) when recv() occurs before send(). In each case a request to send, the message, and an acknowledgment pass between the processes; the suspended process resumes and both processes continue once the transfer completes.

Figure 2.5 Using a message buffer: send() deposits the message in the buffer and the process continues; recv() later reads the message buffer.
Figure 2.6 Broadcast operation: when each of processes 0 through n − 1 reaches its bcast() call, the data in buf of process 0 is sent to every process.

Figure 2.7 Scatter operation: distinct elements of buf in process 0 are distributed to the data of each process at its scatter() call.

Figure 2.8 Gather operation: the data of each process is collected into buf of process 0 at the gather() calls.

Figure 2.9 Reduce operation (addition): the data of every process is combined with + into buf of process 0 at the reduce() calls.
Figure 2.10 Message passing between workstations using PVM: each workstation runs an application program (executable) and a PVM daemon, with messages sent through the network.

Figure 2.11 Multiple processes allocated to each processor (workstation).

Figure 2.12 pvm_psend() and pvm_precv() system calls: the sender packs an array of data into a send buffer, sends it, and continues; the receiver waits for the message and unpacks it into an array to hold the received data.

Figure 2.13 PVM packing messages, sending, and unpacking: Process_1 calls pvm_initsend(), packs x, s, and y with pvm_pkint(), pvm_pkstr(), and pvm_pkfloat(), and sends the message with pvm_send(process_2, …); Process_2 calls pvm_recv(process_1, …) and unpacks with pvm_upkint(), pvm_upkstr(), and pvm_upkfloat().
Figure 2.14 Sample PVM program.

Master (broadcasts data, receives results):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <pvm3.h>
#define SLAVE "spsum"
#define PROC 10
#define NELEM 1000

int main() {
   int mytid, tids[PROC];
   int n = NELEM, nproc = PROC;
   int no, i, who, msgtype;
   int data[NELEM], result[PROC], tot = 0;
   char fn[255];
   FILE *fp;

   mytid = pvm_mytid();                  /* Enroll in PVM */

   /* Start slave tasks */
   no = pvm_spawn(SLAVE, (char **)0, 0, "", nproc, tids);
   if (no < nproc) {
      printf("Trouble spawning slaves\n");
      for (i = 0; i < no; i++) pvm_kill(tids[i]);
      pvm_exit();
      exit(1);
   }

   /* Open input file and initialize data */
   strcpy(fn, getenv("HOME"));
   strcat(fn, "/pvm3/src/rand_data.txt");
   if ((fp = fopen(fn, "r")) == NULL) {
      printf("Can't open input file %s\n", fn);
      exit(1);
   }
   for (i = 0; i < n; i++) fscanf(fp, "%d", &data[i]);

   /* Broadcast data to slaves */
   pvm_initsend(PvmDataDefault);
   msgtype = 0;
   pvm_pkint(&nproc, 1, 1);
   pvm_pkint(tids, nproc, 1);
   pvm_pkint(&n, 1, 1);
   pvm_pkint(data, n, 1);
   pvm_mcast(tids, nproc, msgtype);

   /* Get results from slaves */
   msgtype = 5;
   for (i = 0; i < nproc; i++) {
      pvm_recv(-1, msgtype);
      pvm_upkint(&who, 1, 1);
      pvm_upkint(&result[who], 1, 1);
      printf("%d from %d\n", result[who], who);
   }

   /* Compute global sum */
   for (i = 0; i < nproc; i++) tot += result[i];
   printf("The total is %d.\n\n", tot);

   pvm_exit();                           /* Program finished; exit PVM */
   return 0;
}

Slave:

#include <stdio.h>
#include "pvm3.h"
#define PROC 10
#define NELEM 1000

int main() {
   int mytid, tids[PROC];
   int n, me, i, msgtype;
   int x, low, high, nproc, master;
   int data[NELEM], sum = 0;

   mytid = pvm_mytid();

   /* Receive data from master */
   msgtype = 0;
   pvm_recv(-1, msgtype);
   pvm_upkint(&nproc, 1, 1);
   pvm_upkint(tids, nproc, 1);
   pvm_upkint(&n, 1, 1);
   pvm_upkint(data, n, 1);

   /* Determine my index among the slaves */
   for (i = 0; i < nproc; i++)
      if (mytid == tids[i]) { me = i; break; }

   /* Add my portion of the data */
   x = n/nproc;
   low = me * x;
   high = low + x;
   for (i = low; i < high; i++)
      sum += data[i];

   /* Send result to master */
   pvm_initsend(PvmDataDefault);
   pvm_pkint(&me, 1, 1);
   pvm_pkint(&sum, 1, 1);
   msgtype = 5;
   master = pvm_parent();
   pvm_send(master, msgtype);

   /* Exit PVM */
   pvm_exit();
   return 0;
}
Figure 2.15 Unsafe message passing with libraries: (a) intended behavior — each send(…,1,…) in process 0 matches the corresponding recv(…,0,…) in process 1, including the pair inside lib(); (b) possible behavior — a message is picked up by the wrong recv(), because source and destination alone do not distinguish library messages from application messages.
Figure 2.16 Sample MPI program.

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#define MAXSIZE 1000

int main(int argc, char *argv[])
{
   int myid, numprocs;
   int data[MAXSIZE], i, x, low, high, myresult = 0, result;
   char fn[255];
   FILE *fp;

   MPI_Init(&argc, &argv);
   MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
   MPI_Comm_rank(MPI_COMM_WORLD, &myid);

   if (myid == 0) {     /* Open input file and initialize data */
      strcpy(fn, getenv("HOME"));
      strcat(fn, "/MPI/rand_data.txt");
      if ((fp = fopen(fn, "r")) == NULL) {
         printf("Can't open the input file: %s\n\n", fn);
         exit(1);
      }
      for (i = 0; i < MAXSIZE; i++) fscanf(fp, "%d", &data[i]);
   }

   /* Broadcast data */
   MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);

   /* Add my portion of the data */
   x = MAXSIZE/numprocs;
   low = myid * x;
   high = low + x;
   for (i = low; i < high; i++)
      myresult += data[i];
   printf("I got %d from %d\n", myresult, myid);

   /* Compute global sum */
   MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
   if (myid == 0) printf("The sum is %d.\n", result);

   MPI_Finalize();
   return 0;
}
Figure 2.17 Theoretical communication time: time against number of data items n, with a nonzero startup time.

Figure 2.18 Growth of function f(x) = 4x² + 2x + 12, lying between c1·g(x) = 2x² and c2·g(x) = 6x² beyond x0.

Figure 2.19 Broadcast in a three-dimensional hypercube (nodes 000 through 111): the message reaches all nodes in three steps.
Figure 2.20 Broadcast as a tree construction: the message spreads from P000 in three steps, doubling the number of holders each step.

Figure 2.21 Broadcast in a mesh (numbers give the step at which each node receives the message).

Figure 2.22 Broadcast on an Ethernet network: a single message from the source reaches all destinations.

Figure 2.23 1-to-N fan-out broadcast: the source sends to its N destinations sequentially.

Figure 2.24 1-to-N fan-out broadcast on a tree structure (sequential message issue at each node).

Figure 2.25 Space-time diagram of a parallel program: computing, waiting, message-passing system routines, and messages between processes.

Figure 2.26 Program profile: number of repetitions or time against statement number or region of program.
Figure 3.1 Disconnected computational graph (embarrassingly parallel problem): processes take input data and produce results independently.

Figure 3.2 Practical embarrassingly parallel computational graph with dynamic process creation and the master-slave approach: the master spawn()s the slaves, send()s the initial data, and recv()s the collected results.

Figure 3.3 Partitioning a 640 × 480 image into regions for individual processes: (a) an 80 × 80 square region for each process, (b) a 10-row region for each process.

Figure 3.4 Mandelbrot set (real and imaginary axes from −2 to +2).
Figure 3.5 Work pool approach: the master holds a pool of tasks, such as image regions (xa, ya) … (xe, ye); each slave returns results and requests a new task.

Figure 3.6 Counter termination: a count of rows outstanding in slaves is incremented for each row sent and decremented for each row returned, and the computation terminates when the count reaches zero after all disp_height rows are issued.

Figure 3.7 Computing π by a Monte Carlo method: a circle of area π inscribed in a 2 × 2 square of total area 4.

Figure 3.8 Function being integrated in computing π by a Monte Carlo method: y = √(1 − x²) over [0, 1].

Figure 3.9 Parallel Monte Carlo integration: a separate random-number process supplies random numbers to the slaves on request, and the slaves return partial sums to the master.
Figure 3.10 Parallel computation of a sequence x1, x2, …, x2k.

Figure 4.1 Partitioning a sequence of n numbers into m parts and adding the parts: each part holds n/m numbers, and the partial sums are then added to form the final sum.

Figure 4.2 Tree construction: the initial problem is repeatedly divided until the final tasks are reached.

Figure 4.3 Dividing a list x0 … xn−1 into parts by recursive halving across processes P0 … P7.

Figure 4.4 Partial summation: the partial sums are combined back up the tree to P0 to form the final sum.
Figure 4.5 Part of a search tree: OR nodes combining found/not found results.

Figure 4.6 Quadtree.

Figure 4.7 Dividing an image: a first division of the image area into four parts, then a second division of each part.

Figure 4.8 Bucket sort: the unsorted numbers are placed into buckets, the contents of the buckets are sorted, and the lists are merged into the sorted numbers.

Figure 4.9 One parallel version of bucket sort: one of the p processors handles each bucket.

Figure 4.10 Parallel version of bucket sort: each of the p processors sorts its share of the numbers into small buckets, the small buckets are emptied into the corresponding large buckets, and the contents of the large buckets are sorted and the lists merged.
Figure 4.11 "All-to-all" broadcast: every process (0 through n − 1) sends the contents of its send buffer to the receive buffer of every other process.

Figure 4.12 Effect of "all-to-all" on an array A: process Pi starts with row i and finishes with column i, so the array is transposed.

Figure 4.13 Numerical integration using rectangles: rectangles of width δ under f(x) between a and b, with heights such as f(p) and f(q).

Figure 4.14 More accurate numerical integration using rectangles (heights taken at the midpoints of the intervals).

Figure 4.15 Numerical integration using the trapezoidal method.
Figure 4.16 Adaptive quadrature construction: regions A, B, and C under f(x).

Figure 4.17 Adaptive quadrature with false termination: region C computes to zero even though the function is not flat there.

Figure 4.18 Clustering distant bodies: a distant cluster of bodies at distance r is replaced by its center of mass.

Figure 4.19 Recursive division of two-dimensional space: particles and the corresponding partial quadtree, with the subdivision direction shown.

Figure 4.20 Orthogonal recursive bisection method.

Figure 4.21 Process diagram for Problem 4-12(b): a binary tree of addition processes combining the numbers into a single result.

Figure 4.22 Bisection method for finding the zero crossing location of a function: f(a) and f(b) bracket the crossing between a and b.
Figure 4.23 Convex hull (Problem 4-22).

Figure 5.1 Pipelined processes P0 through P5.

Figure 5.2 Pipeline for an unfolded loop: each stage adds one of a[0] … a[4] to the accumulating sum passed from sin to sout.

Figure 5.3 Pipeline for a frequency filter: each stage removes one frequency f0 … f4 from the signal, leaving the filtered signal.
Figure 5.4 Space-time diagram of a pipeline: with p processes, m instances of the problem complete after p − 1 + m pipeline cycles.

Figure 5.5 Alternative space-time diagram.

Figure 5.6 Pipeline processing 10 data elements d0 … d9: (a) pipeline structure (processes P0 … P9), (b) timing diagram.

Figure 5.7 Pipeline processing where information passes to the next stage before the end of the process: (a) processes with the same execution time, (b) processes not with the same execution time (sufficient information transfers to start the next process early).

Figure 5.8 Partitioning processes onto processors: processes P0 … P11 partitioned onto three processors.
Figure 5.9 Multiprocessor system with a line configuration: a host computer attached to a line-connected multiprocessor.

Figure 5.10 Pipelined addition: each process Pi adds one number to the accumulating partial sum it passes on.

Figure 5.11 Pipelined addition of numbers d0, d1, … dn−1 with a master process and ring configuration: slaves P0 … Pn−1 pass the sum back to the master.

Figure 5.12 Pipelined addition of numbers with direct access to slave processes: each slave receives its own number, and the master receives the sum.
Figure 5.13 Steps in insertion sort with five numbers. [The sequence 4, 3, 1, 2, 5 enters processes P0–P4; after ten time cycles P0 through P4 hold 5, 4, 3, 2, 1.]
Figure 5.14 Pipeline for sorting using insertion sort. [A series of numbers xn−1, …, x1, x0 enters P0; each process compares, keeps the largest number seen so far (xmax), and passes the smaller numbers on, so P0 ends with the largest number and P1 the next largest.]
Figure 5.15 Insertion sort with results returned to the master process using a bidirectional line configuration. [Master sends d0, d1, …, dn−1 into P0–Pn−1 and receives the sorted sequence back.]
Figure 5.16 Insertion sort with results returned. [Shown for n = 5: a sorting phase of 2n − 1 cycles followed by n cycles returning the sorted numbers through P0–P4.]
Figure 5.17 Pipeline for sieve of Eratosthenes. [A series of numbers xn−1, …, x1, x0 enters P0; each process holds one prime (1st, 2nd, 3rd, … prime number), compares, and passes on only numbers that are not multiples of it.]
Figure 5.18 Solving an upper triangular set of linear equations using a pipeline. [P0 computes x0 and passes it on; P1 computes x1 from x0; P2 computes x2 from x0 and x1; P3 computes x3.]
Figure 5.19 Pipeline processing using back substitution. [Space–time diagram for processes P0–P5; the first value is passed onward as soon as it is computed, and each process forwards its final computed value.]
Figure 5.20 Operations in back substitution pipeline. [Time sequence for P0–P4: P0 divides and sends x0 onward; each later process receives the earlier x values, performs multiply/add steps, a divide/subtract step, then sends its own xi to the next process.]
Figure 5.21 Pipeline for Problem 5-9. [Four stages, each with an input x, a constant a, an input yin, and an output yout; overall inputs x1–x4 and a1–a4, outputs y1–y4.]
Figure 5.22 Audio histogram display. (a) Pipeline solution. (b) Direct decomposition. [Digitized audio input feeding a display.]
Figure 6.1 Processes reaching the barrier at different times. [Processes P0–Pn−1; each active process becomes waiting when it reaches the barrier.]
Figure 6.2 Library call barriers. [Each process P0–Pn−1 calls Barrier(); processes wait until all reach their barrier call.]
Figure 6.3 Barrier using a centralized counter. [Each Barrier() call from P0–Pn−1 increments a counter C and checks for n.]
Figure 6.4 Barrier implementation in a message-passing system.

Master:
  /* arrival phase */
  for (i = 0; i < n; i++)
    recv(Pany);
  /* departure phase */
  for (i = 0; i < n; i++)
    send(Pi);

Slave processes:
Barrier:
  send(Pmaster);
  recv(Pmaster);
Figure 6.5 Tree barrier. [Eight processes P0–P7; synchronizing messages flow up the tree for arrival at the barrier and back down for departure from it.]
Figure 6.6 Butterfly construction. [Three stages of pairwise synchronization among P0–P7: 1st stage, 2nd stage, 3rd stage.]
Figure 6.7 Data parallel computation. [The single instruction a[] = a[] + k; is executed by all processors at once: a[0] = a[0] + k; a[1] = a[1] + k; … a[n-1] = a[n-1] + k;]
Figure 6.8 Data parallel prefix sum operation. [Sixteen numbers x0–x15. In step j (j = 0, 1, 2, 3) every element i with i ≥ 2^j adds the value held by element i − 2^j, so after the final step element i holds the sum x0 + x1 + … + xi.]
Figure 6.9 Convergence rate. [Computed value approaching the exact value over iterations t, t + 1; the error shrinks with each iteration.]
Figure 6.10 Allgather operation. [Each of processes 0 … n − 1 calls Allgather(); the data x0, x1, …, xn−1 from the send buffers is gathered into every process's receive buffer.]
Figure 6.11 Effects of computation and communication in Jacobi iteration. [Execution time (τ = 1; ticks at 1 × 10^6 and 2 × 10^6) against number of processors p (0–32): computation time falls with p while communication time rises, giving the overall curve.]
Figure 6.12 Heat distribution problem. [Metal plate with points indexed by row i and column j; the enlarged view shows point hi,j and its neighbours hi−1,j, hi+1,j, hi,j−1, hi,j+1.]
Figure 6.13 Natural ordering of heat distribution problem. [Points numbered x1, x2, …, xk, xk+1, …, x2k, …, xk² row by row on a k × k grid; point xi has neighbours xi−1, xi+1, xi−k, and xi+k.]
Figure 6.14 Message passing for heat distribution problem. [Each process (row i, column j) executes:]

send(g, Pi-1,j); send(g, Pi+1,j); send(g, Pi,j-1); send(g, Pi,j+1);
recv(w, Pi-1,j); recv(x, Pi+1,j); recv(y, Pi,j-1); recv(z, Pi,j+1);
Figure 6.15 Partitioning heat distribution problem. [Blocks (P0 … Pp−1 arranged as a two-dimensional grid) versus strips (columns, P0 … Pp−1).]
Figure 6.16 Communication consequences of partitioning. [Square blocks versus strips on an n × n array; strip width n/p.]
Figure 6.17 Startup times for block and strip partitions. [tstartup (ticks at 1000 and 2000) against number of processors p (10–1000, log scale); the curve separates the region where the strip partition is best from the region where the block partition is best.]
Figure 6.18 Configuring array into contiguous rows for each process, with ghost points. [One row of points is copied between the array held by process i and the array held by process i+1, forming ghost points.]
Figure 6.19 Room for Problem 6-14. [A 10 ft × 10 ft room at 20°C with a 4 ft element at 100°C.]
Figure 6.20 Road junction for Problem 6-16. [Vehicles approaching the junction.]
Figure 6.21 Figure for Problem 6-23. [Airflow over a shape; actual dimensions selected at will.]
Figure 7.1 Load balancing. (a) Imperfect load balancing leading to increased execution time t. (b) Perfect load balancing. [Work of processors P0–P5 plotted against time.]
Figure 7.2 Centralized work pool. [Master process holds a queue of tasks; slave "worker" processes request tasks, are sent tasks, and may submit new tasks to the pool.]
Figure 7.3 A distributed work pool. [Master Pmaster distributes initial tasks to processes M0 … Mn−1, each holding a pool for its own slaves.]
Figure 7.4 Decentralized work pool. [Processes exchange requests and tasks directly with one another.]
Figure 7.5 Decentralized selection algorithm requesting tasks between slaves. [Slaves Pi and Pj each run a local selection algorithm to decide where to direct their requests.]
Figure 7.6 Load balancing using a pipeline structure. [Master process feeds tasks into a line of processes P0, P1, …, Pn−1.]
Figure 7.7 Using a communication process in line load balancing. [Each node has a task process Ptask and a communication process Pcomm: if the local buffer is empty, Pcomm makes a request and receives a task from it; if the buffer is full, it sends a task onward; Ptask requests a task from the buffer when free.]
Figure 7.8 Load balancing using a tree. [Tasks flow from P0 down through P1–P6 when requested.]
Figure 7.9 Termination using message acknowledgments. [A process becomes active on receiving its first task and records the sender as its parent; other tasks are acknowledged immediately, and the final acknowledgment goes to the parent when the process becomes inactive.]
Figure 7.10 Ring termination detection algorithm. [A token is passed from P0 through P1, P2, …, Pn−1; each process passes the token to the next processor when it has reached its local termination condition.]
Figure 7.11 Process algorithm for local termination. [The token is forwarded only when the process has terminated AND holds the token.]
Figure 7.12 Passing task to previous processes. [A task passed from one process back to a previous process in the line P0 … Pi … Pj … Pn−1.]
Figure 7.13 Tree termination. [Termination signals combine up the tree: each node reports terminated only when its own condition AND those of its children hold.]
Figure 7.14 Climbing a mountain. [Base camp A, summit F, and possible intermediate camps B, C, D, E.]
Figure 7.15 Graph of mountain climb. [Vertices A–F with edge weights 10, 13, 17, 51, 8, 24, 9, 14 (see the adjacency structures of Figure 7.16 for their assignment).]
Figure 7.16 Representing a graph. (a) Adjacency matrix: a 6 × 6 source-by-destination matrix over A–F with entries 10 (A to B), 8, 13, 24, 51 (B to C, D, E, F), 14 (C to D), 9 (D to E), 17 (E to F), and ∞ elsewhere. (b) Adjacency list: each source vertex lists its (destination, weight) pairs, terminated by NULL.
Figure 7.17 Moore's shortest-path algorithm. [An edge from vertex i to vertex j with weight wi,j; distances di and dj.]
Figure 7.18 Distributed graph search. [The master process starts at the source vertex; processes A, B, C (and other processes) each hold a vertex with its weights w[] and distance dist, exchanging new distances as they are found.]
Figure 7.19 Sample maze for Problem 7-9. [Entrance, exit, and a search path.]
Figure 7.20 Plan of rooms for Problem 7-10. [Entrance and a room containing gold.]
Figure 7.21 Graph representation for Problem 7-10. [Rooms A and B connected by a door.]
Figure 8.1 Shared memory multiprocessor using a single bus. [Processors with caches and memory modules attached to one bus.]
TABLE 8.1 SOME EARLY PARALLEL PROGRAMMING LANGUAGES

Language           Originator/date                Comments
Concurrent Pascal  Brinch Hansen, 1975a           Extension to Pascal
Ada                U.S. Dept. of Defense, 1979b   Completely new language
Modula-P           Bräunl, 1986c                  Extension to Modula 2
C*                 Thinking Machines, 1987d       Extension to C for SIMD systems
Concurrent C       Gehani and Roome, 1989e        Extension to C
Fortran D          Fox et al., 1990f              Extension to Fortran for data parallel programming

a. Brinch Hansen, P. (1975), "The Programming Language Concurrent Pascal," IEEE Trans. Software Eng., Vol. 1, No. 2 (June), pp. 199–207.
b. U.S. Department of Defense (1981), "The Programming Language Ada Reference Manual," Lecture Notes in Computer Science, No. 106, Springer-Verlag, Berlin.
c. Bräunl, T., R. Norz (1992), Modula-P User Manual, Computer Science Report, No. 5/92 (August), Univ. Stuttgart, Germany.
d. Thinking Machines Corp. (1990), C* Programming Guide, Version 6, Thinking Machines System Documentation.
e. Gehani, N., and W. D. Roome (1989), The Concurrent C Programming Language, Silicon Press, New Jersey.
f. Fox, G., S. Hiranandani, K. Kennedy, C. Koelbel, U. Kremer, C. Tseng, and M. Wu (1990), Fortran D Language Specification, Technical Report TR90-141, Dept. of Computer Science, Rice University.
Figure 8.2 FORK-JOIN construct. [The main program FORKs spawned processes, which may FORK further; each path ends in a JOIN.]
Figure 8.3 Differences between a process and threads. (a) Process: code, heap, stack, interrupt routines, files, and one instruction pointer (IP). (b) Threads: each thread has its own IP and stack, while the code, heap, files, and interrupt routines are shared.
Figure 8.4 pthread_create() and pthread_join().

Main program:
  pthread_create(&thread1, NULL, proc1, &arg);
  ...
  pthread_join(thread1, *status);

thread1:
  proc1(&arg)
  {
    ...
    return(*status);
  }
Figure 8.5 Detached threads. [The main program issues pthread_create() three times; each detached thread runs to its own termination without being joined.]
Figure 8.6 Conflict in accessing shared variable. [Process 1 and Process 2 each read shared variable x, add 1, and write it back; interleaved reads and writes can lose an update.]
Figure 8.7 Control of critical sections through busy waiting.

Process 1:                     Process 2:
  while (lock == 1)              while (lock == 1)
    do_nothing;                    do_nothing;
  lock = 1;                      lock = 1;
  /* critical section */         /* critical section */
  lock = 0;                      lock = 0;
Figure 8.8 Deadlock (deadly embrace). (a) Two-process deadlock: processes P1 and P2 each hold one of resources R1, R2 and request the other. (b) n-process deadlock: P1 … Pn and R1 … Rn in a cycle.
Figure 8.9 False sharing in caches. [A main-memory block (blocks 0–7, selected by address tag) is cached by both Processor 1 and Processor 2; updates to different words in the same block still cause coherence traffic.]
Figure 8.10 Shared memory locations for Section 8.4.1 program example. [Array a[], sum, and addr.]
Figure 8.11 Shared memory locations for Section 8.4.2 program example. [Array a[], global_index, sum, and addr.]
Figure 8.12 Sample logic circuit. [Inputs Test1, Test2, Test3; gates 1, 2, 3; outputs Output1, Output2.]

TABLE 8.2 LOGIC CIRCUIT DESCRIPTION FOR FIGURE 8.12

Gate  Function  Input 1  Input 2  Output
1     AND       Test1    Test2    Gate1
2     NOT       Gate1             Output1
3     OR        Test3    Gate1    Output2
Figure 8.13 River and frog for Problem 8-23. [A frog crossing a river on moving logs.]
Figure 8.14 Thread pool for Problem 8-24. [The master signals a pool of slave threads; a request is serviced by a free thread.]
Figure 9.1 Finding the rank in parallel. [a[i] is compared with each of a[0] … a[n-1]; a counter x is incremented for each smaller element, and then b[x] = a[i].]
Figure 9.2 Parallelizing the rank computation. [The 0/1 results of comparing a[i] with a[0] … a[3] are added pairwise in a tree: 0/1 values, then 0/1/2 values, giving the rank 0/1/2/3/4.]
Figure 9.3 Rank sort using a master and slaves. [Slaves read numbers from a[] and each selected number is placed into b[].]
Figure 9.4 Compare and exchange on a message-passing system — Version 1. [Sequence of steps: (1) P1 sends A to P2; (2) P2 compares: if A > B it sends B back and keeps A, otherwise it sends A back and keeps B; (3) P1 ends with the smaller number, P2 with the larger.]
Figure 9.5 Compare and exchange on a message-passing system — Version 2. [Sequence of steps: (1) P1 sends A to P2 while P2 sends B to P1; (2) both compare; (3) P1 keeps the smaller (if A > B, load B) and P2 keeps the larger (if A > B, load A).]
Figure 9.6 Merging two sublists — Version 1. [Original numbers: P1 holds 25, 28, 50, 88 and P2 holds 42, 43, 80, 98. P1 sends its list to P2; P2 merges, keeps the higher numbers 50, 80, 88, 98, and returns the lower numbers 25, 28, 42, 43 to P1 as the final numbers.]
Figure 9.7 Merging two sublists — Version 2. [Both processes exchange their original lists (25, 28, 50, 88 and 42, 43, 80, 98) and both merge; P1 keeps the lower numbers 25, 28, 42, 43 and P2 keeps the higher numbers 50, 80, 88, 98 as the final numbers.]
Figure 9.8 Steps in bubble sort. [Original sequence 4 2 7 8 5 1 3 6. Phase 1 places the largest number at the end; Phase 2 places the next largest number; and so on.]
Figure 9.9 Overlapping bubble sort actions in a pipeline. [Phases 1–4 proceed concurrently, each phase starting before the previous phase has finished.]
Figure 9.10 Odd-even transposition sort sorting eight numbers. [Processes P0–P7 start with 4 2 7 8 5 1 3 6; steps 0–7 alternate compare-and-exchange between even and odd pairs of neighbours, ending with the sorted sequence 1 2 3 4 5 6 7 8.]
Figure 9.11 Snakelike sorted list. [Numbers ordered from smallest to largest along a snakelike path through the mesh.]
Figure 9.12 Shearsort. (a) Original placement of numbers. (b) Phase 1 — row sort. (c) Phase 2 — column sort. (d) Phase 3 — row sort. (e) Phase 4 — column sort. (f) Final phase — row sort. [A 4 × 4 mesh of the numbers 1–16 ends snakelike sorted.]
Figure 9.13 Using the transpose operation to maintain operations in rows. (a) Operations between elements in rows. (b) Transpose operation. (c) Operations between elements in rows (originally columns).
Figure 9.14 Mergesort using tree allocation of processes. [The unsorted list 4 2 7 8 5 1 3 6 is divided down a tree of processes (P0; P0, P4; P0, P2, P4, P6; P0–P7), then merged back up the tree into the sorted list 1 2 3 4 5 6 7 8.]
Figure 9.15 Quicksort using tree allocation of processes. [The unsorted list 4 2 7 8 5 1 3 6 is partitioned about pivot 3; sublists are passed down the tree of processes (P0 and P4, then P0, P2, P4, P6, P7, …) until the sorted list 1 2 3 4 5 6 7 8 emerges.]
Figure 9.16 Quicksort showing pivot withheld in processes. [Each process keeps its pivot (4, then 3 and 7, then 1, 2, 5, 8, …) while passing the remaining sublists on, so the pivots themselves form part of the sorted result.]
Figure 9.17 Work pool implementation of quicksort. [Slave processes request sublists from the work pool and return the sublists they generate.]
Figure 9.18 Hypercube quicksort algorithm when the numbers are originally in node 000. (a) Phase 1: nodes 000–111 split by pivot p1 (≤ p1 / > p1). (b) Phase 2: split by pivots p2 and p3 (≤ p2 / > p2, ≤ p3 / > p3). (c) Phase 3: split by pivots p4–p7.
Figure 9.19 Hypercube quicksort algorithm when numbers are distributed among nodes. (a) Phase 1: broadcast pivot p1; nodes 000–111 split ≤ p1 / > p1. (b) Phase 2: broadcast pivots p2 and p3; split ≤ p2 / > p2 and ≤ p3 / > p3. (c) Phase 3: broadcast pivots p4–p7 and split accordingly.
Figure 9.20 Hypercube quicksort communication. (a) Phase 1 communication. (b) Phase 2 communication. (c) Phase 3 communication. [Exchanges between nodes 000–111 of a three-dimensional hypercube, one dimension per phase.]
Figure 9.21 Quicksort hypercube algorithm with Gray code ordering. [Nodes in Gray code order 000, 001, 011, 010, 110, 111, 101, 100; pivots p1–p7 are broadcast and the lists split (≤ p / > p) as in Figure 9.19.]
Figure 9.22 Odd-even merging of two sorted lists. [Sorted lists a[] = 2, 4, 5, 8 and b[] = 1, 3, 6, 7; the even-indexed and odd-indexed elements are merged separately into c[] and d[], which are combined by compare-and-exchange into the final sorted list e[] = 1 2 3 4 5 6 7 8.]
Figure 9.23 Odd-even mergesort. [Inputs a1 … an and b1 … bn feed an even mergesort and an odd mergesort; their outputs are combined by compare-and-exchange to give c1, c2, …, c2n.]
Figure 9.24 Bitonic sequences. [Value of a0, a1, a2, a3, …, an−2, an−1: (a) single maximum; (b) single maximum and single minimum.]
Figure 9.25 Creating two bitonic sequences from one bitonic sequence. [The bitonic sequence 3 5 8 9 7 4 2 1, after compare-and-exchange of elements n/2 apart, yields 3 4 2 1 and 7 5 8 9, each itself bitonic.]
Figure 9.26 Sorting a bitonic sequence. [Unsorted (bitonic) numbers 3 5 8 9 7 4 2 1 become 3 4 2 1 7 5 8 9, then 2 1 3 4 7 5 8 9, then the sorted list 1 2 3 4 5 7 8 9.]
Figure 9.27 Bitonic mergesort. [Unsorted numbers pass through bitonic sorting operations with alternating directions of increasing numbers, producing a sorted list.]
Figure 9.28 Bitonic mergesort on eight numbers. (Six steps of compare-and-exchange of ai with ai+n/2. Steps 1–3 form bitonic lists of four and then eight numbers — n = 2 then n = 4, n = 2 — and steps 4–6 sort the bitonic list with n = 8, n = 4, n = 2: 8 3 4 7 9 2 1 5 → 3 8 7 4 2 9 5 1 → 3 4 7 8 5 9 2 1 → 3 4 7 8 9 5 2 1 → 3 4 2 1 9 5 7 8 → 2 1 3 4 7 5 9 8 → 1 2 3 4 5 7 8 9. A bitonic list is as in Fig. 9.24 (a) or (b).)
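The compare-and-exchange pattern of Figure 9.28 can be captured recursively. The sketch below is my own Python illustration of the standard bitonic mergesort (not the book's code), assuming a power-of-two input length:

```python
def bitonic_merge(seq, ascending):
    """Sort a bitonic sequence: compare-and-exchange element i with
    element i + n/2, then recurse on each half."""
    n = len(seq)
    if n == 1:
        return seq[:]
    s = seq[:]
    for i in range(n // 2):
        if (s[i] > s[i + n // 2]) == ascending:
            s[i], s[i + n // 2] = s[i + n // 2], s[i]
    return (bitonic_merge(s[:n // 2], ascending) +
            bitonic_merge(s[n // 2:], ascending))

def bitonic_sort(seq, ascending=True):
    """Bitonic mergesort: sort the two halves in opposite directions to
    form a bitonic list, then sort the bitonic list."""
    n = len(seq)
    if n == 1:
        return seq[:]
    lower = bitonic_sort(seq[:n // 2], True)    # increasing half
    upper = bitonic_sort(seq[n // 2:], False)   # decreasing half
    return bitonic_merge(lower + upper, ascending)
```

All compare-and-exchange operations at one level are independent, matching the parallel steps of the figure.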
Figure 9.29 Compare-and-exchange algorithm for Problem 9-5. (Three steps on two sorted lists, 25 28 50 88 and 42 43 80 98; the algorithm terminates when insertions occur at the top/bottom of the lists.)
Figure 10.1 An n × m matrix. (Elements ai,j, with rows 0 … n−1 and columns 0 … m−1, from a0,0 to an−1,m−1.)
Figure 10.2 Matrix multiplication, C = A × B. (Element ci,j is formed by multiplying row i of A with column j of B and summing the results.)
Figure 10.3 Matrix-vector multiplication, c = A × b. (Element ci is the row sum of the products of row i of A with the vector b.)
Figure 10.4 Block matrix multiplication. (Submatrix p,q of C is formed by multiplying corresponding submatrices of A and B and summing the results.)
Figure 10.5 Submatrix multiplication. (a) 4 × 4 matrices A and B partitioned into 2 × 2 submatrices. (b) Multiplying to obtain C0,0 = A0,0 × B0,0 + A0,1 × B1,0: the submatrix products give elements such as a0,0b0,0 + a0,1b1,0 + a0,2b2,0 + a0,3b3,0, agreeing with direct multiplication.
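The block decomposition of Figure 10.5 generalizes: each block of C accumulates a sum of submatrix products. A minimal sequential Python sketch of block matrix multiplication (my own illustration, assuming the block size s divides n):

```python
def block_matmul(A, B, s):
    """Multiply n x n matrices (lists of lists) in s x s submatrix blocks:
    block C[p][q] = sum over r of A[p][r] * B[r][q]."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for pb in range(0, n, s):              # block row of C
        for qb in range(0, n, s):          # block column of C
            for rb in range(0, n, s):      # accumulate submatrix products
                for i in range(pb, pb + s):
                    for j in range(qb, qb + s):
                        C[i][j] += sum(A[i][k] * B[k][j]
                                       for k in range(rb, rb + s))
    return C
```

The blocked loop order produces exactly the same elements as the direct triple loop; in a parallel setting each (pb, qb) block is an independent task.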
Figure 10.6 Direct implementation of matrix multiplication. (Processor Pi,j holds row i of A, a[i][], and column j of B, b[][j], and computes c[i][j].)
Figure 10.7 Accumulation using a tree construction. (Products a0,0 × b0,0, a0,1 × b1,0, a0,2 × b2,0, a0,3 × b3,0 formed by P0–P3 are summed pairwise in a tree, via P0 and P2, to give c0,0.)
Figure 10.8 Submatrix multiplication and summation. (Submatrices App, Apq, Aqp, Aqq of A and the corresponding submatrices Bpp, Bpq, Bqp, Bqq of B are multiplied by processors P0–P7; the pairwise sums P0 + P1, P2 + P3, P4 + P5, and P6 + P7 form the submatrices Cpp, Cpq, Cqp, Cqq of C.)
Figure 10.9 Movement of A and B elements. (Rows of A and columns of B move through the processor array Pi,j.)
Figure 10.10 Step 2 — Alignment of elements of A and B. (Row i of A is shifted i places so that Pi,j holds ai,j+i; column j of B is shifted j places so that Pi,j holds bi+j,j.)
Figure 10.11 Step 4 — One-place shift of elements of A and B. (After each multiply-accumulate, the A elements shift one place along the rows and the B elements one place along the columns of the array Pi,j.)
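The alignment of Figure 10.10 followed by the repeated one-place shifts of Figure 10.11 is Cannon's algorithm. A sequential Python simulation of the mesh (my own sketch — one element per "processor", shifts done with modular indexing rather than messages):

```python
def cannon_matmul(A, B):
    """Simulate Cannon's algorithm on an n x n mesh.
    Alignment: row i of A shifted i places, column j of B shifted j places.
    Then n iterations of multiply-accumulate and one-place shifts."""
    n = len(A)
    # Step 2: alignment — processor (i, j) holds a[i][j+i] and b[i+j][j].
    a = [[A[i][(j + i) % n] for j in range(n)] for i in range(n)]
    b = [[B[(i + j) % n][j] for j in range(n)] for i in range(n)]
    C = [[0] * n for _ in range(n)]
    for _ in range(n):
        for i in range(n):                 # every processor multiplies
            for j in range(n):             # and accumulates in parallel
                C[i][j] += a[i][j] * b[i][j]
        # Step 4: shift A one place along rows, B one place along columns.
        a = [[a[i][(j + 1) % n] for j in range(n)] for i in range(n)]
        b = [[b[(i + 1) % n][j] for j in range(n)] for i in range(n)]
    return C
```

After the alignment, processor (i, j) sees each pair A[i][k], B[k][j] exactly once over the n iterations, so C is the full product.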
Figure 10.12 Matrix multiplication using a systolic array. (Rows of A and columns of B, staggered by a one-cycle delay, are pumped through the array to accumulate the results c0,0 … c3,3.)
Figure 10.13 Matrix-vector multiplication using a systolic array. (Rows of A and the vector elements b0 … b3 are pumped through the array to produce c0 … c3.)
Figure 10.14 Gaussian elimination. (Column i below row i is cleared to zero by stepping through the elements aji of each row j below; earlier columns are already cleared to zero.)
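The elimination step of Figure 10.14 — each row j below row i subtracts a multiple of row i to clear aji — followed by back substitution can be sketched sequentially. This is my own plain Python illustration (no pivoting), not the book's code:

```python
def gaussian_eliminate(A, b):
    """Solve Ax = b by forward elimination and back substitution.
    For each row i, every later row j subtracts (a[j][i]/a[i][i]) times
    row i, clearing column i below the diagonal."""
    n = len(A)
    a = [row[:] + [b[i]] for i, row in enumerate(A)]   # augmented matrix
    for i in range(n):
        for j in range(i + 1, n):          # rows below row i
            m = a[j][i] / a[i][i]
            for k in range(i, n + 1):      # n - i + 1 elements, incl. b[i]
                a[j][k] -= m * a[i][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):         # back substitution
        x[i] = (a[i][n] - sum(a[i][k] * x[k]
                              for k in range(i + 1, n))) / a[i][i]
    return x
```

In the parallel versions that follow, the inner j-loop is what is distributed: row i is broadcast (or pipelined) and each processor updates its own rows.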
Figure 10.15 Broadcast in parallel implementation of Gaussian elimination. (The ith row — n − i + 1 elements, including b[i] — is broadcast to the rows below; earlier columns are already cleared to zero.)
Figure 10.16 Pipeline implementation of Gaussian elimination. (Rows are broadcast along a pipeline of processors P0, P1, P2, …, Pn−1.)
Figure 10.17 Strip partitioning. (Rows 0 to n/p − 1 go to P0, n/p to 2n/p − 1 to P1, 2n/p to 3n/p − 1 to P2, and so on.)
Figure 10.18 Cyclic partitioning to equalize workload. (Rows are allocated to the processors P0, P1, … in turn, wrapping around every p rows.)
Figure 10.19 Finite difference method. (The solution space for f(x, y) is discretized with spacing Δ in both x and y.)
Figure 10.20 Mesh of points numbered in natural order. (Points x1 … x100 arranged in a 10 × 10 grid, row by row; the outermost points are boundary points — see text.)
Figure 10.21 Sparse matrix for Laplace's equation. (In the system A × x = 0, the ith equation has −4 on the diagonal, ai,i, and 1s at ai,i−n, ai,i−1, ai,i+1, and ai,i+n; additional 1s and zero entries include the boundary values — see text. Equations with a boundary point on the diagonal are unnecessary for the solution.)
Figure 10.22 Gauss-Seidel relaxation with natural order, computed sequentially. (Each point to be computed uses the points already computed earlier in the sequential order.)
Figure 10.23 Red-black ordering. (Mesh points are colored red and black in a checkerboard pattern.)
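With the red-black ordering of Figure 10.23, all points of one color depend only on points of the other color, so each half-sweep is fully parallel. A minimal sequential Python sketch of one red-black sweep for Laplace's equation (my own illustration, fixed boundary values, interior points only):

```python
def red_black_sweep(u):
    """One red-black sweep of Laplace's equation on grid u.
    All 'red' points (i + j even) are updated first, using only black
    neighbours; then all 'black' points (i + j odd) are updated."""
    n, m = len(u), len(u[0])
    for colour in (0, 1):                  # 0 = red, 1 = black
        for i in range(1, n - 1):
            for j in range(1, m - 1):
                if (i + j) % 2 == colour:
                    u[i][j] = 0.25 * (u[i - 1][j] + u[i + 1][j] +
                                      u[i][j - 1] + u[i][j + 1])
    return u
```

Repeating the sweep until the largest change falls below a tolerance gives the usual iterative solution.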
Figure 10.24 Nine-point stencil.
Figure 10.25 Multigrid processor allocation. (Processors are allocated to the coarsest grid points; finer grid points lie between them.)
Figure 10.26 Printed circuit board for Problem 10-18. (Components at 40°C, 50°C, and 60°C; ambient temperature at the edges of the board = 20°C.)
Figure 11.1 Pixmap. (Picture element — pixel — p(i, j), with origin (0, 0).)
Figure 11.2 Image histogram. (Number of pixels plotted against gray level, 0 to 255.)
Figure 11.3 Pixel values for a 3 × 3 group. (x0 x1 x2 / x3 x4 x5 / x6 x7 x8.)
Figure 11.4 Four-step data transfer for the computation of mean. (Step 1: each pixel adds the pixel from the left. Step 2: from the right. Step 3: from above. Step 4: from below.)
Figure 11.5 Parallel mean data accumulation. (Steps 1 and 2 form the row sums x0 + x1 + x2, x3 + x4 + x5, and x6 + x7 + x8; steps 3 and 4 combine the row sums vertically.)
Figure 11.6 Approximate median algorithm requiring six steps. (Find the largest and next largest in each row, then the next largest in the column.)
Figure 11.7 Using a 3 × 3 weighted mask. (The mask of weights w0 … w8 is applied to the pixels x0 … x8 to produce the result x4′.)
Figure 11.8 Mask to compute mean. (All nine weights are 1, with scale factor k = 1/9.)
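The mask operation of Figures 11.7 and 11.8 — a weighted sum over each pixel's 3 × 3 neighborhood, scaled by k — can be sketched directly. This is my own Python illustration (border pixels left unchanged for simplicity), not the book's code:

```python
def apply_mask(image, mask, k):
    """Apply a 3 x 3 weighted mask to the interior pixels of a grayscale
    image: each new pixel x4' is k times the weighted sum of the pixel
    and its eight neighbours."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]        # borders kept as-is
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            s = sum(mask[di + 1][dj + 1] * image[i + di][j + dj]
                    for di in (-1, 0, 1) for dj in (-1, 0, 1))
            out[i][j] = k * s
    return out

# With k = 1/9, the all-ones mask of Figure 11.8 computes the mean.
MEAN_MASK = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]
```

The noise-reduction and sharpening masks of the following figures plug into the same routine with different weights and k.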
Figure 11.9 A noise reduction mask. (Center weight 8, all others 1, with k = 1/16.)
Figure 11.10 High-pass sharpening filter mask. (Center weight 8, all others −1, with k = 1/9.)
Figure 11.11 Edge detection using differentiation. (An intensity transition and its first and second derivatives.)
Figure 11.12 Gray level gradient and direction. (The gradient of the image f(x, y) is normal to the lines of constant intensity, at angle φ.)
Figure 11.13 Prewitt operator. (Masks −1 −1 −1 / 0 0 0 / 1 1 1 and −1 0 1 / −1 0 1 / −1 0 1.)
Figure 11.14 Sobel operator. (Masks −1 −2 −1 / 0 0 0 / 1 2 1 and −1 0 1 / −2 0 2 / −1 0 1.)
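Applying the two Sobel masks of Figure 11.14 and combining the results gives an approximate gradient magnitude at each pixel. A minimal Python sketch (my own illustration; it uses the common |∂f/∂x| + |∂f/∂y| approximation and leaves border pixels at zero):

```python
def sobel_magnitude(image):
    """Approximate gradient magnitude |df/dx| + |df/dy| at each interior
    pixel using the two Sobel masks."""
    gx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # horizontal derivative
    gy = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # vertical derivative
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            sx = sum(gx[di + 1][dj + 1] * image[i + di][j + dj]
                     for di in (-1, 0, 1) for dj in (-1, 0, 1))
            sy = sum(gy[di + 1][dj + 1] * image[i + di][j + dj]
                     for di in (-1, 0, 1) for dj in (-1, 0, 1))
            out[i][j] = abs(sx) + abs(sy)
    return out
```

A constant region yields zero; a sharp vertical step yields a strong response along the edge, as in Figure 11.15.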
Figure 11.15 Edge detection with Sobel operator. (a) Original image (Annabel). (b) Effect of Sobel operator.
Figure 11.16 Laplace operator. (Mask 0 −1 0 / −1 4 −1 / 0 −1 0.)
Figure 11.17 Pixels used in Laplace operator. (Upper pixel x1, left pixel x3, center x4, right pixel x5, lower pixel x7.)
Figure 11.18 Effect of Laplace operator.
Figure 11.19 Mapping a line into (a, b) space. (a) (x, y) plane: the line y = ax + b through the pixel (x1, y1). (b) Parameter space: the point (a, b), lying on the line b = −x1a + y1.
Figure 11.20 Mapping a line into (r, θ) space. (a) (x, y) plane: the line y = ax + b with normal distance r and angle θ. (b) (r, θ) plane: the point (r, θ), with r = x cos θ + y sin θ.
Figure 11.21 Normal representation using image coordinate system. (r and θ measured in the image coordinates x, y.)
Figure 11.22 Accumulators, acc[r][θ], for the Hough transform. (r in steps of 5 from 0 to 15; θ in steps of 10° from 0° to 30°.)
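Filling the accumulator array of Figure 11.22 amounts to letting every edge pixel vote for every line through it. A Python sketch of the accumulation loop (my own illustration; the quantization of r and θ is a free choice, here unit r steps and theta_steps angles over 180°):

```python
import math

def hough_lines(edge_pixels, rmax, theta_steps=36):
    """Build the accumulator array acc[r][theta]: each edge pixel (x, y)
    votes for every line r = x cos(theta) + y sin(theta) through it."""
    acc = [[0] * theta_steps for _ in range(rmax + 1)]
    for (x, y) in edge_pixels:
        for t in range(theta_steps):
            theta = t * math.pi / theta_steps       # 0 <= theta < 180 deg
            r = round(x * math.cos(theta) + y * math.sin(theta))
            if 0 <= r <= rmax:
                acc[r][t] += 1
    return acc
```

Peaks in acc[r][theta] correspond to lines supported by many edge pixels; three collinear points produce a count of 3 in one cell.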
Figure 11.23 Two-dimensional DFT. (Transform the rows of xj,k to obtain Xj,m, then transform the columns to obtain Xl,m.)
Figure 11.24 Convolution using Fourier transforms. (a) Direct convolution: h(j, k) = f(j, k) ∗ g(j, k), on the image f and filter/image g. (b) Using Fourier transforms: transform f and g to F(j, k) and G(j, k), multiply to obtain H(j, k), and apply the inverse transform to obtain h(j, k).
Figure 11.25 Master-slave approach for implementing the DFT directly. (The master process distributes w0, w1, …, wn−1 to the slave processes, which compute X[0], X[1], …, X[n−1].)
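In the direct DFT behind Figure 11.25, each X[k] is an independent sum, which is what makes the master-slave decomposition natural. A sequential Python sketch of the direct O(N²) computation (my own illustration):

```python
import cmath

def dft(x):
    """Direct discrete Fourier transform:
    X[k] = sum over j of x[j] * w**(j*k), where w = e^(-2*pi*i/N).
    Each X[k] is independent, so each can go to a separate slave."""
    N = len(x)
    w = cmath.exp(-2j * cmath.pi / N)
    return [sum(x[j] * w ** (j * k) for j in range(N))
            for k in range(N)]
```

A master would hand each slave one (or a block of) k values; no communication is needed between slaves beyond the initial distribution of x.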
Figure 11.26 One stage of a pipeline implementation of DFT algorithm. (Process j receives X[k] and a, adds a × x[j] into X[k], multiplies a by wk, and passes the values on for the next iteration.)
Figure 11.27 Discrete Fourier transform with a pipeline. (a) Pipeline structure: processes P0 … PN−1 hold x[0] … x[N−1], and wk enters the pipeline. (b) Timing diagram: the output sequence X[0], X[1], X[2], X[3], … emerges from the pipeline stages over time.
Figure 11.28 Decomposition of N-point DFT into two N/2-point DFTs. (The input sequence x0, x1, …, xN−1 feeds two N/2-point DFTs; the transform is recombined as Xk = Xeven + Xodd × wk and Xk+N/2 = Xeven − Xodd × wk, for k = 0, 1, …, N/2 − 1.)
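The decomposition of Figure 11.28 applied recursively is the radix-2 FFT. A minimal Python sketch (my own illustration, assuming a power-of-two length):

```python
import cmath

def fft(x):
    """Recursive radix-2 FFT. Following Figure 11.28's decomposition:
    X[k]       = Xeven[k] + w**k * Xodd[k]
    X[k + N/2] = Xeven[k] - w**k * Xodd[k],  with w = e^(-2*pi*i/N)."""
    N = len(x)
    if N == 1:
        return x[:]
    even = fft(x[0::2])            # N/2-point DFT of even-indexed samples
    odd = fft(x[1::2])             # N/2-point DFT of odd-indexed samples
    X = [0] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * odd[k]
        X[k] = even[k] + t
        X[k + N // 2] = even[k] - t
    return X
```

The recursion has log₂ N levels of N/2 butterflies each, giving the O(N log N) cost, and the leaf order is the bit-reversed order shown in Figures 11.30 and 11.31.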
Figure 11.29 Four-point discrete Fourier transform. (A butterfly network of additions and subtractions mapping x0 … x3 to X0 … X3.)
Figure 11.30 Sixteen-point DFT decomposition. (Xk = Σ(0,2,4,6,8,10,12,14) + wkΣ(1,3,5,7,9,11,13,15) = {Σ(0,4,8,12) + wkΣ(2,6,10,14)} + wk{Σ(1,5,9,13) + wkΣ(3,7,11,15)} = {[Σ(0,8) + wkΣ(4,12)] + wk[Σ(2,10) + wkΣ(6,14)]} + wk{[Σ(1,9) + wkΣ(5,13)] + wk[Σ(3,11) + wkΣ(7,15)]}. Inputs in bit-reversed order: x0 x8 x4 x12 x2 x10 x6 x14 x1 x9 x5 x13 x3 x11 x7 x15, i.e. 0000 1000 0100 1100 0010 1010 0110 1110 0001 1001 0101 1101 0011 1011 0111 1111.)
Figure 11.31 Sixteen-point FFT computational flow. (Inputs x0, x8, x4, x12, x2, x10, x6, x14, x1, x9, x5, x13, x3, x11, x7, x15 in bit-reversed order; outputs X0 … X15 in natural order.)
Figure 11.32 Mapping processors onto 16-point FFT computation. (Inputs in bit-reversed order, outputs X0 … X15; with four processes P0 … P3, each process handles P/r consecutive rows of the computation, rows indexed 0000 … 1111.)
Figure 11.33 FFT using transpose algorithm — first two steps. (Elements stored column-wise: P0 holds x0, x4, x8, x12; P1 holds x1, x5, x9, x13; P2 holds x2, x6, x10, x14; P3 holds x3, x7, x11, x15.)
Figure 11.34 Transposing array for transpose algorithm. (The 4 × 4 arrangement of x0 … x15 across P0 … P3 is transposed.)
Figure 11.35 FFT using transpose algorithm — last two steps. (After the transpose, P0 holds x0 … x3, P1 holds x4 … x7, P2 holds x8 … x11, and P3 holds x12 … x15.)
Figure 11.36 Image for Problem 11-3. (A 7 × 7 pixel region with a 3 × 3 mask.)
Figure 12.1 State space tree. (First choice: including or not including C0; second choice: C1; third choice and onward through Cn−1, each branch including or not including the item.)
Figure 12.2 Single-point crossover. (Parents A = A1 A2 and B = B1 B2, each of m genes, are cut between gene p and gene p + 1; child 1 = A1 B2 and child 2 = B1 A2.)
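The single-point crossover of Figure 12.2 swaps the tails of two parent chromosomes at a randomly chosen cut point. A Python sketch (my own illustration; chromosomes here are simply sequences, and p may be supplied for determinism):

```python
import random

def single_point_crossover(parent_a, parent_b, p=None):
    """Cut both parents of length m between position p and p + 1 and swap
    the tails: child 1 = A[:p] + B[p:], child 2 = B[:p] + A[p:]."""
    m = len(parent_a)
    if p is None:
        p = random.randrange(1, m)     # crossover point, 1 <= p <= m - 1
    child1 = parent_a[:p] + parent_b[p:]
    child2 = parent_b[:p] + parent_a[p:]
    return child1, child2
```

In a genetic algorithm this operator is applied to pairs selected by fitness, and both children enter the next generation (possibly after mutation).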
Figure 12.3 Island model. (Subpopulations with migration paths; every island sends to every other island.)
Figure 12.4 Stepping stone model. (Island subpopulations with limited migration paths.)
Figure D.1 PRAM model. (Processors with local memory execute program instructions under a common clock and access data in a single shared memory.)
Figure D.2 List ranking by pointer jumping. (Each node i holds a distance d[i] and a successor pointer s[i]; distances start at 1 — 0 for the end of the list (Null) — and double each step: 1 1 1 1 1 1 1 0 → 2 2 2 2 2 2 1 0 → 4 4 4 4 3 2 1 0 → 7 6 5 4 3 2 1 0.)
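Pointer jumping as in Figure D.2 can be simulated sequentially by updating all nodes from a snapshot each round, which mimics the synchronous PRAM steps. My own Python sketch (the end of the list is marked by a self-pointer rather than Null):

```python
def list_rank(succ):
    """List ranking by pointer jumping: succ[i] is the next node in the
    list (succ[i] == i marks the end). Each round, every node adds its
    successor's distance and jumps its pointer; after about log2(n)
    rounds, d[i] is the distance from node i to the end of the list."""
    n = len(succ)
    s = succ[:]
    d = [0 if s[i] == i else 1 for i in range(n)]
    changed = True
    while changed:
        changed = False
        nd, ns = d[:], s[:]            # snapshot: all nodes 'in parallel'
        for i in range(n):
            if s[i] != s[s[i]]:
                changed = True
            nd[i] = d[i] + d[s[i]]     # d[end] is 0, so the end is stable
            ns[i] = s[s[i]]            # pointer jump
        d, s = nd, ns
    return d
```

The doubling of distances per round is exactly the 1 → 2 → 4 → 7 progression shown in the figure.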
Figure D.3 A view of the bulk synchronous parallel model. (A superstep: local computation by the threads or processes — maximum time w — then communication of at most h sends or receives, then barrier synchronization.)
Figure D.4 LogP parameters. (Latency L between processors Pi and Pk, overhead o at the sender and receiver, and gap g before the next message.)