12.1 Parallel Programming



12.2 Types of Parallel Computers

Two principal types:

1. A single computer containing multiple processors - main memory is shared, hence called a “shared memory multiprocessor”

2. Multiple interconnected computer systems


12.3 Conventional Computer

Consists of a processor executing a program stored in a (main) memory:

Each main memory location is identified by its address. Addresses start at 0 and extend to 2^b - 1 when there are b bits (binary digits) in the address. For example, with b = 32 bits, addresses run from 0 to 2^32 - 1.

[Figure: the processor is connected to the main memory; instructions flow to the processor, and data flows to or from the processor.]


12.4 Shared Memory Multiprocessor

• Extend the single-processor model - multiple processors connected to a single shared memory with a single address space:

[Figure: multiple processors connected to one shared memory.]

A real system will have cache memory associated with each processor.


12.5 Examples

• Dual Pentiums

• Quad Pentiums


12.6 Quad Pentium Shared Memory Multiprocessor

[Figure: four processors, each with its own L1 cache, L2 cache, and bus interface, attached to a processor/memory bus; a memory controller connects this bus to the shared memory, and an I/O interface connects it to the I/O bus.]


12.7 Programming Shared Memory Multiprocessors

• Threads - the programmer decomposes the program into parallel sequences (threads), each able to access variables declared outside the threads. Example: Pthreads (a sketch follows at the end of this slide).

• Use a sequential programming language with preprocessor compiler directives, constructs, or syntax to declare shared variables and specify parallelism. Examples: OpenMP (an industry standard) and UPC (Unified Parallel C) - both need compiler support.
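
As a hedged illustration of the thread approach (not from the original slides; the shared variable sum, the worker() function, and the mutex are illustrative assumptions), a minimal Pthreads sketch in which two threads update a variable declared outside the threads:

#include <pthread.h>
#include <stdio.h>

int sum = 0;                               /* declared outside the threads - shared */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg) {
    int my_part = *(int *)arg;             /* each thread contributes its own part */
    pthread_mutex_lock(&lock);             /* protect the shared variable */
    sum += my_part;
    pthread_mutex_unlock(&lock);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    int a = 1, b = 2;
    pthread_create(&t1, NULL, worker, &a);
    pthread_create(&t2, NULL, worker, &b);
    pthread_join(t1, NULL);                /* wait for both threads to finish */
    pthread_join(t2, NULL);
    printf("sum = %d\n", sum);
    return 0;
}

Compile with the -pthread flag; without the mutex, the two updates to sum could race.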


12.8

• A parallel programming language with syntax to express parallelism, where the compiler creates the executable code - not now common.

• A parallelizing compiler that converts regular sequential-language programs into parallel executable code - also not now common.


12.9 Message-Passing Multicomputer

Complete computers connected through an interconnection network:

[Figure: complete computers, each with a processor and local memory, connected through an interconnection network; messages pass between them over the network.]


12.10 Dedicated cluster with a master node

[Figure: the user reaches the master node over an external network through the master node's second Ethernet interface; the master node and the compute nodes are connected through a switch via their Ethernet interfaces, together forming the cluster.]


12.11 Programming Clusters

• Usually based upon explicit message-passing.

• Common approach - a set of user-level libraries for message passing. Examples:
– Parallel Virtual Machine (PVM) - late 1980s. Became very popular in the mid 1990s.
– Message-Passing Interface (MPI) - standard defined in the 1990s and now dominant.


12.12 MPI (Message Passing Interface)

• Message-passing library standard developed by a group of academics and industrial partners to foster more widespread use and portability.

• Defines routines, not implementation.

• Several free implementations exist.


12.13 MPI designed:

• To address some problems with earlier message-passing systems such as PVM.

• To provide a powerful message-passing mechanism and routines - over 126 routines (although it is said that one can write reasonable MPI programs with just 6 MPI routines).


12.14 Message-Passing Programming using User-level Message-Passing Libraries

Two primary mechanisms needed:

1. A method of creating separate processes for execution on different computers

2. A method of sending and receiving messages


12.15 Multiple program, multiple data (MPMD) model

[Figure: separate source files are compiled into separate executables, one for each processor from 0 to p - 1.]


12.16 Single Program Multiple Data (SPMD) model

The basic MPI way.

[Figure: a single source file is compiled into identical executables, one for each processor from 0 to p - 1.]

Different processes merged into one program. Control statements select different parts for each processor to execute.


12.17 Multiple Program Multiple Data (MPMD) Model

[Figure: process 1 calls spawn() to start execution of process 2 at a later point in time.]

Separate programs for each processor. One processor may execute as master process. Other processes started from within master process - dynamic process creation.

This can be done with MPI version 2.


12.18 Communicators

• Defines the scope of a communication operation.

• Processes have ranks associated with a communicator.

• Initially, all processes enrolled in a “universe” called MPI_COMM_WORLD, and each process is given a unique rank, a number from 0 to p - 1, with p processes.

• Other communicators can be established for groups of processes.
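
As a sketch of establishing another communicator for a group of processes (an assumed example, not from the slides; rowcomm, color, and myrowrank are illustrative names), MPI_Comm_split() partitions the processes of an existing communicator by a “color” value:

int myrank, myrowrank, color;
MPI_Comm rowcomm;
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
color = myrank / 4;                      /* e.g., groups of four consecutive ranks */
MPI_Comm_split(MPI_COMM_WORLD, color, myrank, &rowcomm);
MPI_Comm_rank(rowcomm, &myrowrank);      /* rank within the new communicator */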


12.19 Using SPMD Computational Model

int main(int argc, char *argv[]) {
    int myrank;
    MPI_Init(&argc, &argv);
    .
    .
    MPI_Comm_rank(MPI_COMM_WORLD, &myrank);  /* find rank */
    if (myrank == 0)
        master();
    else
        slave();
    .
    .
    MPI_Finalize();
}

where master() and slave() are to be executed by master process and slave process, respectively.


12.20 Basic “Point-to-Point” Send and Receive Routines

Passing a message between processes using send() and recv() library calls (generic syntax - actual formats later):

Process 1 executes send(&x, 2); process 2 executes recv(&y, 1).

[Figure: data moves from x in process 1 to y in process 2.]


12.21 Message Tag

• Used to differentiate between different types of messages being sent.

• Message tag is carried within message.

• If special type matching is not required, a wild card message tag is used, so that the recv() will match with any send().
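
A minimal sketch of a wild card receive in MPI (an assumed example): MPI_ANY_TAG matches a message with any tag, and MPI_ANY_SOURCE similarly matches any sender; the actual tag and source are returned in the status object:

int y;
MPI_Status status;
MPI_Recv(&y, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
printf("received tag %d from process %d\n", status.MPI_TAG, status.MPI_SOURCE);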


12.22 Message Tag Example

To send a message, x, with message tag 5 from a source process, 1, to a destination process, 2, and assign it to y:

Process 1 executes send(&x, 2, 5); process 2 executes recv(&y, 1, 5), which waits for a message from process 1 with a tag of 5.

[Figure: data moves from x in process 1 to y in process 2.]


12.23 Synchronous Message Passing

Routines return when the message transfer has been completed.

Synchronous send routine
• Waits until the complete message can be accepted by the receiving process before sending the message.

Synchronous receive routine
• Waits until the message it is expecting arrives.
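
In MPI, the synchronous send is MPI_Ssend(), which takes the same parameters as MPI_Send(); a one-line sketch (x and msgtag as in the later examples):

MPI_Ssend(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD);  /* completes only once the matching receive has started */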


12.24 Synchronous send() and recv() using a 3-way protocol

(a) When send() occurs before recv():

[Figure: process 1 calls send(), issues a request to send, and suspends; when process 2 reaches recv(), it returns an acknowledgment, the message is transferred, and both processes continue.]


12.25 Synchronous send() and recv() using a 3-way protocol

(b) When recv() occurs before send():

[Figure: process 2 calls recv() and suspends; when process 1 reaches send(), it issues a request to send, process 2 returns an acknowledgment, the message is transferred, and both processes continue.]


12.26

• Synchronous routines intrinsically perform two actions:
– They transfer data, and
– They synchronize processes.


12.27 Asynchronous Message Passing

• Routines that do not wait for actions to complete before returning. Usually require local storage for messages.

• In general, they do not synchronize processes but allow processes to move forward sooner. Must be used with care.


12.28 MPI Blocking and Non-Blocking

• Blocking - return after their local actions complete, though the message transfer may not have been completed. For example, a send may return before the message has been written to an OS buffer.

• Non-blocking - return immediately. Assumes that the data storage used for the transfer is not modified by subsequent statements before it has been used for the transfer; it is left to the programmer to ensure this.


12.29 How message-passing routines can return before the message transfer has completed

A message buffer is needed between the source and destination to hold the message:

[Figure: process 1's send() copies the message into the message buffer and the process continues; process 2's recv() later reads the message from the buffer.]


12.30 Asynchronous routines changing to synchronous routines

• Buffers are only of finite length, and a point could be reached when a send routine is held up because all available buffer space has been exhausted.

• Then the send routine will wait until storage becomes available again - i.e., the routine then behaves as a synchronous routine.


12.31 Parameters of MPI blocking send

MPI_Send(buf, count, datatype, dest, tag, comm)

buf      - address of send buffer
count    - number of items to send
datatype - datatype of each item
dest     - rank of destination process
tag      - message tag
comm     - communicator


12.32 Parameters of MPI blocking receive

MPI_Recv(buf, count, datatype, source, tag, comm, status)

buf      - address of receive buffer
count    - maximum number of items to receive
datatype - datatype of each item
source   - rank of source process
tag      - message tag
comm     - communicator
status   - status after operation


12.33 Example

To send an integer x from process 0 to process 1:

MPI_Status status;
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);  /* find rank */
if (myrank == 0) {
    int x;
    MPI_Send(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD);
} else if (myrank == 1) {
    int x;
    MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}


12.34 MPI Nonblocking Routines

• Nonblocking send - MPI_Isend() - will return “immediately” even before source location is safe to be altered.

• Nonblocking receive - MPI_Irecv() - will return even if no message to accept.


12.35 Detecting when a message has been received when sent with a non-blocking send routine

Completion is detected by MPI_Wait() and MPI_Test().

MPI_Wait() waits until the operation has completed and returns then.

MPI_Test() returns immediately, with a flag set indicating whether the operation had completed at that time.

You need to know which particular send you are waiting for; it is identified with a request parameter.
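
A minimal sketch of polling with MPI_Test() so that computation overlaps the transfer (do_other_work() is a hypothetical placeholder, not an MPI routine):

MPI_Request req1;
MPI_Status status;
int flag = 0;
MPI_Isend(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD, &req1);
MPI_Test(&req1, &flag, &status);     /* flag set nonzero once the send has completed */
while (!flag) {
    do_other_work();                 /* keep computing while the transfer proceeds */
    MPI_Test(&req1, &flag, &status);
}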


12.36 Example

To send an integer x from process 0 to process 1 and allow process 0 to continue:

MPI_Request req1;
MPI_Status status;
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);  /* find rank */
if (myrank == 0) {
    int x;
    MPI_Isend(&x, 1, MPI_INT, 1, msgtag, MPI_COMM_WORLD, &req1);
    compute();
    MPI_Wait(&req1, &status);
} else if (myrank == 1) {
    int x;
    MPI_Recv(&x, 1, MPI_INT, 0, msgtag, MPI_COMM_WORLD, &status);
}


12.37 Collective message-passing routines

Routines that send message(s) to a group of processes or receive message(s) from a group of processes.

These offer higher efficiency than separate point-to-point routines, although they are not absolutely necessary.


12.38 Broadcast

Sending the same message to a group of processes. (Sometimes “multicast” - sending the same message to a defined group of processes; “broadcast” - sending to all processes.)

[Figure: every process, from 0 to p - 1, calls MPI_Bcast(); the data in the root's buffer, buf, is copied to all the other processes.]


12.39 MPI broadcast routine

int MPI_Bcast(void *buf, int count, MPI_Datatype datatype, int root, MPI_Comm comm)

Action: broadcasts the message from the root process to all processes in comm, including itself.

Parameters:
*buf     - message buffer
count    - number of entries in buffer
datatype - data type of buffer
root     - rank of root
comm     - communicator


12.40 Scatter

Sending each element of an array in the root process to a separate process. The contents of the ith location of the array are sent to the ith process.

[Figure: every process, from 0 to p - 1, calls MPI_Scatter(); element i of the root's buffer, buf, goes to process i.]
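
A minimal usage sketch (an assumed example, not from the slides; table and item are illustrative names), with the root scattering one integer to each process; note that the recvcount argument is the number of items received by each process:

int table[8];  /* send buffer, significant only at the root (assuming 8 processes) */
int item;      /* each process receives one element here */
MPI_Scatter(table, 1, MPI_INT, &item, 1, MPI_INT, 0, MPI_COMM_WORLD);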


12.41 Gather

Having one process collect individual values from a set of processes.

[Figure: every process, from 0 to p - 1, calls MPI_Gather(); each process's data item is collected into the root's buffer, buf.]


12.42 Reduce

A gather operation combined with a specified arithmetic/logical operation. Example: values could be gathered and then added together by the root:

[Figure: every process, from 0 to p - 1, calls MPI_Reduce(); the data items are combined (here with +) into the root's buffer, buf.]


12.43 Collective Communication

Involves a set of processes, defined by an intra-communicator. Message tags are not present. Principal collective operations:

• MPI_Bcast() - broadcast from root to all other processes
• MPI_Gather() - gather values for a group of processes
• MPI_Scatter() - scatters a buffer in parts to a group of processes
• MPI_Alltoall() - sends data from all processes to all processes
• MPI_Reduce() - combine values on all processes to a single value
• MPI_Reduce_scatter() - combine values and scatter the results
• MPI_Scan() - compute prefix reductions of data on processes


12.44 Example

To gather items from a group of processes into process 0, using dynamically allocated memory in the root process:

int data[10];  /* data to be gathered from processes */
int grp_size, myrank;
int *buf;

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);  /* find rank */
if (myrank == 0) {
    MPI_Comm_size(MPI_COMM_WORLD, &grp_size);  /* find size */
    buf = (int *)malloc(grp_size*10*sizeof(int));  /* allocate memory */
}
MPI_Gather(data, 10, MPI_INT, buf, 10, MPI_INT, 0, MPI_COMM_WORLD);

MPI_Gather() gathers from all processes, including the root. Note that the recvcount argument is the number of items received from each process, not the total, so it is 10 here.


12.45 Sample MPI program

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#define MAXSIZE 1000

int main(int argc, char *argv[]) {
    int myid, numprocs;
    int data[MAXSIZE], i, x, low, high, myresult = 0, result;
    char fn[255];
    FILE *fp;
    MPI_Init(&argc, &argv);  /* required */
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);


12.46 Sample MPI program (continued)

    if (myid == 0) {  /* open input file and initialize data */
        strcpy(fn, getenv("HOME"));
        strcat(fn, "/MPI/rand_data.txt");
        if ((fp = fopen(fn, "r")) == NULL) {
            printf("Can't open the input file: %s\n\n", fn);
            exit(1);
        }
        for (i = 0; i < MAXSIZE; i++)
            fscanf(fp, "%d", &data[i]);
    }
    /* All processes execute the rest of the code. */
    MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD);  /* broadcast data */


12.47 Sample MPI program (continued)

    x = MAXSIZE/numprocs;  /* add my portion of the data */
    low = myid * x;
    high = low + x;
    for (i = low; i < high; i++)
        myresult += data[i];
    printf("I calculated %d from %d\n", myresult, myid);

    /* compute global sum */
    MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myid == 0) printf("The sum is %d.\n", result);
    MPI_Finalize();
    return 0;
}


12.48 Message-Passing on a Grid

• VERY expensive - sending data across the network costs millions of cycles.

• Bandwidth shared with other users

• Links unreliable


12.49 Computational Strategies

• As a computing platform, a grid favors situations requiring an absolute minimum of communication between computers.


12.50 Strategies

With no or minimum communication:

• “Embarrassingly parallel” computations - those computations which can obviously be divided into parallel independent parts, with the parts executed on separate computers.

• Separate instances of the same problem executing on each system, each using different data.


12.51 Embarrassingly Parallel Computations

A computation that can obviously be divided into a number of completely independent parts, each of which can be executed by a separate process(or).

[Figure: the input data is divided among independent processes, each producing its own results.]

No communication or very little communication between processes. Each process can do its tasks without any interaction with the other processes.


12.52 Monte Carlo Methods

• An embarrassingly parallel computation.

• Monte Carlo methods make use of random selections.
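
As a hedged illustration (an assumed example, not from the original slides), a Monte Carlo estimate of pi in the embarrassingly parallel style: each process independently counts random points falling inside the unit circle, and the counts are combined with a single MPI_Reduce():

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int myid, numprocs, i, incircle = 0, total;
    int samples = 1000000;               /* points generated per process */
    double px, py;
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);
    srand(myid + 1);                     /* a different random stream per process */
    for (i = 0; i < samples; i++) {
        px = (double)rand() / RAND_MAX;  /* random point in the unit square */
        py = (double)rand() / RAND_MAX;
        if (px*px + py*py <= 1.0) incircle++;
    }
    /* the only communication: combine the per-process counts at the root */
    MPI_Reduce(&incircle, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myid == 0)
        printf("pi is approximately %f\n",
               4.0 * total / ((double)samples * numprocs));
    MPI_Finalize();
    return 0;
}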