
Parallel Programming and Algorithms – MPI Collective Operations

David Monismith
CS599
Feb. 10, 2015

Based upon MPI: A Message-Passing Interface Standard 3.0 by the Message Passing Interface Forum, 2012.
Available at http://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf


Last Time

• Introduced MPI and Batch Scripts
• Discussed Blocking Send/Receive
• Discussed Collective Message Patterns in Theory


This Time

• Time for Worksheet 6
• Implementation of Collective Message Patterns


Collective Message Patterns

• Collective message patterns include broadcast, reduce, scatter, and gather.

• When using a collective function call in MPI, the call must be made in every process in the communicator.

• For now, we are using MPI_COMM_WORLD as our communicator.

• Therefore, we will make collective function calls in every process.
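
As a concrete illustration of this rule, here is a minimal sketch (not one of the course examples) using MPI_Bcast, which the next slides cover in detail: every rank, including the root, executes the same collective call, and only the data being broadcast is set differently by rank. The value 42 and the variable names are placeholders.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        value = 42;   /* only the root fills in the data to be broadcast... */

    /* ...but the collective call itself is NOT guarded by rank:   */
    /* every process in MPI_COMM_WORLD must reach this line.       */
    MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Rank %d now holds value %d\n", rank, value);

    MPI_Finalize();
    return 0;
}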


Broadcast

• Broadcast is a one-to-all function.
• Recall that in a broadcast, the same data is sent to every process.
• MPI's Broadcast operation is implemented via the MPI_Bcast function.
• Broadcast must be called in every process.


MPI_Bcast

• Function signature: MPI_Bcast(buffer, count, dataType, source, comm)
• buffer – address of the buffer (data to send/receive)
• count – number of items in the buffer
• dataType – the MPI type of the data in the buffer
• source – the rank of the process sending the broadcast
• comm – the communicator (often MPI_COMM_WORLD)
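
The following sketch (not the course's bcast.c) broadcasts a short character message from rank 0 to every process; the comments map each argument onto the parameter names above, and the message text and buffer length are arbitrary.

#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define MSG_LEN 64

int main(int argc, char *argv[]) {
    int rank;
    char msg[MSG_LEN] = "";

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)                         /* only the source prepares the message */
        strncpy(msg, "Hello from rank 0", MSG_LEN - 1);

    MPI_Bcast(msg,             /* buffer   - send buffer at the source, receive buffer elsewhere */
              MSG_LEN,         /* count    - number of items in the buffer                       */
              MPI_CHAR,        /* dataType - MPI type of each item                               */
              0,               /* source   - rank of the process sending the broadcast           */
              MPI_COMM_WORLD); /* comm     - the communicator                                    */

    printf("Rank %d received: %s\n", rank, msg);

    MPI_Finalize();
    return 0;
}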


Example

• See bcast.c from the course website.
• This example sends a message from process rank 0 to all other processes.


Gather

• Gather allows one process to receive individualized messages from all other processes.
• That is, gather is an all-to-one function.
• MPI's Gather operation is implemented via the MPI_Gather function.
• This function allows one process to collect equally sized individualized messages from every process in the communicator, including the root itself.


MPI_Gather

• Function signature: MPI_Gather(sendBuffer, sendCount, sendType, recvBuffer, recvCount, recvType, rootProcessId, comm)
• sendBuffer – the data to be sent to the root process
• sendCount – the number of data items to be sent
• sendType – the type of the data items to be sent
• recvBuffer – the buffer in which the data will be stored at the root process (this may be NULL in non-root processes)
• recvCount – the number of elements for any single receive (must be non-negative; significant only at the root)
• recvType – the data type of the elements that will be received
• rootProcessId – the rank of the receiving process
• comm – the communicator (e.g., MPI_COMM_WORLD)
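
A minimal sketch (not the course's gather.c) in which every rank contributes its own rank number and rank 0 gathers them into an array; the comments follow the parameter names above.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int rank, numProcs;
    int *all = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

    if (rank == 0)                          /* only the root needs a receive buffer */
        all = malloc(numProcs * sizeof(int));

    MPI_Gather(&rank,          /* sendBuffer    - each rank contributes its own rank id      */
               1,              /* sendCount     - one item per process                        */
               MPI_INT,        /* sendType                                                    */
               all,            /* recvBuffer    - significant only at the root (NULL elsewhere) */
               1,              /* recvCount     - items received from EACH process            */
               MPI_INT,        /* recvType                                                    */
               0,              /* rootProcessId                                               */
               MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < numProcs; i++)
            printf("Got %d from rank %d\n", all[i], i);
        free(all);
    }

    MPI_Finalize();
    return 0;
}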


Example

• See gather.c from the examples page on the course website.

• This example computes the square of N numbers from 0 to N-1 and prints every 100th square.

• Squares are divided among M processes such that each process computes N/M squares.
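
The program below is a hedged sketch of the computation described, not the actual gather.c: it uses N = 1000 for illustration, assumes N is divisible by the number of processes, and uses long arithmetic for the squares.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define N 1000   /* total number of squares; assumed divisible by the process count */

int main(int argc, char *argv[]) {
    int rank, numProcs;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

    int chunk = N / numProcs;                    /* each process computes N/M squares */
    long *mySquares = malloc(chunk * sizeof(long));
    long *allSquares = NULL;

    for (int i = 0; i < chunk; i++) {
        long x = (long) rank * chunk + i;        /* this rank's portion of 0..N-1 */
        mySquares[i] = x * x;
    }

    if (rank == 0)
        allSquares = malloc(N * sizeof(long));

    /* collect equally sized chunks of squares at rank 0 */
    MPI_Gather(mySquares, chunk, MPI_LONG,
               allSquares, chunk, MPI_LONG, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        for (int i = 0; i < N; i += 100)         /* print every 100th square */
            printf("%d^2 = %ld\n", i, allSquares[i]);
        free(allSquares);
    }

    free(mySquares);
    MPI_Finalize();
    return 0;
}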


Scatter

• The scatter operation sends individualized messages from a source process to all other processes.

• Note that the source process itself receives one of these messages.

• Scatter is implemented in MPI using the MPI_Scatter function.


MPI_Scatter

• The function signature for scatter follows: MPI_Scatter(sendBuffer, sendCount, sendType, recvBuffer, recvCount, recvType, source, comm)
• sendBuffer – starting address of the data to be sent (the entire array)
• sendCount – the amount of data to send to each process from the source
• sendType – the type of the data being sent
• recvBuffer – starting address of the location where data will be received
• recvCount – the amount of data to receive
• recvType – the type of data to receive
• source – the identifier of the process sending the data
• comm – the communicator (e.g., MPI_COMM_WORLD)
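
A minimal sketch illustrating these parameters (not the course's scatter.c): rank 0 scatters one integer to each process, including itself; the array contents are arbitrary.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[]) {
    int rank, numProcs, myValue;
    int *table = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

    if (rank == 0) {                        /* only the source needs the full array */
        table = malloc(numProcs * sizeof(int));
        for (int i = 0; i < numProcs; i++)
            table[i] = 10 * i;
    }

    MPI_Scatter(table,          /* sendBuffer - the entire array (used only at the source) */
                1,              /* sendCount  - items sent to EACH process                  */
                MPI_INT,        /* sendType                                                 */
                &myValue,       /* recvBuffer - where this rank's piece lands               */
                1,              /* recvCount                                                */
                MPI_INT,        /* recvType                                                 */
                0,              /* source     - rank distributing the data                  */
                MPI_COMM_WORLD);

    printf("Rank %d received %d\n", rank, myValue);

    if (rank == 0) free(table);
    MPI_Finalize();
    return 0;
}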


Example

• See scatter.c on the course examples webpage.

• This program provides an example of using the scatter function to send equally sized chunks of an array of random integers to all processes.

• Each process computes and prints the average of its chunk.
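
The program below is a hedged sketch of the computation described, not the actual scatter.c: it assumes the array length is divisible by the number of processes, and the array size and random seed are arbitrary choices.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define TOTAL 1024   /* number of random integers; assumed divisible by the process count */

int main(int argc, char *argv[]) {
    int rank, numProcs;
    int *data = NULL;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

    int chunk = TOTAL / numProcs;
    int *myChunk = malloc(chunk * sizeof(int));

    if (rank == 0) {                       /* root fills the whole array with random values */
        data = malloc(TOTAL * sizeof(int));
        srand(12345);
        for (int i = 0; i < TOTAL; i++)
            data[i] = rand() % 100;
    }

    /* send an equally sized chunk to every process (including the root) */
    MPI_Scatter(data, chunk, MPI_INT, myChunk, chunk, MPI_INT, 0, MPI_COMM_WORLD);

    double sum = 0.0;
    for (int i = 0; i < chunk; i++)
        sum += myChunk[i];
    printf("Rank %d: average of my chunk = %.2f\n", rank, sum / chunk);

    if (rank == 0) free(data);
    free(myChunk);
    MPI_Finalize();
    return 0;
}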


Reduce

• A reduce operation allows us to take many individual values and combine them using an associative operator.

• Such operators include +, *, min, max, and various logical operators like && and ||.

• We have seen such operations before in shared memory programming via OpenMP.

• MPI implements reduce using the MPI_Reduce function and allows for many scalar reductions to be performed in one function call.


MPI_Reduce

• The function signature for reduce follows: MPI_Reduce(sendBuffer, recvBuffer, count, dataType, operation, root, comm)
• sendBuffer – starting address of the data to be sent (the entire array)
• recvBuffer – starting address of the location where the results will be received
• count – the number of reduce results (i.e., the size of the sendBuffer)
• dataType – the type of data in the send and receive buffers
• operation – the reduce operation to perform
• root – the rank of the process that the reduce results will be sent to
• comm – the communicator (e.g., MPI_COMM_WORLD)
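
A minimal sketch (not one of the course examples) showing the parameter mapping and the point that one call can perform several scalar reductions at once: here count is 3, so each element is summed independently across ranks. The three contributed values are illustrative only.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[]) {
    int rank;
    double local[3], global[3];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* each rank contributes three values; the reduce combines them element-wise */
    local[0] = rank;          /* -> sum of all ranks              */
    local[1] = rank * 2.0;    /* -> sum of 2*rank over all ranks  */
    local[2] = 1.0;           /* -> number of processes           */

    MPI_Reduce(local,          /* sendBuffer - this rank's contributions                 */
               global,         /* recvBuffer - results, significant only at the root     */
               3,              /* count      - three independent scalar reductions       */
               MPI_DOUBLE,     /* dataType                                               */
               MPI_SUM,        /* operation                                              */
               0,              /* root       - rank that receives the results            */
               MPI_COMM_WORLD);

    if (rank == 0)
        printf("Sums: %.1f %.1f %.1f\n", global[0], global[1], global[2]);

    MPI_Finalize();
    return 0;
}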


MPI_Reduce Operations

• MPI_SUM – sum
• MPI_MIN – minimum
• MPI_MAX – maximum
• MPI_PROD – product
• MPI_LAND – logical and
• MPI_LOR – logical or
• MPI_BAND – bitwise and
• MPI_BOR – bitwise or
• MPI_LXOR – logical xor
• MPI_BXOR – bitwise xor


Example

• See reduce.c on the examples page of the course website.

• This example performs a scalar sum reduction on 1,000,000,000 integers across multiple processes.

• Also see reduce3D.c, which performs a 3D sum reduction across 1,000,000,000 3D points.

• Notice that the reduce operation includes the root process as one of the processes that perform computations.
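
The program below is a hedged sketch of the approach described, not the actual reduce.c: each rank, the root included, sums its own share of the values 0..N-1 into a long long partial sum, and a single MPI_SUM reduction combines the partial sums at rank 0. It assumes N is divisible by the number of processes.

#include <mpi.h>
#include <stdio.h>

#define N 1000000000L   /* one billion values; assumed divisible by the process count */

int main(int argc, char *argv[]) {
    int rank, numProcs;
    long long localSum = 0, totalSum = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &numProcs);

    long chunk = N / numProcs;
    long start = rank * chunk;            /* this rank's share of 0..N-1 */

    for (long i = start; i < start + chunk; i++)
        localSum += i;                    /* local partial sum, computed on every rank */

    /* combine the partial sums into a single scalar at rank 0 */
    MPI_Reduce(&localSum, &totalSum, 1, MPI_LONG_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Sum of 0..%ld = %lld\n", N - 1, totalSum);

    MPI_Finalize();
    return 0;
}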


Next Time

• MPI_Barrier – barrier operation in MPI
• Other MPI all-to-all communications