Collective Communication

Page 1: Collective Communication

Collective Communication

Page 2: Collective Communication

Collective Communication

Collective communication is defined as communication that involves a group of processes.

More restrictive than point-to-point:
- Data sent is the same as the data received (type and amount)
- All processes involved make one call; there is no tag to match operations
- Processes involved can return only when the operation completes
- Blocking communication only
- Standard mode only

Page 3: Collective Communication

Collective Functions

- Barrier synchronization across all group members
- Broadcast from one member to all members of a group
- Gather data from all group members to one member
- Scatter data from one member to all members of a group
- A variation on Gather where all members of the group receive the result (allgather)
- Scatter/Gather data from all members to all members of a group, also called complete exchange or all-to-all (alltoall)
- Global reduction operations such as sum, max, min, or user-defined functions, where the result is returned to all group members, and a variation where the result is returned to only one member
- A combined reduction and scatter operation
- Scan across all members of a group (also called prefix)

Page 4: Collective Communication

Collective Functions

Page 5: Collective Communication

Collective Functions

Page 6: Collective Communication

Collective Functions – MPI_BARRIER

blocks the caller until all group members have called it

returns at any process only after all group members have entered the call

C
int MPI_Barrier(MPI_Comm comm)

Input Parameter
comm: communicator (handle)

Fortran
MPI_BARRIER(COMM, IERROR)
INTEGER COMM, IERROR
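The slides do not show a barrier example; a minimal sketch (the printed messages are illustrative only) might look like this:

/* Sketch: every process waits at the barrier before continuing. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int my_rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    printf("Proc %d: before the barrier\n", my_rank);
    MPI_Barrier(MPI_COMM_WORLD);   /* no process returns from this call      */
    printf("Proc %d: after the barrier\n", my_rank);  /* until all have entered it */

    MPI_Finalize();
    return 0;
}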

Page 7: Collective Communication

Collective Functions – MPI_BCAST

broadcasts a message from the process with rank root to all processes of the group, itself included

C
int MPI_Bcast(void* buffer, int count, MPI_Datatype datatype, int root, MPI_Comm comm)

Input Parameters
count: number of entries in buffer (integer)
datatype: data type of buffer (handle)
root: rank of broadcast root (integer)
comm: communicator (handle)

Input/Output Parameter
buffer: starting address of buffer (choice)

Fortran
MPI_BCAST(BUFFER, COUNT, DATATYPE, ROOT, COMM, IERROR)
<type> BUFFER(*)
INTEGER COUNT, DATATYPE, ROOT, COMM, IERROR

Page 8: Collective Communication

Collective Functions – MPI_BCAST

Before MPI_BCAST: only the root holds A.
After MPI_BCAST: every process in the group holds A.

Page 9: Collective Communication

Collective Functions – MPI_GATHER

Each process (root process included) sends the contents of its send buffer to the root process.

The root process receives the messages and stores them in rank order

C
int MPI_Gather(void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)

Input Parameters
sendbuf: starting address of send buffer (choice)
sendcount: number of elements in send buffer (integer)
sendtype: data type of send buffer elements (handle)
recvcount: number of elements for any single receive (integer, significant only at root)
recvtype: data type of receive buffer elements (significant only at root) (handle)
root: rank of receiving process (integer)
comm: communicator (handle)

Page 10: Collective Communication

Collective Functions – MPI_GATHER

Output Parameter
recvbuf: address of receive buffer (choice, significant only at root)

Fortran
MPI_GATHER(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF, RECVCOUNT, RECVTYPE, ROOT, COMM, IERROR)
<type> SENDBUF(*), RECVBUF(*)
INTEGER SENDCOUNT, SENDTYPE, RECVCOUNT, RECVTYPE, ROOT, COMM, IERROR

Page 11: Collective Communication

Collective Functions – MPI_GATHER

Before MPI_GATHER: processes 0-3 hold A, B, C, D respectively (one block each).
After MPI_GATHER: the root holds A B C D, stored in rank order.
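A minimal sketch of this pattern (not part of the original slides; the fixed-size 64-entry receive buffer is an arbitrary assumption): each process contributes one integer and the root prints them in rank order.

/* Sketch: gather one integer per process to rank 0. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int my_rank, nprocs, i;
    int sendval;
    int recvbuf[64];                 /* assumes at most 64 processes */
    MPI_Comm comm = MPI_COMM_WORLD;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(comm, &my_rank);
    MPI_Comm_size(comm, &nprocs);

    sendval = my_rank * 10;          /* each process sends one integer */
    MPI_Gather(&sendval, 1, MPI_INT, recvbuf, 1, MPI_INT, 0, comm);

    if (my_rank == 0) {              /* recvbuf is significant only at root */
        printf("Root gathered:");
        for (i = 0; i < nprocs; i++) printf(" %d", recvbuf[i]);
        printf("\n");
    }

    MPI_Finalize();
    return 0;
}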

Page 12: Collective Communication

Collective Functions – MPI_SCATTER

MPI_SCATTER is the inverse operation to MPI_GATHER.

C
int MPI_Scatter(void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, int root, MPI_Comm comm)

Input Parameters
sendbuf: address of send buffer (choice, significant only at root)
sendcount: number of elements sent to each process (integer, significant only at root)
sendtype: data type of send buffer elements (significant only at root) (handle)
recvcount: number of elements in receive buffer (integer)
recvtype: data type of receive buffer elements (handle)
root: rank of sending process (integer)
comm: communicator (handle)

Page 13: Collective Communication

Collective Functions – MPI_SCATTER

Output Parameter
recvbuf: address of receive buffer (choice)

Fortran
MPI_SCATTER(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF, RECVCOUNT, RECVTYPE, ROOT, COMM, IERROR)
<type> SENDBUF(*), RECVBUF(*)
INTEGER SENDCOUNT, SENDTYPE, RECVCOUNT, RECVTYPE, ROOT, COMM, IERROR

Page 14: Collective Communication

Collective Functions – MPI_SCATTER

Before MPI_SCATTER: the root holds A B C D.
After MPI_SCATTER: processes 0-3 hold A, B, C, D respectively (one block each).
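A minimal sketch of the inverse pattern (again not from the original slides; the 64-entry send buffer is an arbitrary assumption): the root distributes one integer to each process.

/* Sketch: scatter one integer from rank 0 to every process. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int my_rank, nprocs, i;
    int sendbuf[64];                 /* assumes at most 64 processes; significant only at root */
    int recvval;
    MPI_Comm comm = MPI_COMM_WORLD;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(comm, &my_rank);
    MPI_Comm_size(comm, &nprocs);

    if (my_rank == 0)
        for (i = 0; i < nprocs; i++) sendbuf[i] = (i + 1) * 100;

    /* each process receives one element of the root's array */
    MPI_Scatter(sendbuf, 1, MPI_INT, &recvval, 1, MPI_INT, 0, comm);
    printf("Proc %d received %d\n", my_rank, recvval);

    MPI_Finalize();
    return 0;
}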

Page 15: Collective Communication

Collective Functions – MPI_ALLGATHER

MPI_ALLGATHER can be thought of as MPI_GATHER, but where all processes receive the result, instead of just the root.

The jth block of data sent from each process is received by every process and placed in the jth block of the buffer recvbuf.

C
int MPI_Allgather(void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm)

Input Parameters
sendbuf: starting address of send buffer (choice)
sendcount: number of elements in send buffer (integer)
sendtype: data type of send buffer elements (handle)
recvcount: number of elements received from any process (integer)
recvtype: data type of receive buffer elements (handle)
comm: communicator (handle)

Page 16: Collective Communication

Collective Functions – MPI_ALLGATHER

Output Parameter
recvbuf: address of receive buffer (choice)

Fortran
MPI_ALLGATHER(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF, RECVCOUNT, RECVTYPE, COMM, IERROR)
<type> SENDBUF(*), RECVBUF(*)
INTEGER SENDCOUNT, SENDTYPE, RECVCOUNT, RECVTYPE, COMM, IERROR

Page 17: Collective Communication

Collective Functions – MPI_ALLGATHER

Before MPI_ALLGATHER: processes 0-3 hold A, B, C, D respectively (one block each).
After MPI_ALLGATHER: every process holds A B C D.
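A minimal sketch (illustrative only, same 64-process assumption as above): every rank contributes its rank number and every rank ends up with the full list.

/* Sketch: allgather one integer from every process to every process. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int my_rank, nprocs, i;
    int sendval;
    int recvbuf[64];                 /* assumes at most 64 processes */
    MPI_Comm comm = MPI_COMM_WORLD;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(comm, &my_rank);
    MPI_Comm_size(comm, &nprocs);

    sendval = my_rank;
    MPI_Allgather(&sendval, 1, MPI_INT, recvbuf, 1, MPI_INT, comm);

    printf("Proc %d has:", my_rank);     /* every rank prints 0 .. nprocs-1 */
    for (i = 0; i < nprocs; i++) printf(" %d", recvbuf[i]);
    printf("\n");

    MPI_Finalize();
    return 0;
}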

Page 18: Collective Communication

Collective Functions – MPI_ALLTOALL

Extension of MPI_ALLGATHER to the case where each process sends distinct data to each of the receivers. The jth block sent from process i is received by process j and is placed in the ith block of recvbuf

C
int MPI_Alltoall(void* sendbuf, int sendcount, MPI_Datatype sendtype, void* recvbuf, int recvcount, MPI_Datatype recvtype, MPI_Comm comm)

Input Parameters
sendbuf: starting address of send buffer (choice)
sendcount: number of elements sent to each process (integer)
sendtype: data type of send buffer elements (handle)
recvcount: number of elements received from any process (integer)
recvtype: data type of receive buffer elements (handle)
comm: communicator (handle)

Page 19: Collective Communication

Collective Functions – MPI_ALLTOALL

Output Parameter
recvbuf: address of receive buffer (choice)

Fortran
MPI_ALLTOALL(SENDBUF, SENDCOUNT, SENDTYPE, RECVBUF, RECVCOUNT, RECVTYPE, COMM, IERROR)
<type> SENDBUF(*), RECVBUF(*)
INTEGER SENDCOUNT, SENDTYPE, RECVCOUNT, RECVTYPE, COMM, IERROR

Page 20: Collective Communication

Collective Functions – MPI_ALLTOALL

Before MPI_ALLTOALL (one row of blocks per rank):
Rank 0: A B C D
Rank 1: E F G H
Rank 2: I J K L
Rank 3: M N O P

After MPI_ALLTOALL (block j sent by rank i arrives as block i on rank j):
Rank 0: A E I M
Rank 1: B F J N
Rank 2: C G K O
Rank 3: D H L P
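A minimal sketch (illustrative only; the 64-entry buffers are an arbitrary assumption): each rank sends a distinct integer to every other rank, and afterwards recvbuf[i] holds the value that rank i sent to this rank.

/* Sketch: all-to-all exchange of one integer per process pair. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int my_rank, nprocs, i;
    int sendbuf[64], recvbuf[64];    /* assumes at most 64 processes */
    MPI_Comm comm = MPI_COMM_WORLD;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(comm, &my_rank);
    MPI_Comm_size(comm, &nprocs);

    /* block i of sendbuf goes to process i */
    for (i = 0; i < nprocs; i++) sendbuf[i] = my_rank * 100 + i;

    MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, comm);

    /* recvbuf[i] now holds the block that process i sent to this rank */
    printf("Proc %d received:", my_rank);
    for (i = 0; i < nprocs; i++) printf(" %d", recvbuf[i]);
    printf("\n");

    MPI_Finalize();
    return 0;
}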

Page 21: Collective Communication

Collective Functions – MPI_REDUCE

MPI_REDUCE combines the elements provided in the input buffer (sendbuf) of each process in the group, using the operation op, and returns the combined value in the output buffer (recvbuf) of the process with rank root

C
int MPI_Reduce(void* sendbuf, void* recvbuf, int count, MPI_Datatype datatype, MPI_Op op, int root, MPI_Comm comm)

Input Parameters
sendbuf: address of send buffer (choice)
count: number of elements in send buffer (integer)
datatype: data type of elements of send buffer (handle)
op: reduce operation (handle)
root: rank of root process (integer)
comm: communicator (handle)

Output Parameter
recvbuf: address of receive buffer (choice, significant only at root)

Page 22: Collective Communication

Collective Functions – MPI_REDUCE

Fortran
MPI_REDUCE(SENDBUF, RECVBUF, COUNT, DATATYPE, OP, ROOT, COMM, IERROR)
<type> SENDBUF(*), RECVBUF(*)
INTEGER COUNT, DATATYPE, OP, ROOT, COMM, IERROR

Predefined Reduce Operations
MPI_MAX: maximum
MPI_MIN: minimum
MPI_SUM: sum
MPI_PROD: product
MPI_LAND: logical and
MPI_BAND: bit-wise and
MPI_LOR: logical or
MPI_BOR: bit-wise or
MPI_LXOR: logical xor
MPI_BXOR: bit-wise xor
MPI_MAXLOC: max value and location (returns the max and an integer, e.g. the rank storing the max value)
MPI_MINLOC: min value and location

Page 23: Collective Communication

Collective Functions – MPI_REDUCE

Input (count elements per process):
Rank 0: A B C D
Rank 1: E F G H
Rank 2: I J K L
Rank 3: M N O P

In this case root = 1, so rank 1 receives AoEoIoM (the element-wise reduction of the first element) in the first element of recvbuf.
If count = 2, the second element of the result at the root is BoFoJoN.
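A minimal sketch of a sum reduction to rank 0 (illustrative only, not from the slides):

/* Sketch: each process contributes rank+1; rank 0 receives the sum. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int my_rank, value, sum;
    MPI_Comm comm = MPI_COMM_WORLD;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(comm, &my_rank);

    value = my_rank + 1;
    MPI_Reduce(&value, &sum, 1, MPI_INT, MPI_SUM, 0, comm);

    if (my_rank == 0)                      /* recvbuf is significant only at root */
        printf("Sum over all processes = %d\n", sum);

    MPI_Finalize();
    return 0;
}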

Page 24: Collective Communication

Collective Functions – MPI_ALLREDUCE

Variants of the reduce operations where the result is returned to all processes in the group

The all-reduce operations can be implemented as a reduce, followed by a broadcast. However, a direct implementation can lead to better performance.

C
int MPI_Allreduce(void* sendbuf, void* recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)

Page 25: Collective Communication

Collective Functions – MPI_ALLREDUCE

Input Parameters
sendbuf: starting address of send buffer (choice)
count: number of elements in send buffer (integer)
datatype: data type of elements of send buffer (handle)
op: operation (handle)
comm: communicator (handle)

Output Parameter
recvbuf: starting address of receive buffer (choice)

Fortran
MPI_ALLREDUCE(SENDBUF, RECVBUF, COUNT, DATATYPE, OP, COMM, IERROR)
<type> SENDBUF(*), RECVBUF(*)
INTEGER COUNT, DATATYPE, OP, COMM, IERROR

Page 26: Collective Communication

Collective Functions – MPI_ALLREDUCE

Input (count elements per process):
Rank 0: A B C D
Rank 1: E F G H
Rank 2: I J K L
Rank 3: M N O P

After MPI_ALLREDUCE: every rank receives AoEoIoM as the first element of recvbuf.
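A minimal sketch using MPI_MAX, where every rank receives the result (illustrative only):

/* Sketch: every process learns the maximum rank in the communicator. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int my_rank, value, maxval;
    MPI_Comm comm = MPI_COMM_WORLD;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(comm, &my_rank);

    value = my_rank;
    MPI_Allreduce(&value, &maxval, 1, MPI_INT, MPI_MAX, comm);
    printf("Proc %d: max rank = %d\n", my_rank, maxval);

    MPI_Finalize();
    return 0;
}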

Page 27: Collective Communication

Collective Functions – MPI_REDUCE_SCATTER

Variants of each of the reduce operations where the result is scattered to all processes in the group on return.

MPI_REDUCE_SCATTER first does an element-wise reduction on vector of count=∑i recvcounts[i] elements in the send buffer defined by sendbuf, count and datatype.

Next, the resulting vector of results is split into n disjoint segments, where n is the number of members in the group. Segment i contains recvcounts[i] elements.

The ith segment is sent to process i and stored in the receive buffer defined by recvbuf, recvcounts[i] and datatype.

The MPI_REDUCE_SCATTER routine is functionally equivalent to an MPI_REDUCE operation with count equal to the sum of recvcounts[i], followed by MPI_SCATTERV with sendcounts equal to recvcounts. However, a direct implementation may run faster.

Page 28: Collective Communication

Collective Functions – MPI_REDUCE_SCATTER

C
int MPI_Reduce_scatter(void* sendbuf, void* recvbuf, int *recvcounts, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)

Input Parameters
sendbuf: starting address of send buffer (choice)
recvcounts: integer array specifying the number of elements of the result distributed to each process; must be identical on all calling processes
datatype: data type of elements of input buffer (handle)
op: operation (handle)
comm: communicator (handle)

Output Parameter
recvbuf: starting address of receive buffer (choice)

Fortran
MPI_REDUCE_SCATTER(SENDBUF, RECVBUF, RECVCOUNTS, DATATYPE, OP, COMM, IERROR)
<type> SENDBUF(*), RECVBUF(*)
INTEGER RECVCOUNTS(*), DATATYPE, OP, COMM, IERROR

Page 29: Collective Communication

Collective Functions – MPI_REDUCE_SCATTER

Input (4 elements per process) with recvcounts = {1, 2, 0, 1}:
Rank 0 (recvcounts = 1): A B C D
Rank 1 (recvcounts = 2): E F G H
Rank 2 (recvcounts = 0): I J K L
Rank 3 (recvcounts = 1): M N O P

The element-wise reduction is {AoEoIoM, BoFoJoN, CoGoKoO, DoHoLoP}; after MPI_REDUCE_SCATTER it is split as:
Rank 0: AoEoIoM
Rank 1: BoFoJoN, CoGoKoO
Rank 2: (nothing)
Rank 3: DoHoLoP
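A minimal sketch (illustrative only; the 64-entry buffers are an arbitrary assumption) where every process gets exactly one element of the element-wise sum, i.e. recvcounts = {1, 1, ..., 1}:

/* Sketch: reduce a vector of nprocs elements, then scatter one element to each rank. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int my_rank, nprocs, i;
    int sendbuf[64];                 /* one element per process in the group */
    int recvcounts[64];
    int myresult;
    MPI_Comm comm = MPI_COMM_WORLD;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(comm, &my_rank);
    MPI_Comm_size(comm, &nprocs);

    for (i = 0; i < nprocs; i++) {
        sendbuf[i]    = my_rank + i; /* vector to be reduced element-wise */
        recvcounts[i] = 1;           /* every process gets one element of the result */
    }

    MPI_Reduce_scatter(sendbuf, &myresult, recvcounts, MPI_INT, MPI_SUM, comm);
    printf("Proc %d: element %d of the reduced vector = %d\n",
           my_rank, my_rank, myresult);

    MPI_Finalize();
    return 0;
}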

Page 30: Collective Communication

Collective Functions – MPI_SCAN

MPI_SCAN is used to perform a prefix reduction on data distributed across the group. The operation returns, in the receive buffer of the process with rank i, the reduction of the values in the send buffers of processes with ranks 0,...,i (inclusive). The type of operations supported, their semantics, and the constraints on send and receive buffers are as for MPI_REDUCE.

C
int MPI_Scan(void* sendbuf, void* recvbuf, int count, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)

Page 31: Collective Communication

Collective Functions – MPI_SCAN

Input Parameters
sendbuf: starting address of send buffer (choice)
count: number of elements in input buffer (integer)
datatype: data type of elements of input buffer (handle)
op: operation (handle)
comm: communicator (handle)

Output Parameter
recvbuf: starting address of receive buffer (choice)

Fortran
MPI_SCAN(SENDBUF, RECVBUF, COUNT, DATATYPE, OP, COMM, IERROR)
<type> SENDBUF(*), RECVBUF(*)
INTEGER COUNT, DATATYPE, OP, COMM, IERROR

Page 32: Collective Communication

Collective Functions – MPI_SCAN

Input (count elements per process):
Rank 0: A B C D
Rank 1: E F G H
Rank 2: I J K L
Rank 3: M N O P

After MPI_SCAN, the first element of recvbuf on each rank is:
Rank 0: A
Rank 1: AoE
Rank 2: AoEoI
Rank 3: AoEoIoM
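A minimal prefix-sum sketch (illustrative only, not from the slides):

/* Sketch: rank i receives (0+1) + (1+1) + ... + (i+1). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int my_rank, value, prefix;
    MPI_Comm comm = MPI_COMM_WORLD;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(comm, &my_rank);

    value = my_rank + 1;
    MPI_Scan(&value, &prefix, 1, MPI_INT, MPI_SUM, comm);
    printf("Proc %d: prefix sum = %d\n", my_rank, prefix);

    MPI_Finalize();
    return 0;
}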

Page 33: Collective Communication

Example – MPI_BCAST

To demonstrate how to use MPI_BCAST to distribute an array to the other processes

Page 34: Collective Communication

Example – MPI_BCAST (C)

/* root broadcasts the array to all processes */
#include <stdio.h>
#include <mpi.h>

#define SIZE 10

int main(int argc, char** argv)
{
    int my_rank;                     /* the rank of each process */
    int array[SIZE];
    int root = 0;                    /* the rank of the root */
    int i;
    MPI_Comm comm = MPI_COMM_WORLD;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(comm, &my_rank);

    if (my_rank == 0) {
        for (i = 0; i < SIZE; i++) {
            array[i] = i;
        }
    }

Page 35: Collective Communication

Example – MPI_BCAST (C) (continued)

    else {
        for (i = 0; i < SIZE; i++) {
            array[i] = 0;
        }
    }

    printf("Proc %d: (Before Broadcast) ", my_rank);
    for (i = 0; i < SIZE; i++) printf("%d ", array[i]);
    printf("\n");

    MPI_Bcast(array, SIZE, MPI_INT, root, comm);

    printf("Proc %d: (After Broadcast) ", my_rank);
    for (i = 0; i < SIZE; i++) printf("%d ", array[i]);
    printf("\n");

    MPI_Finalize();
    return 0;
}

Page 36: Collective Communication

Example – MPI_BCAST (Fortran)

C
C     root broadcasts the array to all processes
C
      PROGRAM main
      INCLUDE 'mpif.h'

      INTEGER SIZE
      PARAMETER (SIZE = 10)
      INTEGER my_rank, ierr, root, i
      INTEGER array(SIZE)
      INTEGER comm
      INTEGER arraysize

      root = 0
      comm = MPI_COMM_WORLD
      arraysize = SIZE

Page 37: Collective Communication

Example – MPI_BCAST (Fortran)

      CALL MPI_INIT(ierr)
      CALL MPI_COMM_RANK(comm, my_rank, ierr)

      IF (my_rank .EQ. 0) THEN
         DO i = 1, SIZE
            array(i) = i
         END DO
      ELSE
         DO i = 1, SIZE
            array(i) = 0
         END DO
      END IF

      WRITE(6, *) "Proc ", my_rank, ": (Before Broadcast)",
     &            (array(i), i = 1, SIZE)
      CALL MPI_BCAST(array, arraysize, MPI_INTEGER, root, comm, ierr)
      WRITE(6, *) "Proc ", my_rank, ": (After Broadcast)",
     &            (array(i), i = 1, SIZE)

      CALL MPI_FINALIZE(ierr)
      END

Page 38: Collective Communication

Case Study 1 – MPI_SCATTER and MPI_REDUCE

The master distributes (scatters) an array across the processes. Each process adds up its elements; the partial sums are then combined in the master through a reduction operation.

Step 1: Proc 0 initializes an array of 16 integers
Proc 0: {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16}

Page 39: Collective Communication

Case Study 1 – MPI_SCATTER and MPI_REDUCE

Step 2: Scatter the array among all processes
Proc 0: {1, 2, 3, 4}
Proc 1: {5, 6, 7, 8}
Proc 2: {9, 10, 11, 12}
Proc 3: {13, 14, 15, 16}

Step 3: Each process computes its partial sum

Page 40: Collective Communication

Case Study 1 – MPI_SCATTER and MPI_REDUCE

Step 4: Reduce to Proc 0
Proc 0: Total Sum

C: mpi_scatter_reduce01.c
Compilation: mpicc mpi_scatter_reduce01.c -o mpi_scatter_reduce01
Run: mpirun -np 4 mpi_scatter_reduce01

Fortran: mpi_scatter_reduce01.f
Compilation: mpif77 mpi_scatter_reduce01.f -o mpi_scatter_reduce01
Run: mpirun -np 4 mpi_scatter_reduce01
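The file mpi_scatter_reduce01.c itself is not reproduced in these slides; a sketch of the scatter-then-reduce pattern described in Steps 1-4, assuming exactly 4 processes, could look like this:

/* Sketch of the case study: scatter 16 integers, sum locally, reduce to rank 0. */
#include <stdio.h>
#include <mpi.h>

#define N     16
#define NPROC 4

int main(int argc, char** argv)
{
    int my_rank, i;
    int array[N];
    int chunk[N / NPROC];
    int partial = 0, total = 0;
    MPI_Comm comm = MPI_COMM_WORLD;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(comm, &my_rank);

    if (my_rank == 0)                       /* Step 1: root fills 1..16 */
        for (i = 0; i < N; i++) array[i] = i + 1;

    /* Step 2: scatter 4 elements to each process */
    MPI_Scatter(array, N / NPROC, MPI_INT, chunk, N / NPROC, MPI_INT, 0, comm);

    /* Step 3: local partial sum */
    for (i = 0; i < N / NPROC; i++) partial += chunk[i];

    /* Step 4: reduce the partial sums to the root */
    MPI_Reduce(&partial, &total, 1, MPI_INT, MPI_SUM, 0, comm);

    if (my_rank == 0) printf("Total Sum = %d\n", total);

    MPI_Finalize();
    return 0;
}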

Page 41: Collective Communication

Case Study 2 – MPI_GATHER: Matrix Multiplication

Matrix-vector product computed in this case study:

    | 1  5  9 13 |   | 17 |   | 538 |
    | 2  6 10 14 | x | 18 | = | 612 |
    | 3  7 11 15 |   | 19 |   | 686 |
    | 4  8 12 16 |   | 20 |   | 760 |

Algorithm:
{4x4 matrix A} x {4x1 vector x} = product
Each process stores a row of A and a single entry of x
Use 4 gather operations to place a full copy of x in each process, then perform the multiplications

Page 42: Collective Communication

Case Study 2 – MPI_GATHER: Matrix Multiplication

Step 1: Initialization
Proc 0: {1 5 9 13}, {17}
Proc 1: {2 6 10 14}, {18}
Proc 2: {3 7 11 15}, {19}
Proc 3: {4 8 12 16}, {20}

Step 2: Perform MPI_GATHER 4 times (once with each process as root) to gather the column vector x into every process
Proc 0: {1 5 9 13}, {17 18 19 20}
Proc 1: {2 6 10 14}, {17 18 19 20}
Proc 2: {3 7 11 15}, {17 18 19 20}
Proc 3: {4 8 12 16}, {17 18 19 20}

Page 43: Collective Communication

Case Study 2 – MPI_GATHER: Matrix Multiplication

Step 3: Perform the multiplications
Proc 0: 1x17 + 5x18 + 9x19 + 13x20 = 538
Proc 1: 2x17 + 6x18 + 10x19 + 14x20 = 612
Proc 2: 3x17 + 7x18 + 11x19 + 15x20 = 686
Proc 3: 4x17 + 8x18 + 12x19 + 16x20 = 760

Step 4: Gather every process's inner product into the master process and display the result

Page 44: Collective Communication

Case Study 2 – MPI_GATHER: Matrix Multiplication

C: mpi_gather01.c
Compilation: mpicc mpi_gather01.c -o mpi_gather01
Run: mpirun -np 4 mpi_gather01

Fortran: mpi_gather01.f
Compilation: mpif77 mpi_gather01.f -o mpi_gather01
Run: mpirun -np 4 mpi_gather01
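The file mpi_gather01.c itself is not reproduced in these slides; a sketch of the four-gather algorithm described in Steps 1-4, assuming exactly 4 processes, could look like this:

/* Sketch of the case study: each rank owns a row of A and one entry of x. */
#include <stdio.h>
#include <mpi.h>

#define N 4

int main(int argc, char** argv)
{
    int my_rank, i, root;
    int row[N];                  /* this process' row of A      */
    int xpart;                   /* this process' entry of x    */
    int x[N];                    /* full copy of x after gather */
    int product, result[N];
    MPI_Comm comm = MPI_COMM_WORLD;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(comm, &my_rank);

    /* Step 1: row i of A is {i+1, i+5, i+9, i+13}; x(i) = 17+i */
    for (i = 0; i < N; i++) row[i] = my_rank + 1 + 4 * i;
    xpart = 17 + my_rank;

    /* Step 2: four gathers, one per root, so every process ends up with x */
    for (root = 0; root < N; root++)
        MPI_Gather(&xpart, 1, MPI_INT, x, 1, MPI_INT, root, comm);

    /* Step 3: local inner product */
    product = 0;
    for (i = 0; i < N; i++) product += row[i] * x[i];

    /* Step 4: gather the four inner products into the master */
    MPI_Gather(&product, 1, MPI_INT, result, 1, MPI_INT, 0, comm);

    if (my_rank == 0) {
        printf("A x =");
        for (i = 0; i < N; i++) printf(" %d", result[i]);   /* 538 612 686 760 */
        printf("\n");
    }

    MPI_Finalize();
    return 0;
}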

Page 45: Collective Communication

END