
Introduction to MPI

Preeti Malakar

pmalakar@cse.iitk.ac.in

An Introductory Course on High-Performance Computing in Science and Engineering

25th February 2019


Parallel Programming Models

• Libraries: MPI, TBB, Pthread, OpenMP, …

• New languages: Haskell, X10, Chapel, …

• Extensions: Coarray Fortran, UPC, Cilk, OpenCL, …

• Shared memory

– OpenMP, Pthreads, …

• Distributed memory

– MPI, UPC, …

• Hybrid

– MPI + OpenMP


Hardware and Network Model

• Interconnected systems
• Distributed memory
• No centralized server/master

[Figure: hosts host1–host4, each with cores, memory, and persistent storage, connected by a network]

Message Passing Interface (MPI)

• Standard for message passing in a distributed memory environment

• Efforts began in 1991 by Jack Dongarra, Tony Hey, and David W. Walker

• MPI Forum

– Version 1.0: 1994

– Version 2.0: 1997

– Version 3.0: 2012


MPI Implementations

• MPICH (ANL)

• MVAPICH (OSU)

• Intel MPI

• Open MPI


MPI Processes

[Figure: MPI processes running on hosts host1–host4; each process has its own memory, and each host has cores and persistent storage]

MPI Internals

Process manager
• Start and stop processes in a scalable way
• Set up communication channels for parallel processes
• Provide system-specific information to processes

Job scheduler
• Schedule MPI jobs on a cluster/supercomputer
• Allocate the required number of nodes
  – Two jobs generally do not run on the same core
• Enforce other policies (queuing etc.)

Communication Channels

• Sockets for network I/O

• MPI handles communications, progress etc.

Reference: "Design and Evaluation of Nemesis, a Scalable, Low-Latency, Message-Passing Communication Subsystem" by Buntinas et al.

Message Passing Paradigm

• Message sends and receives

• Explicit communication

Communication types

• Blocking

• Non-blocking


Getting Started

Function names: MPI_*

Initialization and Finalization

MPI_Init

• gather information about the parallel job

• set up internal library state

• prepare for communication

MPI_Finalize

• cleanup


MPI_COMM_WORLD


Communication Scope


Communicator (communication handle)

• Defines the scope

• Specifies communication context

Process
• Belongs to a group
• Identified by a rank within a group

Identification
• MPI_Comm_size – total number of processes in the communicator
• MPI_Comm_rank – rank within the communicator

[Figure: MPI_COMM_WORLD containing processes with ranks 0–4; each process is identified by its rank/id]

Getting Started

• MPI_Comm_rank gives the rank of a process
• MPI_Comm_size gives the total number of processes
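A minimal sketch combining initialization, the rank/size queries, and finalization (the printed text is only illustrative):

#include <mpi.h>
#include <stdio.h>

int main (int argc, char **argv)
{
    int rank, size;

    MPI_Init (&argc, &argv);                 /* set up internal library state */
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);   /* rank of this process          */
    MPI_Comm_size (MPI_COMM_WORLD, &size);   /* total number of processes     */

    printf ("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize ();                         /* cleanup                       */
    return 0;
}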

MPI Message

• Data and header/envelope

• Typically, MPI communications send/receive messages

Message envelope:
• Source – origin of message
• Destination – receiver of message
• Communicator
• Tag (0 to MPI_TAG_UB)

MPI Communication Types

• Point-to-point
• Collective

Point-to-point Communication

• MPI_Send

• MPI_Recv


int MPI_Send (const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)

int MPI_Recv (void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)

Tags should match

Blocking send and receive

MPI_Datatype

• MPI_BYTE

• MPI_CHAR

• MPI_INT

• MPI_FLOAT

• MPI_DOUBLE


Example 1


MPI_Comm_rank (MPI_COMM_WORLD, &myrank);

// Sender process
if (myrank == 0) {          /* code for process 0 */
    strcpy (message, "Hello, there");
    MPI_Send (message, strlen(message)+1, MPI_CHAR, 1, 99, MPI_COMM_WORLD);   /* 99 is the message tag */
}
// Receiver process
else if (myrank == 1) {     /* code for process 1 */
    MPI_Recv (message, 20, MPI_CHAR, 0, 99, MPI_COMM_WORLD, &status);         /* tag must match the sender's */
    printf ("received :%s\n", message);
}

MPI_Status

• Source rank

• Message tag

• Message length

– MPI_Get_count
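A short sketch of inspecting the status object after a receive, assuming a character message like the one in Example 1:

MPI_Status status;
int count;
char message[20];

MPI_Recv (message, 20, MPI_CHAR, 0, 99, MPI_COMM_WORLD, &status);

printf ("source = %d, tag = %d\n", status.MPI_SOURCE, status.MPI_TAG);

MPI_Get_count (&status, MPI_CHAR, &count);   /* number of MPI_CHAR elements actually received */
printf ("received %d characters\n", count);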


MPI_ANY_*

• MPI_ANY_SOURCE

– Receiver may specify wildcard value for source

• MPI_ANY_TAG

– Receiver may specify wildcard value for tag


Example 2


MPI_Comm_rank (MPI_COMM_WORLD, &myrank);

// Sender process
if (myrank == 0) {          /* code for process 0 */
    strcpy (message, "Hello, there");
    MPI_Send (message, strlen(message)+1, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
}
// Receiver process
else if (myrank == 1) {     /* code for process 1 */
    MPI_Recv (message, 20, MPI_CHAR, MPI_ANY_SOURCE, 99, MPI_COMM_WORLD, &status);
    printf ("received :%s\n", message);
}

MPI_Send (Blocking)

• Does not return until buffer can be reused

• Message buffering

• Implementation-dependent

• Standard communication mode


Buffering

[Source: Cray presentation]

Safety

Orderings for two processes exchanging messages (Process 0 | Process 1):

• MPI_Send, MPI_Send | MPI_Recv, MPI_Recv → Safe
• MPI_Send, MPI_Recv | MPI_Send, MPI_Recv → Unsafe (both send first; relies on buffering)
• MPI_Send, MPI_Recv | MPI_Recv, MPI_Send → Safe
• MPI_Recv, MPI_Send | MPI_Recv, MPI_Send → Unsafe (both receive first; deadlock)
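One way to code the safe exchange from the third row above: rank 0 sends first, rank 1 receives first (the buffer names and the count N are illustrative).

if (rank == 0) {
    MPI_Send (sendbuf, N, MPI_INT, 1, 0, MPI_COMM_WORLD);
    MPI_Recv (recvbuf, N, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
} else if (rank == 1) {
    /* one side receives first, so neither send can wait forever */
    MPI_Recv (recvbuf, N, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
    MPI_Send (sendbuf, N, MPI_INT, 0, 0, MPI_COMM_WORLD);
}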

Message Protocols

• Short

– Message sent with envelope/header

• Eager

– Send completes without acknowledgement from destination

– Small messages – typically 128 KB (at least in MPICH)

– MPIR_CVAR_CH3_EAGER_MAX_MSG_SIZE (check mpivars)

• Rendezvous

– Requires an acknowledgement from a matching receive

– Large messages


Other Send Modes

• MPI_Bsend (buffered)
  – May complete before a matching receive is posted

• MPI_Ssend (synchronous)
  – Completes only if a matching receive is posted

• MPI_Rsend (ready)
  – Started only if a matching receive is posted


Non-blocking Point-to-Point

• MPI_Isend (buf, count, datatype, dest, tag, comm, request)

• MPI_Irecv (buf, count, datatype, source, tag, comm, request)

• MPI_Wait (request, status)


With non-blocking sends, the previously unsafe ordering becomes safe:

• Process 0: MPI_Isend, MPI_Recv | Process 1: MPI_Isend, MPI_Recv → Safe

Computation Communication Overlap

[Timeline: process 0 posts MPI_Isend, keeps computing while the message is in flight, and calls MPI_Wait before reusing the buffer; process 1 computes and then calls MPI_Recv]
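A sketch of the overlap pattern shown above: the sender posts MPI_Isend, does independent computation, and only then waits; compute() and buf are placeholders.

MPI_Request request;
MPI_Status  status;

if (rank == 0) {
    MPI_Isend (buf, count, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD, &request);
    compute ();                      /* work that does not touch buf          */
    MPI_Wait (&request, &status);    /* buf may be reused only after the wait */
} else if (rank == 1) {
    compute ();
    MPI_Recv (buf, count, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD, &status);
}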

Collective Communications

• Must be called by all processes that are part of the communicator

Types

• Synchronization (MPI_Barrier)

• Global communication (MPI_Bcast, MPI_Gather, …)

• Global reduction (MPI_Reduce, …)


Barrier

• Synchronization across all group members

• Collective call

• Blocks until all processes have entered the call

• MPI_Barrier (comm)
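A common use is to line up all processes before timing a phase; a sketch using MPI_Wtime (the timed region is a placeholder):

double t0, t1;

MPI_Barrier (MPI_COMM_WORLD);    /* everyone starts the phase together          */
t0 = MPI_Wtime ();

/* ... computation / communication being timed ... */

MPI_Barrier (MPI_COMM_WORLD);    /* wait until the slowest process has finished */
t1 = MPI_Wtime ();

if (rank == 0)
    printf ("elapsed: %f seconds\n", t1 - t0);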


Broadcast

• Root process sends message to all processes

• Any process can be the root, but all processes must specify the same root

• int MPI_Bcast (buffer, count, datatype, root, comm)

• Number of elements in buffer – count

• buffer – Input or output?

Q: Can you use point-to-point communication for the same?


[Figure: the root's value X is replicated into the buffer of every process]

Example 3


int rank, size, color;
MPI_Status status;

MPI_Init (&argc, &argv);
MPI_Comm_rank (MPI_COMM_WORLD, &rank);
MPI_Comm_size (MPI_COMM_WORLD, &size);

color = rank + 2;
int oldcolor = color;
MPI_Bcast (&color, 1, MPI_INT, 0, MPI_COMM_WORLD);

printf ("%d: %d color changed to %d\n", rank, oldcolor, color);

Gather

• Gathers values from all processes to a root process

• int MPI_Gather (sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)

• Arguments recv* not relevant on non-root processes


[Figure: each process i contributes Ai; after MPI_Gather the root holds A0, A1, A2 in its receive buffer]

Example 4


MPI_Comm_rank (MPI_COMM_WORLD, &rank);
MPI_Comm_size (MPI_COMM_WORLD, &size);

color = rank + 2;

int colors[size];
MPI_Gather (&color, 1, MPI_INT, colors, 1, MPI_INT, 0, MPI_COMM_WORLD);

if (rank == 0)
    for (i = 0; i < size; i++)
        printf ("color from %d = %d\n", i, colors[i]);

Scatter

• Scatters values to all processes from a root process

• int MPI_Scatter (sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)

• Arguments send* not relevant on non-root processes

• Output parameter – recvbuf


[Figure: the root holds A0, A1, A2 in its send buffer; after MPI_Scatter, process i receives Ai]
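A sketch of MPI_Scatter in the spirit of Example 4, but in the opposite direction: rank 0 hands one integer to each process (the values are illustrative).

int sendbuf[size];    /* significant only at the root */
int recvval;

if (rank == 0)
    for (i = 0; i < size; i++)
        sendbuf[i] = 10 * i;

MPI_Scatter (sendbuf, 1, MPI_INT, &recvval, 1, MPI_INT, 0, MPI_COMM_WORLD);
printf ("%d received %d\n", rank, recvval);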

Allgather

• All processes gather values from all processes

• int MPI_Allgather(sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm)


[Figure: each process i contributes Ai; after MPI_Allgather every process holds A0, A1, A2]
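A sketch of MPI_Allgather, reusing the color variable of Example 4: every process, not just the root, ends up with all the colors.

int colors[size];

MPI_Allgather (&color, 1, MPI_INT, colors, 1, MPI_INT, MPI_COMM_WORLD);

/* no root argument: each rank can now print the full list */
for (i = 0; i < size; i++)
    printf ("%d: color from %d = %d\n", rank, i, colors[i]);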

Reduce

• MPI_Reduce (inbuf, outbuf, count, datatype, op, root, comm)

• Combines the elements in inbuf of each process (element-wise)

• Combined value in outbuf of root

• op: MIN, MAX, SUM, PROD, …


Example (MPI_MAX with three processes):
  inbuf:  [0 1 2 3 4 5 6], [2 1 2 3 2 5 2], [0 1 1 0 1 1 0]
  outbuf: [2 1 2 3 4 5 6] at the root

Allreduce

• MPI_Allreduce (inbuf, outbuf, count, datatype, op, comm)

• op: MIN, MAX, SUM, PROD, …

• Combines the elements in inbuf of each process (element-wise)

• Combined value in outbuf of each process


Example (MPI_MAX with three processes):
  inbuf:  [0 1 2 3 4 5 6], [2 1 2 3 2 5 2], [0 1 1 0 1 1 0]
  outbuf: [2 1 2 3 4 5 6] at every process
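A sketch matching the MAX example above, assuming each process has filled inbuf with its seven local values:

int inbuf[7], outbuf[7];

/* ... fill inbuf with this process's values ... */

MPI_Allreduce (inbuf, outbuf, 7, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
/* outbuf now holds the element-wise maximum on every process;
   MPI_Reduce (inbuf, outbuf, 7, MPI_INT, MPI_MAX, 0, MPI_COMM_WORLD)
   would leave the result only on rank 0 */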

Sub-communicator

– Logical subset
– Different contexts


MPI_COMM_SPLIT

MPI_Comm_split (MPI_Comm oldcomm, int color, int key, MPI_Comm *newcomm)

• Collective call

• Logically divides based on color

– Same color processes form a group

– Some processes may not be part of newcomm (MPI_UNDEFINED)

• Rank assignment based on key


Logical subsets of processes

[Figure: ranks 0–19 partitioned into logical subsets of processes]

How do you assign one color to odd processes and another color to even processes? color = rank % 2

  Even ranks (color 0): old ranks 0, 2, 4, … → new ranks 0, 1, 2, …
  Odd ranks (color 1):  old ranks 1, 3, 5, … → new ranks 0, 1, 2, …
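A sketch of the odd/even split described above; the key (the old rank here) determines the rank ordering inside each new communicator.

MPI_Comm newcomm;
int color = rank % 2;                 /* 0 for even ranks, 1 for odd ranks */
int newrank, newsize;

MPI_Comm_split (MPI_COMM_WORLD, color, rank, &newcomm);
MPI_Comm_rank (newcomm, &newrank);
MPI_Comm_size (newcomm, &newsize);

printf ("world rank %d -> rank %d of %d (color %d)\n", rank, newrank, newsize, color);

MPI_Comm_free (&newcomm);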

MPI Programming Hands-on

How to run an MPI program on a cluster?

[Figure: 4 nodes with 2 MPI processes each (ppn=2)]

mpiexec -n <number of processes> -f <hostfile> ./exe

<hostfile>:
host1:2
host2:2
host3:2
…

How to run an MPI program on a managed cluster/supercomputer?

[Figure: 4 nodes with 2 MPI processes each (ppn=2)]

Execution on HPC2010: qsub sub.sh

Practice Examples

• egN directory (N=1,2,3,4)

– egN.c

• Compile

– source /opt/software/intel/initpaths intel64

– make

• Execute

– qsub sub.sh


Your Code

• Each sub-directory (a1, a2, a3) has

– sub.sh [Required number of cores mentioned]

– Makefile

– a .c source file

– Edit the .c [Look for “WRITE YOUR CODE HERE”]


Assignments

1. Even processes send their data to odd processes

2. Element-wise sum of distributed arrays

3. Sum of array elements of 2 large arrays
  – Make two groups of processes {0,2,4,6} and {1,3,5,7}

– The 0th process of each group should distribute its array to the other group members (equal division)

– All processes sum up their individual array chunks


Reference Material (Recommended)

• Marc Snir, Steve W. Otto, Steven Huss-Lederman, David W. Walker and Jack Dongarra, MPI - The Complete Reference, Second Edition, Volume 1, The MPI Core.

• William Gropp, Ewing Lusk, Anthony Skjellum, Using MPI : portable parallel programming with the message-passing interface, 3rd Ed., Cambridge MIT Press, 2014.
