Introduction to MPI Programming – Part 1
Wei Feinstein, Le Yan
HPC@LSU
LONI Parallel Programming Workshop, May 30, 2016

Outline
• Introduction
• MPI program basics
• Point-to-point communication

Why Parallel Computing?
As computing tasks get larger and larger, we may need to enlist more computing resources:
• Bigger: more memory and storage
• Faster: each processor is faster
• More: do many computations simultaneously

Memory system models for parallel computing
Different ways of sharing data among processors:
– Shared memory
– Distributed memory
– Other memory models
  • Hybrid model
  • PGAS (Partitioned Global Address Space)

Shared memory model
• All threads can access the global address space
• Data sharing is achieved by writing to / reading from the same memory location
• Example: OpenMP
[Figure: several cores (C) attached to a single shared memory (M) holding the data]

Distributed memory model
• Each process has its own address space; data is local to each process
• Data sharing is achieved via explicit message passing (through the network)
• Example: MPI (Message Passing Interface)
[Figure: nodes, each with its own core (C) and memory (M), connected by a node interconnect]

MPI Programming Models
• Distributed
• Distributed + shared

Message Passing
Any data to be shared must be explicitly transferred from one process to another.
• Entities: MPI processes
• Communication medium: network, ...
[Figure: MPI processes i, j, k, l, m exchanging messages over the communication medium]

Why MPI?
• There are already network communication libraries
• Optimized for performance
  – Takes advantage of faster network transports
    • Shared memory (within a node)
    • Faster cluster interconnects (e.g. InfiniBand)
    • TCP/IP if all else fails
• Enforces certain guarantees
  – Reliable messages
  – In-order message arrival
• Designed for multi-node technical computing

MPI History
• 1980–1990: various message-passing libraries precede the standard
• 1994: MPI-1
• 1998: MPI-2
• 2012: MPI-3

Message Passing Interface
• MPI defines a standard API for message passing
  – The standard includes:
    • What functions are available
    • The syntax of those functions
    • What the expected outcome is when calling those functions
  – The standard does NOT include:
    • Implementation details (e.g. how the data transfer occurs)
    • Runtime details (e.g. how many processes the code runs with, etc.)
• MPI provides C/C++ and Fortran bindings

Various MPI Implementations
• OpenMPI: open source; portable, with simple installation and configuration
• MPICH: open source, portable
• MVAPICH2: MPICH derivative for InfiniBand, iWARP and other RDMA-enabled interconnects (with GPU support)
• Intel MPI (IMPI): vendor-supported MPICH derivative from Intel

High- or low-level programming?
• High level compared to other network libraries
  – Abstracts the transport layer
  – Supplies higher-level operations
• Low level for scientists
  – Must handle problem decomposition themselves
  – Must manually write code for every communication among processes

More about MPI
• MPI provides an interface to libraries
  – APIs and constants
  – Bindings for Fortran/C
  – Several third-party bindings for Python, R and other languages
• Commands to run MPI programs (e.g. mpiexec)

What just happened?
• mpiexec launched 4 processes
• Each process ran `whoami`
• Each ran independently
• Usually launch no more MPI processes than the number of processors
• To use multiple nodes: mpiexec –hostfile machine.lst –np/-npp 4 app.exe

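The host file named on the mpiexec command line is simply a text file listing the nodes to launch on, one per line. A hypothetical machine.lst might look like the lines below (the exact syntax for specifying slots per node varies between MPI implementations):

node001
node002
node003
node004
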
Outline of an MPI Program
1. Initialize communications
   – MPI_INIT initializes the MPI environment
   – MPI_COMM_SIZE returns the number of processes
   – MPI_COMM_RANK returns this process's index (rank)
2. Communicate to share data between processes
   – MPI_SEND sends a message
   – MPI_RECV receives a message
3. Exit in a "clean" fashion when MPI communication is done
   – MPI_FINALIZE

Hello World (C)

#include "mpi.h"                                  /* header file */

int main(int argc, char* argv[])
{
    int nprocs, myid;
    MPI_Status status;

    /* initialization */
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    /* computation and communication */
    printf("Hello World from process %d/%d\n", myid, nprocs);

    /* termination */
    MPI_Finalize();
    ...
}

[wfeinste@shelob1 hello]$ mpicc hello.c
[wfeinste@shelob1 hello]$ mpirun -np 4 ./a.out
Hello World from process 3/4
Hello World from process 0/4
Hello World from process 2/4
Hello World from process 1/4

Hello World (Fortran)

include "mpif.h"                                  ! header file
integer :: nprocs, ierr, myid
integer :: status(mpi_status_size)

! initialization
call mpi_init(ierr)
call mpi_comm_size(mpi_comm_world, nprocs, ierr)
call mpi_comm_rank(mpi_comm_world, myid, ierr)

! computation and communication
write(*,'("Hello World from process ",I3,"/",I3)') myid, nprocs

! termination
call mpi_finalize(ierr)
...

include"mpif.h"integer::nprocs,ierr,myidinteger::status(mpi_status_size)callmpi_init(ierr)callmpi_comm_size(mpi_comm_world,nprocs,ierr)callmpi_comm_rank(mpi_comm_world,myid,ierr)write(*,'("HelloWorldfromprocess",I3,"/",I3)')myid,nprocscallmpi_finalize(ierr)
….
HeaderfileIni)aliza)on
Computa)onandcommunica)on
Termina)on
[wfeinste@shelob1 hello]$ mpif90 hello.f90
[wfeinste@shelob1 hello]$ mpirun -np 4 ./a.out
Hello World from process 3/4
Hello World from process 0/4
Hello World from process 1/4
Hello World from process 2/4

Naming Signature (C/Fortran)
• Function name convention
  – C: MPI_Xxxx(arg1, ...)
  – Fortran: mpi_xxxx(...) (not case sensitive)
• Error handles
  – C: int rc = MPI_Xxxx(arg1, ...)
  – Fortran: call mpi_some_function(arg1, ..., ierr)
  – If rc/ierr == MPI_SUCCESS, then the call is successful

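For example, a minimal error check in C might look like the sketch below (by default MPI aborts on error, so explicit checks like this are optional; the error message text is illustrative):

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int nprocs, rc;

    MPI_Init(&argc, &argv);

    /* compare the return code against MPI_SUCCESS */
    rc = MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    if (rc != MPI_SUCCESS) {
        fprintf(stderr, "MPI_Comm_size failed (rc = %d)\n", rc);
        MPI_Abort(MPI_COMM_WORLD, rc);
    }

    MPI_Finalize();
    return 0;
}
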
Communicators (1)
• A communicator is an identifier associated with a group of processes

  MPI_Comm_size(MPI_COMM_WORLD, &nprocs)
  MPI_Comm_rank(MPI_COMM_WORLD, &myid)

Communicators (2)
• A communicator is an identifier associated with a group of processes
  – Can be regarded as the name given to an ordered list of processes
  – Each process has a unique rank, starting from 0 (rank 0 is usually referred to as the "root")
  – It is the context of MPI communications and operations
    • For instance, when a function is called to send data to all processes, MPI needs to understand what "all" means

Communicators (3)
• MPI_COMM_WORLD: the default communicator; it contains all processes running an MPI program
• There can be many communicators, e.g. created with
  MPI_Comm_split(MPI_Comm comm, int color, int key, MPI_Comm *newcomm)
• A process can belong to multiple communicators
  – Its rank is usually different in each

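As an illustration (not part of the original slides), the sketch below splits MPI_COMM_WORLD into two sub-communicators based on even/odd world rank; the variable names are hypothetical:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int world_rank, world_size, sub_rank;
    MPI_Comm subcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* color chooses which sub-communicator a process joins;
       key (here the world rank) determines its ordering within it */
    MPI_Comm_split(MPI_COMM_WORLD, world_rank % 2, world_rank, &subcomm);
    MPI_Comm_rank(subcomm, &sub_rank);

    printf("World rank %d/%d has rank %d in its sub-communicator\n",
           world_rank, world_size, sub_rank);

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}
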
Communicator Information
• Rank: unique id of each process within a communicator
  – C: int MPI_Comm_rank(MPI_Comm comm, int *rank)
  – Fortran: MPI_COMM_RANK(COMM, RANK, ERR)
• Size: the number of processes in a communicator
  – C: int MPI_Comm_size(MPI_Comm comm, int *size)
  – Fortran: MPI_COMM_SIZE(COMM, SIZE, ERR)

Compiling MPI Programs
• Not a part of the standard
  – Could vary from platform to platform
  – Or even from implementation to implementation on the same platform
  – mpicc/mpicxx/mpif77/mpif90: wrappers that compile MPI code and automatically link the startup and message passing libraries

MPI Compilers

Language   Script name     Underlying compiler
C          mpicc           gcc
           mpiicc          icc
           mpipgcc         pgcc
C++        mpiCC           g++
           mpiicpc         icpc
           mpipgCC         pgCC
Fortran    mpif90          f90
           mpigfortran     gfortran
           mpiifort        ifort
           mpipgf90        pgf90

Compiling and Running MPI Programs
• On Shelob:
  – Compile
    • C: mpicc –o <executable name> <source file>
    • Fortran: mpif90 –o <executable name> <source file>
  – Run
    • mpirun –hostfile $PBS_NODEFILE –np <number of procs> <executable name> <input parameters>

About Exercises
• Exercises
  – Track a: Process color
  – Track b: Matrix multiplication
  – Track c: Laplace solver
• Your tasks:
  – Fill in the blanks to make the MPI programs under the /exercise directory work
  – Solutions are provided in the /solution directory

Exercise a1: Process Color
• Write an MPI program where
  – Processes with odd rank print to screen "Process x has the color green"
  – Processes with even rank print to screen "Process x has the color red"

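One possible solution (a minimal sketch only; the workshop's own solution is in the /solution directory) uses the rank returned by MPI_Comm_rank:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int myid;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    /* odd ranks are green, even ranks are red */
    if (myid % 2 == 1)
        printf("Process %d has the color green\n", myid);
    else
        printf("Process %d has the color red\n", myid);

    MPI_Finalize();
    return 0;
}
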
Exercise b1: Matrix Multiplication (C = A × B)

C(1,1) = Σ_{i=1..n} A(1,i) × B(i,1)

Exercise b1: Matrix Multiplication

for (i = 0; i < row; i++) {          /* row of first matrix */
    for (j = 0; j < col; j++) {      /* column of second matrix */
        sum = 0;
        for (k = 0; k < n; k++)
            sum = sum + a[i][k] * b[k][j];
        c[i][j] = sum;               /* final matrix */
    }
}

Exercise b1: Matrix Multiplication
• Goal: distribute the workload among processes in a 1-d manner
  – Each process initializes its own copy of A and B
  – Then processes its part of the workload
• Need to determine how to decompose the work (which process deals with which rows or columns)
• Assume that the dimension of A and B is a multiple of the number of processes (need to check this in the program)
• Validate the result at the end

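One common way to set up the 1-d decomposition is sketched below. It extends the serial loop from the previous slide; nrows, nprocs and myid are assumed to hold the matrix dimension, the number of processes and this process's rank, and the sketch relies on nrows being a multiple of nprocs:

/* each process computes which block of rows it owns */
rows_per_proc = nrows / nprocs;           /* assumes nrows % nprocs == 0  */
row_start = myid * rows_per_proc;         /* first row owned by this rank */
row_end   = row_start + rows_per_proc;    /* one past the last owned row  */

for (i = row_start; i < row_end; i++)     /* only loop over the local rows */
    for (j = 0; j < col; j++) {
        sum = 0;
        for (k = 0; k < n; k++)
            sum = sum + a[i][k] * b[k][j];
        c[i][j] = sum;
    }
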
Exercise c1: Laplace Solver, version 1

P(x,y) = ( D(x-1,y) + D(x,y-1) + D(x+1,y) + D(x,y+1) ) / 4

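In code form, one Jacobi-style sweep of this update might look like the following sketch (serial; the grid dimensions nx, ny and the arrays oldgrid and newgrid holding the previous and updated values are hypothetical names):

/* each interior point becomes the average of its four neighbours
   from the previous iteration                                     */
for (x = 1; x < nx - 1; x++)
    for (y = 1; y < ny - 1; y++)
        newgrid[x][y] = (oldgrid[x-1][y] + oldgrid[x][y-1] +
                         oldgrid[x+1][y] + oldgrid[x][y+1]) / 4.0;
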
Exercise c1: Laplace Solver, version 1
• Goal: distribute the workload among processes in a 1-d manner
  – e.g. 4 MPI processes (color coded) share the workload

Exercise c1: Laplace Solver, version 1
• Goal: distribute the workload among processes in a 1-d manner
  – Find out the size of the sub-matrix for each process
  – Let each process report which part of the domain it will work on, e.g. "Process x will process column (row) x through column (row) y."
  – Row-wise (C) or column-wise (Fortran)

MPI Functions
• Environment management functions
  – Initialization and termination
• Point-to-point communication functions
  – Message transfer from one process to another
• Collective communication functions
  – Message transfer involving all processes in a communicator

Point-to-point Communication
• Blocking send/receive
  – The sending process calls the MPI_SEND function
    • C: int MPI_Send(void *buf, int count, MPI_Datatype dtype, int dest, int tag, MPI_Comm comm);
    • Fortran: MPI_SEND(BUF, COUNT, DTYPE, DEST, TAG, COMM, IERR)
  – The receiving process calls the MPI_RECV function
    • C: int MPI_Recv(void *buf, int count, MPI_Datatype dtype, int source, int tag, MPI_Comm comm, MPI_Status *status);
    • Fortran: MPI_RECV(BUF, COUNT, DTYPE, SOURCE, TAG, COMM, STATUS, IERR)

Send/Receive

int MPI_Send(void *buf, int count, MPI_Datatype dtype, int dest, int tag, MPI_Comm comm);
int MPI_Recv(void *buf, int count, MPI_Datatype dtype, int source, int tag, MPI_Comm comm, MPI_Status *status);

• An MPI message consists of two parts
  – The message itself: the data body
  – The message envelope: routing info
• status: information about the message that was received

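As a concrete illustration (not from the original slides), the sketch below sends two integers from rank 1 to rank 0 with the blocking calls above; run it with at least two processes, e.g. mpirun -np 2 ./a.out:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int myid, data[2];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    if (myid == 1) {
        data[0] = 10;
        data[1] = 20;
        /* send 2 integers to rank 0 with tag 0 */
        MPI_Send(data, 2, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else if (myid == 0) {
        /* receive 2 integers from rank 1 with tag 0 */
        MPI_Recv(data, 2, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
        printf("Rank 0 received %d and %d from rank 1\n", data[0], data[1]);
    }

    MPI_Finalize();
    return 0;
}
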
Example: Gathering Array Data
• Gather some array data from each process and place it in the memory of the root process

Before:        P0: [0 1]   P1: [2 3]   P2: [4 5]   P3: [6 7]
After (on P0): [0 1 2 3 4 5 6 7]

Example: Gathering Array Data

...
integer, allocatable :: array(:)

! Initialize MPI
call mpi_init(ierr)
call mpi_comm_size(mpi_comm_world, nprocs, ierr)
call mpi_comm_rank(mpi_comm_world, myid, ierr)

! Initialize the array
allocate(array(2*nprocs))
array(1) = 2*myid
array(2) = 2*myid + 1

! Send data to the root process
if (myid .eq. 0) then
    do i = 1, nprocs-1
        call mpi_recv(array(2*i+1), 2, mpi_integer, i, 0, mpi_comm_world, status, ierr)
    enddo
    write(*,*) "The content of the array:"
    write(*,*) array
else
    call mpi_send(array, 2, mpi_integer, 0, 0, mpi_comm_world, ierr)
endif

Blocking Operations
• MPI_SEND and MPI_RECV are blocking operations
  – They will not return from the function call until the communication is completed
  – When a blocking send returns, the value(s) stored in the variable can be safely overwritten
  – When a blocking receive returns, the data has been received and is ready to be used

Deadlock (1)
Deadlock occurs when both processes wait for the other to make progress.

// Exchange data between two processes
If (process 0)
    Receive data from process 1
    Send data to process 1
If (process 1)
    Receive data from process 0
    Send data to process 0

• Guaranteed deadlock!
• Both receives wait for data, but no send can be called until the receive returns

Deadlock (2)
• How about this one?

// Exchange data between two processes
If (process 0)
    Receive data from process 1
    Send data to process 1
If (process 1)
    Send data to process 0
    Receive data from process 0

• No deadlock!
• P0 receives the data first, then sends its data to P1
• There will be a performance penalty due to serialization of potentially concurrent operations

Deadlock (3)
• And this one?

// Exchange data between two processes
If (process 0)
    Send data to process 1
    Receive data from process 1
If (process 1)
    Send data to process 0
    Receive data from process 0

• It depends
  – If one send returns, then we are okay: most MPI implementations buffer the message, so a send could return even before the matching receive is posted
  – If the message is too large to be buffered, deadlock will occur

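One common way to make such an exchange safe regardless of buffering (in addition to the non-blocking calls covered next) is MPI_Sendrecv, which posts the matched send and receive in a single call. A sketch, where sendbuf, recvbuf, n and partner are hypothetical names for the buffers, message length and the other rank:

/* send to and receive from the partner rank in one deadlock-free call */
MPI_Sendrecv(sendbuf, n, MPI_DOUBLE, partner, 0,
             recvbuf, n, MPI_DOUBLE, partner, 0,
             MPI_COMM_WORLD, &status);
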
Blocking vs. Non-blocking
• Blocking operations are data-corruption proof, but
  – Possible deadlock
  – Performance penalty
• Non-blocking operations allow overlap of communication and computation
  – The process can work on other tasks between the initialization and the completion of a transfer
  – Should be used whenever possible

Non-blocking Operations (asynchronous)
• Separate the initialization of a send or receive from its completion
• Two calls are required to complete a send or receive
  – Initialization
    • Send: MPI_ISEND
    • Receive: MPI_IRECV
  – Completion: MPI_WAIT

Non-blocking Point-to-point Communication
• MPI_ISEND function
  – C: int MPI_Isend(void *buf, int count, MPI_Datatype dtype, int dest, int tag, MPI_Comm comm, MPI_Request *request)
  – Fortran: MPI_ISEND(BUF, COUNT, DTYPE, DEST, TAG, COMM, REQ, IERR)
• MPI_IRECV function
  – C: int MPI_Irecv(void *buf, int count, MPI_Datatype dtype, int source, int tag, MPI_Comm comm, MPI_Request *request)
  – Fortran: MPI_IRECV(BUF, COUNT, DTYPE, SOURCE, TAG, COMM, REQ, IERR)
• MPI_WAIT
  – C: int MPI_Wait(MPI_Request *request, MPI_Status *status);
  – Fortran: MPI_WAIT(REQUEST, STATUS, IERR)

Example: Exchange Data with Non-blocking Calls

integer reqids, reqidr
integer status(mpi_status_size)

if (myid .eq. 0) then
    call mpi_isend(to_p1, n, mpi_integer, 1, 100, mpi_comm_world, reqids, ierr)
    call mpi_irecv(from_p1, n, mpi_integer, 1, 101, mpi_comm_world, reqidr, ierr)
elseif (myid .eq. 1) then
    call mpi_isend(to_p0, n, mpi_integer, 0, 101, mpi_comm_world, reqids, ierr)
    call mpi_irecv(from_p0, n, mpi_integer, 0, 100, mpi_comm_world, reqidr, ierr)
endif

! mpi_wait takes the request handle first, then the status
call mpi_wait(reqids, status, ierr)
call mpi_wait(reqidr, status, ierr)

Exercise a2: Find Global Maximum
• Goal: find the maximum value in an array
  – Each process handles part of the array
  – Every process needs to know the maximum at the end of the program
• Hints
  – Step 1: each process sends its local maximum to the root process, which finds the global maximum
  – Step 2: the root process sends the global maximum to all other processes

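A minimal point-to-point sketch of the two steps in the hints (illustrative only; the workshop's solution is in the /solution directory, and local_max, remote_max, global_max and the MPI_DOUBLE datatype are assumptions about the exercise code):

/* Step 1: every non-root rank sends its local maximum to rank 0 */
if (myid != 0) {
    MPI_Send(&local_max, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
} else {
    global_max = local_max;
    for (i = 1; i < nprocs; i++) {
        MPI_Recv(&remote_max, 1, MPI_DOUBLE, i, 0, MPI_COMM_WORLD, &status);
        if (remote_max > global_max) global_max = remote_max;
    }
}

/* Step 2: rank 0 sends the global maximum back to every other rank */
if (myid == 0) {
    for (i = 1; i < nprocs; i++)
        MPI_Send(&global_max, 1, MPI_DOUBLE, i, 1, MPI_COMM_WORLD);
} else {
    MPI_Recv(&global_max, 1, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD, &status);
}
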
Exercise b2: Matrix Multiplication
• Modify b1 so that each process sends its partial results to the root process
  – The root process should then hold the whole result matrix
• Validate the result at the root process

Exercise c2: Laplace Solver
• Goal: develop a working MPI Laplace solver based on c1
  – Distribute the workload in a 1-d manner
  – Initialize the sub-matrix at each process and set the boundary values
  – At the end of each iteration
    • Exchange boundary data with neighbors
    • Find the global convergence error and distribute it to all processes

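One common pattern for the boundary exchange is sketched below using the non-blocking calls introduced earlier (a sketch only; up, down, my_first_row, my_last_row, halo_above, halo_below and ny are hypothetical names, and ranks without a neighbor would set up or down to MPI_PROC_NULL, which turns the corresponding calls into no-ops):

/* exchange one boundary row with each neighboring process */
MPI_Request reqs[4];
MPI_Status  stats[4];

MPI_Isend(my_first_row, ny, MPI_DOUBLE, up,   0, MPI_COMM_WORLD, &reqs[0]);
MPI_Isend(my_last_row,  ny, MPI_DOUBLE, down, 1, MPI_COMM_WORLD, &reqs[1]);
MPI_Irecv(halo_above,   ny, MPI_DOUBLE, up,   1, MPI_COMM_WORLD, &reqs[2]);
MPI_Irecv(halo_below,   ny, MPI_DOUBLE, down, 0, MPI_COMM_WORLD, &reqs[3]);

/* wait for all four transfers to complete before the next iteration */
MPI_Waitall(4, reqs, stats);
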
Why MPI?
• Standardized
  – With efforts to keep it evolving (MPI 3.0)
• Portability
  – MPI implementations are available on almost all platforms
• Scalability
  – In the sense that it is not limited by the number of processors that can access the same memory space
• Popularity
  – De facto programming model for distributed memory machines
  – Nearly every big academic or commercial simulation or data analysis code running on multiple nodes uses MPI directly or indirectly