44
Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts Non-Blocking Collectives for MPI-2 – overlap at the highest level – Torsten Höfler Department of Computer Science Indiana University / Technical University of Chemnitz Commissariat à l’Énergie Atomique Direction des applications militaires (CEA-DAM) Bruyéres-le-chatel, France 18th January 2007

Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Embed Size (px)

Citation preview

Page 1: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Non-Blocking Collectives for MPI-2– overlap at the highest level –

Torsten Höfler

Department of Computer ScienceIndiana University / Technical University of Chemnitz

Commissariat à l’Énergie AtomiqueDirection des applications militaires (CEA-DAM)

Bruyéres-le-chatel, France18th January 2007

Page 2: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Outline

1 Some Considerations about Interconnects

2 Why Non blocking Collectives?

3 LibNBC

4 And Applications?

5 Ongoing Efforts

Page 3: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Outline

1 Some Considerations about Interconnects

2 Why Non blocking Collectives?

3 LibNBC

4 And Applications?

5 Ongoing Efforts

Page 4: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

The LogGP Model

CPU

NetworkL

level

time

Sender Receivero s or

g + m*G g + m*G

Page 5: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Interconnect Trends

Technology Change

modern interconnects have co-processors (Quadrics,InfiniBand, Myrinet)TCP/IP is optimized for lower host-overhead (see our work)Ethernet protocol offloadL + g + m · G >> o

⇒ we prove our expectations with benchmarks of the user CPUoverhead

Page 6: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

LogGP Model Examples - TCP

0

100

200

300

400

500

600

0 10000 20000 30000 40000 50000 60000

Tim

e in

mic

rose

cond

s

Datasize in bytes (s)

MPICH2 - G*s+gMPICH2 - o

TCP - G*s+gTCP o

Page 7: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

LogGP Model Examples - Myrinet/GM

0

50

100

150

200

250

300

350

0 10000 20000 30000 40000 50000 60000

Tim

e in

mic

rose

cond

s

Datasize in bytes (s)

Open MPI - G*s+gOpen MPI - o

Myrinet/GM - G*s+gMyrinet/GM - o

Page 8: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

LogGP Model Examples - InfiniBand/OpenIB

0

10

20

30

40

50

60

70

80

90

0 10000 20000 30000 40000 50000 60000

Tim

e in

mic

rose

cond

s

Datasize in bytes (s)

Open MPI - G*s+gOpen MPI - o

OpenIB - G*s+gOpenIB - o

Page 9: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Literature

[1] T. Hoefler, A. LICHEI, AND W. REHM: Low-Overhead LogGPParameter Assessment for Modern Interconnection Networks.Accepted at PMEO-PDS 2007 in conjunction with IPDPS 2007

[2] T. Hoefler, J. SQUYRES, G. FAGG, G. BOSILCA, W. REHM AND

A. LUMSDAINE: A New Approach to MPI Collective CommunicationImplementations. In proceedings of the 6th Austrian-HungarianWorkshop on Distributed and Parallel Systems

Page 10: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Outline

1 Some Considerations about Interconnects

2 Why Non blocking Collectives?

3 LibNBC

4 And Applications?

5 Ongoing Efforts

Page 11: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Modelling the Benefits

LogGP Models - general

tbarr = (2o + L) · dlog2Pe

tallred = 2 · (2o + L + m · G) · dlog2Pe + m · γ · dlog2Pe

tbcast = (2o + L + m · G) · dlog2Pe

CPU and Network LogGP parts

tCPUbarr = 2o · dlog2Pe tNET

barr = L · dlog2Pe

tCPUallred = (4o + m · γ) · dlog2Pe tNET

allred = 2 · (L + m · G) · dlog2Pe

tCPUbcast = 2o · dlog2Pe tNET

bcast = (L + m · G) · dlog2Pe

Page 12: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

User Overhead Benchmarks

LAM/MPI 7.1.2

10 20 30 40 50 60Communicator Size 1 10

100 1000

10000 100000

Data Size

0 0.005 0.01

0.015 0.02

0.025 0.03

CPU Usage (share)

Page 13: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

User Overhead Benchmarks

MPICH2 1.0.3

10 20 30 40 50 60Communicator Size 1 10

100 1000

10000 100000

Data Size

0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08

CPU Usage (share)

Page 14: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Send/Recv is there - Why Collectives?

Gorlach, ’04: ”Send-Receive Considered Harmful”⇔ Dijkstra, ’68: ”Go To Statement Considered Harmful”

point to pointif ( rank == 0) then

do i=1,pcall MPI_SEND(...)

enddoelse

call MPI_RECV(...)endif

vs. collectivecall MPI_SCATTER(...)

cmp. math libraries vs. loops

Page 15: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Putting Everything Together

non blocking collectives?JoD mentions ”split collectives”example:

MPI_Bcast_begin(...)MPI_Bcast_end(...)

no nesting with other collsvery limitednot in the MPI-2 standardvotes: 11 yes, 12 no, 2 abstain

Page 16: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Performance Benefits

overlap

leverage hardware parallelism (e.g. InfiniBandTM)overlap similar to non-blocking point-to-point

pseudo synchronizationavoidance of explicit pseudo synchronizationlimit the influence of OS noise

Page 17: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Process Skew

caused by OS interference or unbalanced applicationespecially if processors are overloadedworse for big systemscan cause dramatic performance decreaseall nodes wait for the last

ExamplePetrini et. al. (2003) ”The Case of the Missing SupercomputerPerformance: Achieving Optimal Performance on the 8,192Processors of ASCI Q”

Page 18: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Process Skew

caused by OS interference or unbalanced applicationespecially if processors are overloadedworse for big systemscan cause dramatic performance decreaseall nodes wait for the last

ExamplePetrini et. al. (2003) ”The Case of the Missing SupercomputerPerformance: Achieving Optimal Performance on the 8,192Processors of ASCI Q”

Page 19: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Process Skew - MPI Example - Jumpshotpr

oces

ses

time

P0

P1

P3

P2

Page 20: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Process Skew - NBC Example - Jumpshotpr

oces

ses

time

P0

P1

P3

P2

Page 21: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Literature

[3] T. Hoefler, J. SQUYRES, W. REHM, AND A. LUMSDAINE: A Casefor Non-Blocking Collective Operations. In Frontiers of HighPerformance Computing and Networking, pages 155-164, SpringerBerlin / Heidelberg, ISBN: 978-3-540-49860-5 Dec. 2006

[4] T. Hoefler, J. SQUYRES, G. BOSILCA, G. FAGG, A. LUMSDAINE,AND W. REHM: Non-Blocking Collective Operations for MPI-2. OpenSystems Lab, Indiana University. presented in Bloomington, IN, USA,School of Informatics, Aug. 2006

Page 22: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Outline

1 Some Considerations about Interconnects

2 Why Non blocking Collectives?

3 LibNBC

4 And Applications?

5 Ongoing Efforts

Page 23: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Non-Blocking Collectives - Interface

extension to MPI-2”mixture” between non-blocking ptp and collectivesuses MPI_Requests and MPI_Test/MPI_Wait

MPI_Ibcast(buf1, p, MPI_INT, 0, MPI_COMM_WORLD, &req);MPI_Wait(&req);

ProposalHoefler et. al. (2006): ”Non-Blocking Collective Operations forMPI-2”

Page 24: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Non-Blocking Collectives - Interface

extension to MPI-2”mixture” between non-blocking ptp and collectivesuses MPI_Requests and MPI_Test/MPI_Wait

MPI_Ibcast(buf1, p, MPI_INT, 0, MPI_COMM_WORLD, &req);MPI_Wait(&req);

ProposalHoefler et. al. (2006): ”Non-Blocking Collective Operations forMPI-2”

Page 25: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Non-Blocking Collectives - Implementation

implementation available with LibNBCwritten in ANSI-C and uses only MPI-1central element: collective schedulea coll-algorithm can be represented as a scheduletrivial addition of new algorithms

Example: dissemination barrier, 4 nodes, node 0:

send to 1 recv from 3 end send to 2 recv from 2 end

LibNBC download: http://www.unixer.de/NBC

Page 26: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

LibNBC Benchmarks - Gather withInfiniBand/MVAPICH on 64 nodes

0

5000

10000

15000

20000

25000

30000

0 50000 100000 150000 200000 250000 300000

Run

time

(s)

Datasize (bytes)

MPI_GatherNBC_Igather

Page 27: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

LibNBC Benchmarks - Scatter withInfiniBand/MVAPICH on 64 nodes

0

5000

10000

15000

20000

25000

30000

0 50000 100000 150000 200000 250000 300000

Run

time

(s)

Datasize (bytes)

MPI_ScatterNBC_Iscatter

Page 28: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

LibNBC Benchmarks - Alltoall withInfiniBand/MVAPICH on 64 nodes

0

5000

10000

15000

20000

25000

30000

35000

40000

45000

50000

0 50000 100000 150000 200000 250000 300000

Run

time

(s)

Datasize (bytes)

MPI_AlltoallNBC_Ialltoall

Page 29: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

LibNBC Benchmarks - Allreduce withInfiniBand/MVAPICH on 64 nodes

0

10000

20000

30000

40000

50000

60000

0 50000 100000 150000 200000 250000 300000

Run

time

(s)

Datasize (bytes)

MPI_AllreduceNBC_Iallreduce

Page 30: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Literature

[5] T. Hoefler AND A. LUMSDAINE: Design, Implementation, andUsage of LibNBC. Open Systems Lab, Indiana University. presentedin Bloomington, IN, USA, School of Informatics, Aug. 2006

[6] T. Hoefler, A. LUMSDAINE AND W. REHM: Implementation andPerformance Analysis of Non-Blocking Collective Operations for MPI.Under submission (ask me for a copy)

Page 31: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Outline

1 Some Considerations about Interconnects

2 Why Non blocking Collectives?

3 LibNBC

4 And Applications?

5 Ongoing Efforts

Page 32: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Pseudocode Overlap with Double Buffering

do ....! send communication bufferMPI_Ialltoall(buffer_comm, ...,, request)

! do useful workdo_work(buffer_work)

! finish communicationMPI_Wait(request, ...)

! swap the buffersbuffer_tmp = buffer_commbuffer_comm = buffer_workbuffer_work = buffer_tmp

enddo

Page 33: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Linear Solvers - Domain Decomposition

First ExampleNaturally Independent Computation - Linear Solvers

iterative linear solvers are used in many scientific kernelsoften used operation is vector-matrix-multiplymatrix is domain-decomposed (e.g., 3D)only outer (border) elements need to be communicatedcan be overlapped

Page 34: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Domain Decomposition

nearest neighbor communicationcan be implemented with MPI_Alltoallv

�������������������������������������������������������������������������������������������������������������������������������������������������������������������������

�������������������������������������������������������������������������������������������������������������������������������������������������������������������������

�������������������������������������������������������������������������������������������������������������������������������������������������������������������������

�������������������������������������������������������������������������������������������������������������������������������������������������������������������������

�������������������������������������������������������������������������������������������������������������������������������������������������������������������������

�������������������������������������������������������������������������������������������������������������������������������������������������������������������������

�������������������������������������������������������������������������������������������������������������������������������������������������������������������������

�������������������������������������������������������������������������������������������������������������������������������������������������������������������������

������������������������������������������������������������������������������

������������������������������������������������������������������������������

�������������������������������������������������������������������������������������������������������������������������������������������������������������������������

�������������������������������������������������������������������������������������������������������������������������������������������������������������������������

� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �

��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������

��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������

��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������

��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������

��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������

��������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������

���������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������������

������������������������������������������������������������������������������������������������������������������������������������������������������������

�������������������������������������������������������������������������������������������������������������������������������������������������������������������������

�������������������������������������������������������������������������������������������������������������������������������������������������������������������������

�������������������������������������������������������������������������������������������������������������������������������������������������������������������������

���

��� Process−local data

Halo−data2D Domain

P1 P2 P3

P4 P5 P6 P7

P10P9P8 P11

P0

Page 35: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Parallel Speedup (Best Case)

0

20

40

60

80

100

8 16 24 32 40 48 56 64 72 80 88 96

Spe

edup

Number of CPUs

Eth blockingEth non-blocking

0

20

40

60

80

100

8 16 24 32 40 48 56 64 72 80 88 96S

peed

upNumber of CPUs

IB blockingIB non-blocking

Cluster: 128 2 GHz Opteron 246 nodesInterconnect: Gigabit Ethernet, InfiniBandTM

System size 800x800x800 (1 node ≈ 5300s)

Page 36: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Parallel Gain with Non-Blocking Communication

gain = tnbl/tbl

-0.2

-0.1

0

0.1

0.2

0.3

0.4

0 8 16 24 32 40 48 56 64 72 80 88 96

Rel

ativ

e S

peed

up

Number of CPUs

IBEth

Page 37: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Linear Solvers - Domain Decomposition

Second ExampleData Parallel Loops - Parallel Compression

automatic transformations (C++ templates), typical loopstructure:

for (i=0; i < N/P; i++) {compute(i);

}comm(N/P);

Page 38: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Parallel Speedup (Best Case)

0

10

20

30

40

50

60

70

80

90

0 10 20 30 40 50 60 70 80 90 100

Spe

edup

# Processors

MPI/blockingNBC/pipe

NBC/tileNBC/wintile

0

10

20

30

40

50

60

70

80

90

0 10 20 30 40 50 60 70 80 90 100S

peed

up# Processors

MPI/blockingNBC/pipe

NBC/tileNBC/wintile

Cluster: 64 2 GHz Opteron 246 nodesInterconnect: Gigabit Ethernet, InfiniBandTM

System size 64*50 MB

Page 39: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Communication Overhead

MVAPICH 0.9.4

0

5

10

15

20

25

30

35

0 10 20 30 40 50 60 70 80 90 100

Com

mun

icat

ion

Ove

rhea

d

# Processors

MPI/blockingNBC/pipe

NBC/tileNBC/wintile

Page 40: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Literature

[7] T. Hoefler P. GOTTSCHLING, W. REHM AND A. LUMSDAINE:Optimizing a Conjugate Gradient Solver with Non-Blocking CollectiveOperations. In 13th European PVM/MPI User’s Group Meeting,Proceedings, LNCS 4192, presented in Bonn, Germany, pages374-382, Springer, ISSN: 0302-9743, ISBN: 3-540-39110-X Sep.2006

[8] T. Hoefler, P. GOTTSCHLING AND A. LUMSDAINE:Transformations for enabling non-blocking collective communication inhigh-performance applications. Under submission (ask me for a copy)

Page 41: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Outline

1 Some Considerations about Interconnects

2 Why Non blocking Collectives?

3 LibNBC

4 And Applications?

5 Ongoing Efforts

Page 42: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Ongoing Work

3D-FFToptimized version of 3D-FFT with full overlap or pipelinedstill in development (ABINIT version)very promising

LOBPCG Method in ABINITdeveloped by G. Zérah (CEA)could use NBC for parallel matrix-matrix multiplication

LibNBCFortran bindings for LibNBCoptimized collectives

Page 43: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Ongoing Work (continued)

Collective Communication

optimized collectives for InfiniBandTM

using special hardware support

Network Modelling

refined LogGP model parametrizationmodelling of collective algorithms

Collaborate with Application Engineers

I want to help to apply NBC to your problem!

Page 44: Non-Blocking Collectives for MPI-2 - ETH Zhtor.inf.ethz.ch/publications/img/cea-nbcoll.pdf · Non-Blocking Collectives for MPI-2 ... MPICH2 - o TCP - G*s+g TCP o. Some Considerationsabout

Some Considerations about Interconnects Why Non blocking Collectives? LibNBC And Applications? Ongoing Efforts

Discussion

THE ENDQuestions?

Thank you for your attention!