
Load Balancing Hybrid Programming Models for SMP Clusters and Fully Permutable Loops

Nikolaos Drosinos and Nectarios Koziris

National Technical University of Athens
Computing Systems Laboratory

{ndros,nkoziris}@cslab.ece.ntua.gr
www.cslab.ece.ntua.gr

Oslo, June 15, 2005 ICPP-HPSEC 2005 2

Motivation

• Fully permutable loops have always been a computational challenge for HPC
• Hybrid parallelization is attractive for DSM architectures
• Currently, popular free message-passing libraries provide only limited multi-threading support
• SPMD hybrid parallelization suffers from intrinsic load imbalance

Contribution

Two static thread load-balancing schemes (constant and variable) for coarse-grain funneled hybrid parallelization of fully permutable loops:
• generic
• simple to implement

Experimental evaluation against micro-kernel benchmarks of different programming models:
• message passing
• fine-grain hybrid
• coarse-grain hybrid (unbalanced and balanced)

Algorithmic model

foracross tile_1 do
  …
    foracross tile_(n-1) do
      for tile_n do
        Receive(tile);
        Compute(A, tile);
        Send(tile);

Restrictions: fully permutable loops; unitary inter-process dependencies.
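The Python sketch below shows the pipelined execution that this loop nest produces. It makes the following assumptions, none of which come from the slides: the outer n-1 tile loops are mapped onto a process grid, each Receive-Compute-Send iteration takes one unit step, and the helper name `pipelined_schedule` is hypothetical. Under unitary dependencies, tile k of process p waits for tile k of each lower neighbour and for its own tile k-1.

```python
from itertools import product


def pipelined_schedule(procs, n_tiles):
    """Earliest step at which each (process, tile_n) pair can run.

    procs   -- shape of the (n-1)-dimensional process grid, e.g. (4, 2)
    n_tiles -- number of tiles each process traverses along tile_n
    Assumes one unit step per Receive-Compute-Send iteration and unitary
    dependencies: tile k of process p needs tile k of every neighbour
    p - e_d and tile k-1 of p itself.
    """
    step = {}
    for k in range(n_tiles):                            # sequential "for tile_n"
        for p in product(*(range(s) for s in procs)):   # "foracross" process grid
            deps = [step[(p, k - 1)]] if k > 0 else []
            for d in range(len(p)):
                if p[d] > 0:                            # Receive from lower neighbour
                    q = p[:d] + (p[d] - 1,) + p[d + 1:]
                    deps.append(step[(q, k)])
            step[(p, k)] = max(deps, default=-1) + 1
    return step


sched = pipelined_schedule((4, 2), 10)
print(max(sched.values()) + 1)  # prints 14 (total pipeline length in steps)
```

In this model, process p starts its first tile only after sum(p) steps. The startup delay therefore grows with a process's distance from process (0, ..., 0).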

Message passing parallelization

• tiling transformation
• (overlapped?) computation and communication phases
• pipelined execution

Advantages: portable, scalable, highly optimized
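The "(overlapped?)" question can be illustrated with a simplified cost model. The model is an assumption made for illustration and is not taken from the slides. Without overlapping, each pipeline step pays computation plus communication; with perfect overlapping, only the slower phase counts. The function names are hypothetical.

```python
def pipeline_steps(procs, n_tiles):
    """Steps needed by the wavefront: grid diameter plus tiles along tile_n."""
    return sum(s - 1 for s in procs) + n_tiles


def completion_time(procs, n_tiles, t_comp, t_comm, overlapped):
    """Simplified (assumed) completion-time model for the pipelined schedule.

    Without overlapping, every step pays computation and communication in turn;
    with perfect overlapping, the slower of the two phases dominates each step.
    """
    per_step = max(t_comp, t_comm) if overlapped else t_comp + t_comm
    return pipeline_steps(procs, n_tiles) * per_step


print(completion_time((4, 2), 10, 3.0, 1.0, overlapped=False))  # 56.0
print(completion_time((4, 2), 10, 3.0, 1.0, overlapped=True))   # 42.0
```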

Hybrid parallelization

So… why bother?

Hybrid parallelization: why bother? (I)

Shared-memory programming model vs. message-passing programming model on a shared-memory architecture

Hybrid parallelization: why bother? (II)

DSM architectures are popular!

Fine-grain hybrid parallelization

Advantages:
• incremental parallelization of loops
• relatively easy to implement
• popular

Drawbacks:
• Amdahl's law restricts parallel efficiency
• overhead of re-initializing thread structures
• restrictive programming model for many applications
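The Amdahl's-law drawback can be quantified with a back-of-the-envelope model. The model assumes that the master thread communicates alone outside the parallel region (the serial part), that the tile computation is split evenly over the threads, and that an optional `t_fork` term stands for the thread re-initialization overhead. All function and parameter names are hypothetical.

```python
def amdahl_speedup(serial_fraction, threads):
    """Amdahl's law: speedup over one thread when only part of the work scales."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)


def fine_grain_tile_time(t_comp, t_comm, threads, t_fork=0.0):
    """Per-tile time of a fine-grain hybrid process (illustrative model).

    The master thread alone communicates outside the parallel region (serial
    part); the tile computation is split evenly over `threads`; `t_fork` is a
    hypothetical cost for re-initializing the thread team on every tile.
    """
    return t_comm + t_fork + t_comp / threads


# 20% of the per-tile time is communication: two threads give at most ~1.67x.
print(round(amdahl_speedup(0.2, 2), 2))           # 1.67
print(fine_grain_tile_time(8.0, 2.0, threads=2))  # 6.0
```

With t_comp = 8 and t_comm = 2, a single thread needs 10 time units per tile and two threads need 6. This matches the Amdahl bound of 10/6.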

Coarse-grain hybrid parallelization

Advantages:
• generic SPMD programming style
• good parallelization efficiency
• no thread re-initialization overhead

Drawbacks:
• more difficult to implement
• intrinsic load imbalance, assuming the common funneled thread support level
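The next sketch shows where the intrinsic load imbalance comes from, under the assumptions that the computation is split evenly and that only the master thread issues MPI calls (funneled). The master then pays the communication time on top of its share, and the other threads sit idle for that long on every tile. The function name is hypothetical.

```python
def unbalanced_coarse_grain(t_comp, t_comm, threads):
    """Per-tile times for a coarse-grain funneled process with an even split.

    Every thread computes t_comp / threads, but under the funneled support
    level only the master thread issues the MPI calls, so it also pays t_comm.
    Returns (tile time, idle time of each non-master thread).
    """
    share = t_comp / threads
    master = t_comm + share
    other = share
    return max(master, other), master - other


tile_time, idle = unbalanced_coarse_grain(8.0, 2.0, threads=2)
print(tile_time, idle)  # 6.0 2.0
```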

MPI thread support levels

single, masteronly, funneled, serialized, multiple

[Figure: per-thread timelines of comm and comp phases for the fine-grain hybrid and the coarse-grain hybrid models]
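One way to read this list is as an ordered scale, checked against what each hybrid model requires. In the sketch below, SINGLE, FUNNELED, SERIALIZED and MULTIPLE mirror the MPI standard levels. MASTERONLY is not an MPI constant, so its position between SINGLE and FUNNELED is an assumption, and so is the per-model requirement table (based only on where each model issues its MPI calls).

```python
from enum import IntEnum


class ThreadLevel(IntEnum):
    """Thread support levels ordered from least to most permissive.

    SINGLE, FUNNELED, SERIALIZED and MULTIPLE mirror the MPI standard levels;
    MASTERONLY is not an MPI constant and is slotted between SINGLE and
    FUNNELED here as an assumption.
    """
    SINGLE = 0
    MASTERONLY = 1
    FUNNELED = 2
    SERIALIZED = 3
    MULTIPLE = 4


# Assumed minimum level per model, based on where each issues its MPI calls.
REQUIRED = {
    "fine-grain": ThreadLevel.MASTERONLY,  # master calls MPI outside parallel regions
    "coarse-grain": ThreadLevel.FUNNELED,  # master calls MPI inside the parallel region
}


def can_run(model, provided):
    """True if a library providing `provided` support suffices for `model`."""
    return provided >= REQUIRED[model]


print(can_run("coarse-grain", ThreadLevel.MASTERONLY))  # False
print(can_run("coarse-grain", ThreadLevel.FUNNELED))    # True
```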

Load balancing

Idea: equalize the per-tile execution time of the master thread and of the other threads,

$t^{master}_{comp} + t^{master}_{comm} = t^{other}_{comp}$

Consequence: the master thread assumes a smaller fraction of the process tile computational load than the other threads.

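Load balancing (2)

T … total number of threads, p … current process id

Assuming $t_{comm}(x) = t_{startup} + x\,t_{data}$ and $t_{comp}(x) = x\,t_{comp}$,

it follows that

$$bal(p) = \frac{1}{T} - \frac{T-1}{T}\cdot\frac{\sum_{dir=1,\;C_{dir}(p)}^{N-1} t_{comm}^{dir}}{t_{comp}^{tile}}$$

where bal(p) is the fraction of the process tile computation assigned to the master thread of process p, and the sum collects the communication times of p over the directions dir = 1, …, N−1 selected by the condition $C_{dir}(p)$.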
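The master's share follows from solving the balancing condition $t^{master}_{comp} + t^{master}_{comm} = t^{other}_{comp}$ with the master computing a fraction bal of the tile and the remaining T−1 threads sharing the rest equally. The sketch below does exactly that. Its function names are hypothetical, and so is the clamp at zero for the case where communication alone exceeds a fair share, which is an assumption rather than something the slides state.

```python
def comm_time(x, t_startup, t_data):
    """Linear communication model: t_startup + x * t_data."""
    return t_startup + x * t_data


def master_fraction(threads, t_comp_tile, comm_per_dir):
    """bal(p): share of the tile computation assigned to the master thread.

    comm_per_dir lists t_comm^dir for the directions in which process p
    communicates. Solves bal*t_comp + t_comm = (1 - bal)/(T - 1) * t_comp.
    """
    if threads < 2:
        raise ValueError("balancing needs at least two threads")
    t_comm = sum(comm_per_dir)
    bal = 1.0 / threads - (threads - 1) / threads * t_comm / t_comp_tile
    # Assumption: clamp at 0 when communication alone exceeds a fair share.
    return max(bal, 0.0)


def thread_times(threads, t_comp_tile, comm_per_dir):
    """(master time, time of every other thread) for one balanced tile."""
    bal = master_fraction(threads, t_comp_tile, comm_per_dir)
    master = bal * t_comp_tile + sum(comm_per_dir)
    other = (1.0 - bal) / (threads - 1) * t_comp_tile
    return master, other


# Two threads, tile computation of 10 time units, two communicating directions.
comm = [comm_time(2, 0.5, 0.25), comm_time(2, 0.5, 0.25)]  # 1.0 each
print(master_fraction(2, 10.0, comm))  # approximately 0.4
print(thread_times(2, 10.0, comm))     # both entries approximately 6.0
```

A process that communicates in fewer directions gets a larger master share. This is what makes the variable scheme differ from one process to another.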

Load balancing (3)

[Figure: thread 0 and thread 1 loads across the process grid (axes X1, X2, Z), from process (0,0) to process (3,1); values shown: 87%, 87%, 87%, 92% and 95%, 95%, 95%, 100%]

Experimental Results

• 8-node dual-SMP Linux cluster (800 MHz PIII, 256 MB RAM, kernel 2.4.26)
• MPICH v.1.2.6 (--with-device=ch_p4, --with-comm=shared, P4_SOCKBUFSIZE=104KB)
• Intel C++ compiler 8.1 (-O3 -static -mcpu=pentiumpro)
• Fast Ethernet interconnection network

Alternating Direction Implicit (ADI)

• stencil computation used for solving partial differential equations
• unitary data dependencies
• 3D iteration space (X × Y × Z)

[Figure: X × Y × Z iteration space showing the processor mapping, the direction of sequential execution and the data dependencies]
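Unitary dependencies are what make such a loop nest fully permutable. The sketch below demonstrates this with a hypothetical 3D kernel (an assumed update rule, not the ADI computation): every point depends on its three predecessors along X, Y and Z. Running the nest in all six loop orders gives identical results.

```python
from itertools import permutations, product


def sweep(sizes, order):
    """Run a hypothetical 3D kernel with unitary dependencies (not the ADI update).

    A[i,j,k] depends on A[i-1,j,k], A[i,j-1,k] and A[i,j,k-1]; `order` is a
    permutation of (0, 1, 2) choosing which dimension is the outermost loop.
    """
    A = {}

    def get(i, j, k):
        return A.get((i, j, k), 1)  # points outside the space act as boundary = 1

    for idx in product(*(range(sizes[d]) for d in order)):
        point = [0, 0, 0]
        for d, v in zip(order, idx):
            point[d] = v
        i, j, k = point
        A[(i, j, k)] = (get(i - 1, j, k) + get(i, j - 1, k) + get(i, j, k - 1)) % 1_000_003
    return A


reference = sweep((3, 4, 5), (0, 1, 2))
# Every loop order gives the same result, i.e. the nest is fully permutable.
print(all(sweep((3, 4, 5), o) == reference for o in permutations(range(3))))  # True
```

Because every predecessor of a point is lexicographically earlier in any loop order, the nest can also be tiled and mapped so that X and Y are distributed and Z is traversed sequentially.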

ADI

Synthetic benchmark

Conclusions

• fine-grain hybrid parallelization is inefficient
• unbalanced coarse-grain hybrid parallelization is also inefficient
• balancing improves hybrid model performance
• the variable balanced coarse-grain hybrid model is the most efficient approach overall
• the relative performance improvement increases as communication needs grow relative to computation

Thank You!

Questions?

ADI

Synthetic benchmark