Heterogeneous and Grid Computing

Communication models
Modeling the performance of communications
– Huge area
– Two main communities
  » Network designers
  » HPC (network users)
– Different goals and approaches
  » Network designers: complex, detailed models of behavior (simulation, queues, etc.) for design analysis; performance parameters are not primary
  » HPC: simple and efficient predictive performance models

Communication models (ctd)
HPC communication performance models
– Simple and efficient
  » Small number of measurable parameters (LogP)
Hardware platforms
– Clusters
– Local networks
– Dedicated global networks
– Global networks connected via Internet

Communication models (ctd)
Communication level
– Different levels
  » Stack of protocols and software
– Most relevant level
  » The level of communication middleware used in HPC programs (application programmer's level): MPI, PVM
– Lower levels
  » For system programmers (say, MPI implementers)

Communication models (ctd)
Objectives
– Predict the execution time of communication operations
  » For algorithm and program design
– Optimization of communication operations
  » Collective MPI communications

Heterogeneous clusters
Most analytical predictive models used for heterogeneous clusters
– Inherently homogeneous
  » Originally designed for homogeneous clusters
  » Execution time of a communication operation depends only on
    - Topology
    - The number of participating processors

Heterogeneous clusters (ctd)
Homogeneous communication models
– Very simple (linear)
  » Differ in how the constant and variable parts are formed
– Typical structure
  » Point-to-point communications are the basis
    - A small set of integral parameters having the same value for each pair of processors
  » Collectives are expressed as combinations of p2p's
    - Time is predicted analytically from the message size and the number of processors

Heterogeneous clusters (ctd)
Two main issues
– Model design
– Efficient and accurate estimation of the model parameters
Estimation of homogeneous models
– For homogeneous clusters
  » p2p parameters are found statistically
    - From measurements of communications between any two processors
– For heterogeneous clusters
  » p2p parameters are found by averaging the values over all pairs

Homogeneous models
Homogeneous analytical predictive communication models
– The Hockney model
– LogP
– LogGP
– PLogP

The Hockney model
Time of p2p communication is α + β×m
– α – the latency
– β – the transfer time per byte (the reciprocal of the bandwidth)
– m – the message size
Estimation
– Directly from p2p tests for different message sizes, with linear regression
– Each test
  » Measures the time of a roundtrip
    - Sending and receiving a message of size m, or
    - Sending a message of size m and receiving a zero-sized message
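
The regression step can be sketched as an ordinary least-squares fit. This is a minimal illustration, not the procedure of any particular benchmark: the names `hockney_t`, `hockney_fit` and `hockney_time` are hypothetical, and the inputs are assumed to be one-way times already derived from the roundtrip measurements (for example, half of a symmetric roundtrip).

```c
#include <assert.h>
#include <stddef.h>

/* Hockney model parameters: T(m) = alpha + beta * m (names are illustrative). */
typedef struct {
    double alpha; /* latency */
    double beta;  /* transfer time per byte */
} hockney_t;

/* Least-squares fit of one-way times t[k] against message sizes m[k]. */
hockney_t hockney_fit(const double *m, const double *t, size_t count)
{
    assert(count >= 2);
    double sm = 0.0, st = 0.0, smm = 0.0, smt = 0.0;
    for (size_t k = 0; k < count; k++) {
        sm += m[k];
        st += t[k];
        smm += m[k] * m[k];
        smt += m[k] * t[k];
    }
    double denom = (double)count * smm - sm * sm;
    assert(denom != 0.0); /* requires at least two distinct message sizes */
    hockney_t h;
    h.beta = ((double)count * smt - sm * st) / denom;
    h.alpha = (st - h.beta * sm) / (double)count;
    return h;
}

/* Predicted one-way time for a message of m bytes. */
double hockney_time(hockney_t h, double m)
{
    return h.alpha + h.beta * m;
}
```

With measurements taken at several sizes, `hockney_fit` returns the two parameters and `hockney_time` then predicts the time for any other size.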

The LogP model
The main parameters of the LogP model
– L: an upper bound on the latency
  » The delay incurred in sending a message from its source processor to its target processor
– o: the overhead
  » The length of time that a processor is engaged in the transmission or reception of each message; during this time the processor cannot perform other operations
=> Point-to-point communication time: L + 2×o

The LogP model (ctd)
– g: the gap between messages
  » The minimum time interval between consecutive message transmissions or consecutive message receptions at a processor
  » At most ⌈L/g⌉ messages can be in transit from or to a processor simultaneously
– P: the number of processors

The LogGP model
LogGP – an extension of LogP for messages of arbitrary size m
– G – the gap per byte for large messages
– p2p communication time: L + 2×o + (m − 1)×G
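
The two point-to-point formulas, L + 2×o for LogP and L + 2×o + (m − 1)×G for LogGP, translate directly into code. The function names below are hypothetical, and all parameters are assumed to be expressed in the same time unit.

```c
#include <assert.h>

/* LogP: time of a single short point-to-point message. */
double logp_p2p(double L, double o)
{
    return L + 2.0 * o;
}

/* LogGP: time of a point-to-point message of m bytes (m >= 1). */
double loggp_p2p(double L, double o, double G, long m)
{
    assert(m >= 1);
    return L + 2.0 * o + (double)(m - 1) * G;
}
```

For m = 1 the LogGP prediction reduces to the LogP one, which is the sense in which LogGP extends LogP.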

The PLogP model
PLogP – parameterized LogP
– os(m) and or(m)
  » Send and receive overheads
  » Functions of the message size
– g(m) – the gap
  » Function of the message size
  » g(m) ≥ os(m), g(m) ≥ or(m)
p2p time: L + os(m) + or(m)

Estimation of LogP-based models
os(m)
– Directly from the execution time of the send operation sending a message of m bytes
  » Results of a number of experiments are averaged (~tens)
or(m)
– Directly from the time of receiving a message of m bytes in the roundtrip
  » After completion of the send, processor i waits for some time and only then posts a receive operation
  » The execution time of the receive operation approximates or(m)
[Figure: roundtrip between processors i and j with messages of sizes m and 0]

Estimation of LogP-based models (ctd)
g(m)
– Directly from the execution time s_n(m) of sending, without reply, a large number n of messages of size m
  » As $s(m_1, \ldots, m_n) = g(m_1) + \ldots + g(m_n)$, then $g(m) = s_n(m)/n$
  » n is obtained from the saturation process (thousands or more)
L
– From the execution time of a roundtrip sending and receiving a message of size 1
  » The time: 2×L + 2×(os(1)+or(1))
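
The two estimates above amount to simple arithmetic on measured times. The sketch below assumes the measurements are already available as plain numbers; the function names `plogp_gap` and `plogp_latency` are hypothetical, and the latency formula simply inverts the roundtrip expression 2×L + 2×(os(1)+or(1)) stated on the slide.

```c
#include <assert.h>

/* g(m) from the time s_n of sending n messages of size m without reply. */
double plogp_gap(double s_n, long n)
{
    assert(n > 0);
    return s_n / (double)n;
}

/* L from a roundtrip time rtt = 2*L + 2*(os(1) + or(1)). */
double plogp_latency(double rtt, double os1, double or1)
{
    return rtt / 2.0 - (os1 + or1);
}
```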

Estimation of LogP-based models (ctd)
LogP/LogGP    PLogP
o             (os(1)+or(1))/2
g             g(1)
G             g(m)/m
P             P
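
The table can be read as a conversion from measured PLogP values to LogP/LogGP parameters. A minimal sketch follows, covering only the four rows listed; the struct and function names are hypothetical, and `g_m` is assumed to be g(m) measured at some large message size m.

```c
#include <assert.h>

/* LogP/LogGP parameters derived from PLogP measurements (illustrative type). */
typedef struct {
    double o; /* overhead */
    double g; /* gap between messages */
    double G; /* gap per byte */
    int P;    /* number of processors */
} loggp_params;

/* Row-by-row conversion following the table. */
loggp_params plogp_to_loggp(double os1, double or1, double g1,
                            double g_m, long m, int P)
{
    assert(m > 0);
    loggp_params p;
    p.o = (os1 + or1) / 2.0;
    p.g = g1;
    p.G = g_m / (double)m;
    p.P = P;
    return p;
}
```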

Homogeneous models: collectives
The homogeneous models
– Used for analytical prediction of the execution time of different algorithms of collective communications
  » In applications
  » In MPI implementations
    - Optimization of collective operations upon installation of an MPI implementation

Homogeneous models: collectives (ctd)
The traditional homogeneous models
– Linear
  » p2p and collective operations are linear functions of the message size (except in PLogP)
– Deterministic
  » p2p is deterministic => all collectives are too
Recent results show that this does not hold on many popular platforms
– For example, single-switched clusters with an MPI stack including a TCP/IP layer

Homogeneous models: collectives (ctd)
Many-to-one (flat tree, as in MPI standard)

Homogeneous models: collectives (ctd)
One-to-many (flat tree)

Homogeneous models: collectives (ctd)
Extra parameters of a more accurate model
– M1 = M1(n), the message size at which gather escalations begin
– M2 = const, the message size at which gather escalations stop
– k, the number of levels of escalation
– Ti, the execution time for the i-th escalation level
– fi(n, m), the probability of escalation to level i
  » Depends on the number of involved processors, n, and the message size, m (M1 ≤ m ≤ M2)
– S, the message size at which the scatter leap happens
  » S = M2

Homogeneous models: collectives (ctd)
Discrete constant levels of escalation (tens- and hundreds-fold)
Probability of escalation to level is found

Homogeneous models: collectives (ctd)
Application of the more accurate model
– Optimization of MPI_Scatter and MPI_Gather
  » Eliminating the non-determinism and non-linearity of MPI_Gather
    /* Pseudocode: MPI argument lists are abbreviated. */
    if (m >= M1 && m <= M2) {
        /* N such that (m/N) < M1 and (m/(N-1)) >= M1 */
        N = m/M1 + 1;
        for (i = 0; i < N; i++) {
            MPI_Barrier(comm);
            MPI_Gather(sendbuf + i*(m/N), m/N);
        }
    } else
        MPI_Gather(sendbuf, m);

Homogeneous models: collectives (ctd)
Application of the more accurate model (ctd)
  » Elimination of the non-linearity of MPI_Scatter
    /* Pseudocode: MPI argument lists are abbreviated. */
    if (m > S) {
        /* N such that (m/N) < S and (m/(N-1)) >= S */
        N = m/S + 1;
        for (i = 0; i < N; i++) {
            MPI_Scatter(recvbuf + i*(m/N), m/N);
        }
    } else
        MPI_Scatter(recvbuf, m);
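
The "find N" step in both optimizations has a closed form. Reading the two inequalities with real-valued division, m/N < limit means N > m/limit and m/(N−1) ≥ limit means N − 1 ≤ m/limit, so N = ⌊m/limit⌋ + 1. The helper below is a sketch under that reading; the name `split_count` is hypothetical, and `limit` stands for M1 in the gather case or S in the scatter case.

```c
#include <assert.h>

/* Number of sub-messages N such that m/N < limit and m/(N-1) >= limit,
   with the divisions taken over the reals. Requires m >= limit > 0,
   which holds in the branches where the slides split the message. */
long split_count(long m, long limit)
{
    assert(limit > 0 && m >= limit);
    return m / limit + 1; /* integer division gives floor(m/limit) */
}
```

Because message sizes are integers in practice, the largest sub-message may be rounded up to ⌈m/N⌉; a stricter variant that keeps every sub-message below the threshold would need a slightly larger N.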

Heterogeneous communication models
None of the traditional models reflects the heterogeneity of the processors
– p2p parameters average the real ones
– The averages are used in modelling collectives
– If some processors differ significantly in performance
  » The model may become quite inaccurate
More accurate models would have different p2p parameters for different processors

Heterogeneous communication model: case study
Cluster of heterogeneous computers
– Switched Ethernet network
– MPI
– The most common platform for heterogeneous parallel computing
Objectives
– Efficient prediction of the communication cost of parallel algorithms/MPI programs
– Effective and efficient building of the model

Heterogeneous communication model: case study (ctd)
Heterogeneous point-to-point
– processor parameters
  - fixed delays $C_i$, $C_j$
  - variable delays $t_i$, $t_j$
– link parameters
  - transmission rate $\beta_{ij}$
$T_{ij} = C_i + t_i M + C_j + t_j M + M/\beta_{ij}$
Parameters cannot be found from other p2p models
– Hockney/LogGP: parameters are insufficient to find variable processing delays and transmission rates
– PLogP: parameters are functions of message size
Design of communication experiments
– more than 2 linear parameters
Minimization of the number of measurements
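
As an illustration, the heterogeneous point-to-point formula can be evaluated as follows. The type `proc_params` and the function `hetero_p2p` are hypothetical names; consistent units are assumed (for example seconds, seconds per byte, and bytes per second).

```c
#include <assert.h>

/* Per-processor parameters of the heterogeneous model (illustrative type). */
typedef struct {
    double C; /* fixed processing delay */
    double t; /* variable processing delay per byte */
} proc_params;

/* One-way time T_ij = C_i + t_i*M + C_j + t_j*M + M/beta_ij. */
double hetero_p2p(proc_params pi, proc_params pj, double beta_ij, double M)
{
    assert(beta_ij > 0.0 && M >= 0.0);
    return pi.C + pi.t * M + pj.C + pj.t * M + M / beta_ij;
}
```

Unlike the homogeneous models, the prediction changes with the pair of processors involved, since each side contributes its own C and t.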

Heterogeneous communication model: case study (ctd)
One-to-many (scatter type)
$n(C_0 + t_0 M) + \max_{1 \le i \le n}\left(C_i + t_i M + M/\beta_{0i}\right), \quad M \le S$
$n(C_0 + t_0 M) + \sum_{i=1}^{n}\left(C_i + t_i M + M/\beta_{0i}\right), \quad M > S$
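
A sketch of how the one-to-many model might be evaluated, assuming the root (processor 0) contributes n(C_0 + t_0 M), the receivers contribute their maximum below the threshold S and their sum above it, and sizes equal to S fall on the "max" branch. The names `proc_params` and `hetero_scatter` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double C; /* fixed processing delay */
    double t; /* variable processing delay per byte */
} proc_params;

/* Predicted time for the root to send M bytes to each of n receivers.
   dst[i] and beta[i] describe receiver i and its link to the root. */
double hetero_scatter(proc_params root, const proc_params *dst,
                      const double *beta, size_t n, double M, double S)
{
    assert(n > 0);
    double longest = 0.0, total = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(beta[i] > 0.0);
        double r = dst[i].C + dst[i].t * M + M / beta[i];
        if (r > longest)
            longest = r;
        total += r;
    }
    double root_part = (double)n * (root.C + root.t * M);
    /* Parallel reception below the leap, serialized reception above it. */
    return root_part + (M <= S ? longest : total);
}
```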

Heterogeneous communication model: case study (ctd)
Many-to-one model for small messages
$T = n(C_0 + t_0 M) + \max_{1 \le i \le n}\left(C_i + t_i M + M/\beta_{0i}\right), \quad M < M_1$

Heterogeneous communication model: case study (ctd)
Many-to-one model for large messages
$T = \sum_{i=1}^{n}\left(C_0 + t_0 M + C_i + t_i M + M/\beta_{0i}\right), \quad M > M_2$
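
The many-to-one model can be sketched in the same way, assuming the small-message form is n(C_0 + t_0 M) plus the maximum over the senders and the large-message form is the sum of the individual p2p times. For M1 ≤ M ≤ M2 the slides describe probabilistic escalations rather than a single formula, so this hypothetical `hetero_gather` reports that no prediction is made there.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    double C; /* fixed processing delay */
    double t; /* variable processing delay per byte */
} proc_params;

/* Stores the predicted gather time in *T and returns 1, or returns 0 when
   M lies in [M1, M2], the escalation range not covered by the two formulas. */
int hetero_gather(proc_params root, const proc_params *src, const double *beta,
                  size_t n, double M, double M1, double M2, double *T)
{
    assert(n > 0 && T != NULL);
    if (M >= M1 && M <= M2)
        return 0;
    double root_part = root.C + root.t * M;
    double longest = 0.0, total = 0.0;
    for (size_t i = 0; i < n; i++) {
        assert(beta[i] > 0.0);
        double s = src[i].C + src[i].t * M + M / beta[i];
        if (s > longest)
            longest = s;
        total += root_part + s; /* full p2p time of sender i */
    }
    *T = (M < M1) ? (double)n * root_part + longest : total;
    return 1;
}
```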

Design of communication experiments
Fixed processing delays $C_i$ (n unknowns)
– $C_n^2$ experiments: roundtrips with empty messages in both directions
$T_{ij}(0) = 2C_i + 2C_j$
$T_{jk}(0) = 2C_j + 2C_k$
$T_{ki}(0) = 2C_k + 2C_i$
Variable processing delays $t_i$ (n unknowns)
Transmission rates $\beta_{ij}$ ($C_n^2$ unknowns)
– $C_n^2$ experiments: roundtrips with a message of size M in one direction and an empty reply
$T_{ij}(M) = 2C_i + M t_i + 2C_j + M t_j + M/\beta_{ij}$
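
For one triplet of processors i, j, k, the three empty-message roundtrips form a linear system that can be solved by hand: adding the two equations that contain C_i and subtracting the third gives 4C_i. The function below is a sketch of that algebra, which is a derivation from the equations above rather than something stated on the slides; the name `fixed_delays` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Solve T_ij(0) = 2C_i + 2C_j, T_jk(0) = 2C_j + 2C_k, T_ki(0) = 2C_k + 2C_i
   for the three fixed processing delays. */
void fixed_delays(double Tij0, double Tjk0, double Tki0,
                  double *Ci, double *Cj, double *Ck)
{
    assert(Ci != NULL && Cj != NULL && Ck != NULL);
    *Ci = (Tij0 + Tki0 - Tjk0) / 4.0;
    *Cj = (Tij0 + Tjk0 - Tki0) / 4.0;
    *Ck = (Tjk0 + Tki0 - Tij0) / 4.0;
}
```

Each processor belongs to several triplets, so the values obtained for the same C_i can be averaged.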
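
Design of communication experiments (ctd)
Additional experiments
– $3C_n^3$ one-to-two experiments: in each triplet, one processor sends a message of size M to the other two and receives empty replies
$T_i(M) = 4C_i + 2M t_i + \max\left(2C_j + M t_j + M/\beta_{ij},\ 2C_k + M t_k + M/\beta_{ik}\right)$
Solution
$\frac{1}{\beta_{ij}} = \frac{T_{ij}(M) - 2C_i - 2C_j}{M} - t_i - t_j$
$t_i = \begin{cases} \dfrac{T_i(M) - T_{ij}(M) - 2C_i}{M}, & T_{ij}(M) > T_{ik}(M) \\ \dfrac{T_i(M) - T_{ik}(M) - 2C_i}{M}, & T_{ik}(M) > T_{ij}(M) \end{cases}$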
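
The solution formulas can be sketched as two small functions, assuming the fixed delays are already known from the empty-message roundtrips. The names `variable_delay` and `inverse_rate` are hypothetical. When the two roundtrip times are equal, both branches of the t_i formula give the same value, so the sketch simply uses the larger of the two.

```c
#include <assert.h>

/* t_i from the one-to-two time Ti and the roundtrip times Tij, Tik,
   all measured for the same message size M. */
double variable_delay(double Ti, double Tij, double Tik, double Ci, double M)
{
    assert(M > 0.0);
    double slower = (Tij > Tik) ? Tij : Tik;
    return (Ti - slower - 2.0 * Ci) / M;
}

/* 1/beta_ij from the roundtrip time Tij for message size M. */
double inverse_rate(double Tij, double Ci, double Cj,
                    double ti, double tj, double M)
{
    assert(M > 0.0);
    return (Tij - 2.0 * Ci - 2.0 * Cj) / M - ti - tj;
}
```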

Design of communication experiments (ctd)
Measurements
– a small number of measurements ($C_n^2$ roundtrips, $3C_n^3$ one-to-two experiments)
– particular message sizes (0 and m < S)
– fast roundtrips (one-to-one, one-to-two)
Calculations
– comparisons
– simple expressions to get values
Averaging
– within solution (n fixed processing delays, n variable processing delays, and transmission rates)
– within measurements (for accurate measurement of the communication execution time)

Optimization of application
A real-time satellite imaging application
– A sequence of raw data images divided into partitions for parallel processing by a cluster
[Figure: message size versus number of nodes, with message sizes M1, Mc, M2 and node counts n1, n2 marked]

Redesigning application
Calculate the number of sub-partitions m of a partition of medium size M so that:
$\frac{M}{m} < M_1, \quad \frac{M}{m-1} \ge M_1$
Replace MPI_Gather with a sequence of MPI_Gather calls for smaller messages

Communication models (ctd)
Other heterogeneous platforms
– Local network of computers
– Global networks
  » Dedicated communication channels
  » Internet-connected
Currently used models
– Simplified versions of cluster models
  » p2p communication time is modeled by β×m

Communication models (ctd)
Wide-area links
– Dedicated
  » Serial links between remote computers
  » Numerous algorithms for optimization of communication operations
– Internet
  » Allow for parallel simultaneous communications between two remote computers without degradation of bandwidth
  » New area of R&D