Parallel and High-Performance Computing
5. Dynamic Load Balancing
Hans-Joachim Bungartz
5.1. Basics
Notions

• central topic in distributed environments (networks, clusters, loosely coupled parallel computers): distribution of the computational load among computers or processors
• difficulty: the load situation is hard to predict or changes constantly
– example: adaptive mesh refinement for partial differential equations: new grid points are created and thus change the current load situation
– example: I/O interaction: the time needed is hard to estimate
– example: searching: can succeed earlier or later
• unequal load reduces parallel efficiency
• therefore: load distribution, load balancing, or scheduling
• one distinguishes
– global scheduling (which processes run where?) and local scheduling (when does a processor deal with which processes?)
– static load balancing (a priori) and dynamic load balancing (at runtime)
• in the following: dynamic global scheduling
• important: the measures taken must cause no significant overhead (otherwise, bureaucracy wins)
Important Aspects for the Design of a Strategy
• What is the objective of the strategy?
– optimization of the system load (the computing centre’s or system-oriented point of view) or optimization of the runtime of applications (the users’ application-oriented point of view; different for exclusive (dedicated) and shared use)?
– only placement of new processes, or also migration of running processes?
• On which level of integration is load distribution realized?
– tasks to be done: record the load, select a strategy, apply the strategy, take the necessary measures
– Who does the job – the application program (a parallel database, for example), the runtime system (that of PVM, e.g.), or the operating system?
• What is the structure of the parallel application?
– Are there any restrictions concerning the mapping of processes to processors (frequently the case in numerical simulations, where there are location-based relations such as geometric neighbourhood)?
• Which units are to be placed or distributed?
– processes (coarse-grain) or threads (fine-grain), parts of programs, objects, or data (simulations)?
Classification of Strategies (1)
• with respect to the system model:
– origin of the underlying idea (physics, graph theory, economics)
– original target topology (networks, bus topologies, ...)
– underlying data topology (grids, trees, sets, ...)
• with respect to distribution mechanisms:
– handing over of load only between neighbours, or also distribution over large distances?
– just placement of new processes, or real migration?
• with respect to the information flow:
– To whom does a processor communicate its load situation?
– From where does a processor get load-related information?
Classification of Strategies (2)
• with respect to coordination:
– central or decentralized decision on which actions to start?
– How are decisions taken (autonomously, cooperatively, or competitively)?
– Who participates in the arrangements (neighbours, everyone)?
• with respect to the underlying algorithms:
– static or dynamic decision process?
– Who takes the initiative (the idle node, the overloaded node, some master node, the clock)?
– a fixed, adaptively adjustable, or even smart strategy?
– Do cost arguments play an important part?
– Are there any safety mechanisms against excesses (load distribution dominating the runtime and ruining efficiency)?
Load Models
• For recording or estimating the load, we need reliable load models.
• Load models are based upon load indices: quantitative measures for the load of providers of computing time (processors):
– simple and composite load indices (one or more characteristic quantities)
– can refer to different functional units (CPU, bus, memory)
– snapshot quantities (describing the situation at one point in time) or integrated or averaged quantities
– weightings may be fixed a priori or dynamically adjustable
– frequent use of stochastic quantities to take external influences into account
• properties of a good index:
– precisely reflects the target quantity at present
– allows for accurate predictions concerning the future
– smoothing behaviour (in order to compensate for peaks)
– based upon some simple formula, easy to compute
• example: the UNIX load average (visualized by xload): the average number of processes in the CPU queue; a one-line query is shown below
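On Unix systems this index can be queried directly; a minimal Python illustration (os.getloadavg is part of the standard library on Unix):

```python
import os

# 1-, 5-, and 15-minute load averages: the (smoothed) average number of
# processes in the run queue -- the quantity that xload visualizes
one_min, five_min, fifteen_min = os.getloadavg()
print(f"load average: {one_min:.2f} {five_min:.2f} {fifteen_min:.2f}")
```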
Principles of Migration
• Which units migrate?
– parts of programs, processes, threads, or data?
• How big are the migrating units?
• Are migrations executed in a delayed way (only at certain points in time, for example) or immediately after the necessity has been perceived?
• Are all units handed over together, or is there some lazy copying (passing further units on demand only)?
• Where can units migrate to?
– to neighbouring nodes only, within a certain range, or to arbitrary nodes?
• in heterogeneous networks: Can migration take place only between nodes of the same type, or are there no restrictions?
– important in case of a limited functionality of certain nodes
5.2. Selected Examples of Load Distribution Strategies
Diffusion Model

• analogy to diffusion processes in physics (salt in water, colour in water, an ice cube in a drink)
• balancing of some (initially possibly heterogeneous) concentration
• grid-oriented, balancing only between a node i and its neighbours N(i)
• each pair of neighbours i, j records its local load difference and hands over a certain percentage of this difference:
l_i^{(t+1)} := l_i^{(t)} + \sum_{j \in N(i)} \alpha_{ij} \left( l_j^{(t)} - l_i^{(t)} \right), \qquad 1 \le i \le n, \quad 0 < \alpha_{ij} < 1
• the balancing can be
– Jacobi-like: all differences are computed at the beginning of balancing step t, and all local migrations are realized according to these differences
– Gauß-Seidel-like: after each migration (also within the t-th balancing step), the local differences are computed again
• iterative method!
• For orthogonal d-dimensional grid structures, it can be shown that the choice \alpha_{ij} = \frac{1}{2d} for all i, j is optimal concerning the speed of load balancing (see the sketch below).
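To make the update rule concrete, here is a minimal Python sketch of a Jacobi-like diffusion step on a 2D processor grid, with the uniform choice α_ij = 1/(2d) = 1/4; the function name, the grid shape, and the initial load values are illustrative, not taken from the slides:

```python
import numpy as np

def jacobi_diffusion_step(load, alpha=0.25):
    """One Jacobi-like diffusion step: all differences are taken from the
    same snapshot, and each node exchanges alpha * (l_j - l_i) with every
    grid neighbour j."""
    new_load = load.copy()
    rows, cols = load.shape
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    new_load[i, j] += alpha * (load[ni, nj] - load[i, j])
    return new_load

# a hypothetical initial load on a 4 x 4 grid (average load 10)
load = np.array([[32.0, 4, 8, 12],
                 [2, 10, 6, 14],
                 [18, 2, 12, 8],
                 [4, 16, 6, 6]])
for step in range(2):
    load = jacobi_diffusion_step(load)
    print(step, np.abs(load - load.mean()).max())  # max deviation shrinks
```

Since the exchanges are symmetric (α_ij = α_ji), the total load is conserved; only its distribution is smoothed.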
Example for the Diffusion Model
• two-dimensional 4 × 4 grid (thus 16 processors)
• Jacobi-like diffusion
• hand over 25% of the load difference (round down, if necessary)
• consider the first two steps of the iteration
• average load is 10
• maximum deviation from the average is 22 at the beginning, then 7, then 6
Bidding
• analogy: mechanisms of price fixing in markets
• supply and demand regulate the load (a toy round is sketched below):
– The processor without enough load looks for additional work (it sends the information that it has free capacity).
– The overloaded processor looks for support (it sends the information that it wants to get rid of some work).
– The arriving answers are compared.
– If possible, a balancing is done; otherwise the processors communicate again (extended range of recipients, other load or capacity packets, and so on).
• The analysis of the bidding model is quite complex.
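A toy, centralized round of bidding can illustrate the mechanism; the thresholds, transfer amounts, and names are illustrative assumptions, not part of the slides:

```python
def bidding_round(loads, low=5, high=15):
    """One toy bidding round: nodes below `low` advertise free capacity,
    nodes above `high` compare the arriving offers and hand roughly half
    of the load difference to the best bidder."""
    bidders = [n for n, l in enumerate(loads) if l < low]
    for node in range(len(loads)):
        if loads[node] > high and bidders:
            best = min(bidders, key=lambda n: loads[n])  # compare answers
            transfer = (loads[node] - loads[best]) // 2
            loads[node] -= transfer
            loads[best] += transfer
    return loads

print(bidding_round([20, 2, 11, 3, 25]))  # -> [11, 11, 11, 14, 14]
```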
Balanced Allocation
• objective: bidding variant which is easier to analyze
• principle (see the sketch below):
– If there is some local overload, select r nodes at random.
– Hand over load to whichever of the r nodes has the smallest load.
• Both quality and costs increase with increasing r.
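A minimal sketch of this rule, assuming a shared view of the load vector (the function name, r, and the transferred amount are illustrative):

```python
import random

def balanced_allocation_step(loads, node, r=2, amount=1):
    """On local overload at `node`: probe r randomly chosen other nodes
    and hand `amount` units of load to the least loaded of them."""
    candidates = random.sample([n for n in range(len(loads)) if n != node], r)
    target = min(candidates, key=lambda n: loads[n])
    loads[node] -= amount
    loads[target] += amount
    return target
```

For r = 2 this is the classical “power of two choices” setting; larger r improves the balance at the price of more probing.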
Broker System
• origin of the idea: brokers at the stock exchange
• designed and especially well-suited for hierarchical topologies (trees)
• principle:
– each processor has a broker (realized as a cooperating agent) with local (subtree) knowledge
– via an application server, tasks arrive at the local broker and are, depending on the available budget, processed locally in the subtree or handed over to the father node (recursion possible)
– on some level (at the latest in the root), a price-based decision and allocation are made
– a price has to be paid for using resources and for the brokering itself (it is cheaper to stay in the subtree than to go to a remote broker)
• very flexible scheme for hierarchical or heterogeneous net topologies
Random Matching
• origin of the idea: graph theory
• principle:
– construct (at random) a matching in the topology graph of the net
– topology graph: nodes are processors, edges are direct connections between nodes
– matching: a subset of the edges such that each node occurs at most once
– perfect load balancing along all edges of the matching
• iterative method, several steps are necessary
• the matching must be found in parallel (a centralized stand-in is sketched below):
– start with an empty set of edges in each node
– local selection (at random) of one incident edge in each node
– coordination with neighbouring nodes, resolution of conflicts
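The following centralized Python sketch stands in for the parallel coordination step described above; the edge list, node count, and names are illustrative:

```python
import random

def random_matching(edges, num_nodes):
    """Build a random matching greedily: shuffle the edges and accept an
    edge only if neither endpoint is matched yet."""
    matched = [False] * num_nodes
    matching = []
    shuffled = list(edges)
    random.shuffle(shuffled)
    for u, v in shuffled:
        if not matched[u] and not matched[v]:
            matched[u] = matched[v] = True
            matching.append((u, v))
    return matching

def balance_along_matching(loads, matching):
    """Perfect load balancing along every edge of the matching."""
    for u, v in matching:
        loads[u] = loads[v] = (loads[u] + loads[v]) / 2
    return loads
```

Repeating these two steps yields the iterative method; in a real implementation the matching is negotiated between neighbouring nodes rather than built centrally.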
Precalculation of the Load
• all strategies discussed so far are based upon local information and local actions
• often expensive (since, from a global point of view, balancing steps that are not really helpful may occur)
• sometimes better:
– global determination of the load, at the beginning or at certain points in time, and global determination of a suitable load distribution
– migrations with less communication
• developed and used especially for hierarchical topologies (load recording and load balancing from son to father and vice versa)
5.3. Load Balancing with Space Filling Curves
Space Filling Curves
• an unconventional load balancing strategy, presented in a bit more detail
• origin of the idea: analysis and topology (“topological monsters”)
• a nice example of a construct from pure mathematics that gained practical relevance only decades later
• definition of a space filling curve (SFC), for simplicity only in 2D:
– curve: the image of a continuous mapping of the unit interval [0, 1] onto the unit square [0, 1]^2
– space filling: the curve covers the whole unit square (the mapping is surjective) and hence covers an area greater than zero(!)

f : [0, 1] =: I \to Q := [0, 1]^2, \quad f \text{ surjective and continuous}

• prominent representatives:
– Hilbert’s curve: 1891, the most famous space filling curve
– Peano’s curve: 1890, the oldest space filling curve
– Lebesgue’s curve: quadtree principle, the most important SFC for computer science
Hilbert’s SFC
• the construction follows the geometric conception: if I can be mapped onto Q in the space filling sense, then each of the four congruent subintervals of I can be mapped onto one of the four quadrants of Q in the space filling sense, too
• recursive application of this partitioning and allocation process, preserving
– neighbourhood relations: neighbouring subintervals of I are mapped onto neighbouring subsquares of Q
– subset relations (inclusion): I_1 \subseteq I_2 implies f(I_1) \subseteq f(I_2)
• limit case: Hilbert’s curve
– from the correspondence of nestings of intervals in I and nestings of squares in Q, we get pairs of points in I and corresponding image points in Q
– of course, the iterative steps of this generation process are of practical relevance, not the limit case (the SFC) itself:
* start with a generator or leitmotiv (defines the order in which the subsquares are “visited”)
* apply the generator in each subsquare (with appropriate similarity transformations)
* connect the open ends
Generation Processes with Hilbert’s Generator

• classical version of Hilbert (figure)
• variant of Moore (figure)
• modulo symmetry, these are the only two possibilities!
Some Remarks on the Injectivity
• all iterations, i.e. the longer and longer curves or, respectively, their generating mappings from I, are of course injective
• Hilbert’s curve itself, however, is not injective: there are image points with more than one corresponding original point in I (consider the defining correlated nesting processes in I and Q)
• this is necessary, since:
– Cantor 1878: between two arbitrary, but finite-dimensional smooth manifolds there exists a bijective mapping (injective and surjective)
– Netto 1879: if the dimensionalities of two such manifolds differ, the corresponding bijection can never be continuous (and hence defines no SFC)
Peano’s SFC

• the ancestor of all SFCs
• subdivision of I and Q into nine congruent subdomains
• a leitmotiv, again, defines the order of visits
• now there are 273 different possibilities (modulo symmetry) to recursively apply the generator while preserving neighbourhood and inclusion

(figure: serpentine type (left and centre) and meander type (right))
Lebesgue’s SFC
• many applications in computer science
• compared with the SFCs studied so far, there are several differences:
– both Hilbert’s and Peano’s SFCs are nowhere differentiable, whereas Lebesgue’s SFC is differentiable almost everywhere
– both Hilbert’s and Peano’s SFCs are self-similar (applying the mapping to an arbitrary subinterval of I yields an SFC of the same type); Lebesgue’s SFC is not self-similar
• continuity and differentiability can be shown
• missing self-similarity: easy to prove and understand, see exercises!
Definition of Lebesgue’s SFC – the Cantor Set
• Cantor set: remove the central third from I, and continue removing the inner third from each remaining subinterval
• binary and ternary numbers:

0_3.x_1 x_2 x_3 \ldots = \sum_{i=1}^{\infty} x_i \cdot 3^{-i}, \quad x_i \in \{0, 1, 2\}
0_2.x_1 x_2 x_3 \ldots = \sum_{i=1}^{\infty} x_i \cdot 2^{-i}, \quad x_i \in \{0, 1\}

• the Cantor set has Lebesgue measure zero and can be represented formally with the help of ternary numbers:

C := \{ 0_3.(2t_1)(2t_2)(2t_3)\ldots \; ; \; t_i \in \{0, 1\} \}

• definition of the mapping on the Cantor set:

f_l(0_3.(2t_1)(2t_2)(2t_3)\ldots) := \begin{pmatrix} 0_2.t_1 t_3 t_5 \ldots \\ 0_2.t_2 t_4 t_6 \ldots \end{pmatrix}

• definition of the mapping between the points of the Cantor set: linear interpolation (the digit splitting is sketched below)
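A small sketch of this digit splitting for a point of the Cantor set given by finitely many binary digits t_i (names are illustrative):

```python
def lebesgue_map(t):
    """Map the Cantor-set point 0_3.(2 t1)(2 t2)... , given by its binary
    digits t = [t1, t2, ...], to (x, y): odd-position digits form x,
    even-position digits form y."""
    def binary_value(digits):
        return sum(d * 2.0 ** -(i + 1) for i, d in enumerate(digits))
    return binary_value(t[0::2]), binary_value(t[1::2])

print(lebesgue_map([1, 0, 1, 1]))  # 0_2.11 and 0_2.01 -> (0.75, 0.25)
```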
Generator of Lebesgue’s SFC

• Lebesgue’s SFC has a generator, too (figure)
• this is just the lexicographic or Morton ordering, well known from quadtrees and octrees
SFC – Applications in Computer Science

• sequentialization of multi- (i.e. especially high-) dimensional data
• example: search indices in databases
– high-dimensional data (each entry is one dimension), but nevertheless essentially one-dimensional search indices (B-trees: 1D primary index) and serial concatenation
– the drawback of this 1D approach is obvious: to find all male Germans with shoe size 58, about 50% of the German population still has to be processed further
– complexity in the case of a really high-dimensional index: \prod_{i=1}^{d} p_i with hit ratio p_i in dimension i
– ideal situation: locality-preserving (i.e. essentially continuous) sequentialization (the multi-dimensionality is inherent, but a 1D index can be used)
• locality:
– data are strung sequentially like pearls
– neighbouring points in the unit interval have neighbouring images in the unit square
– the other way round (more important): exceptions due to the missing injectivity (there may be several separated original regions for some subdomain of Q; in general, however, the originals are restricted to a few clusters)
• widespread in the fields of databases (cf. UB-trees) and data mining
SFC – Applications in Numerical Simulation
• multi-particle or N-body problems:
– N bodies interact via forces (gravitation, for example)
– examples: astrophysics, molecular dynamics
– N is typically very large (10^7, 10^8, and more)
– the models lead to a system of N ordinary differential equations with a potential as their right-hand side (summarizing the influence of the N − 1 other bodies)
– global couplings, but the influence decreases with increasing distance (this effect allows for simplifications)
– bodies may be spread over space in an irregular way, and their positions may change
• adaptive grids for partial differential equations:
– N grid points are spread over the domain of discretization, typically with no regular structure
– generally, only loose couplings and stationary positions
– adaptivity: new points are created during the computations, others may be removed
• in both cases: dynamic load balancing is important and nontrivial
SFC for Load Distribution
• idea:
1. assign to the points in space points on some iteration of an SFC
2. use the linear order of the respective original points on I
3. simple partitioning (assign points to processors) based on this sequential order
• two techniques for the first step (both sketched in code below):
– change continuous coordinates (x, y) in Q into binary or quaternary codes of length k:

\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} 0_2.x_1 x_2 x_3 \ldots x_k \\ 0_2.y_1 y_2 y_3 \ldots y_k \end{pmatrix} \mapsto 0_4.w_1 w_2 w_3 \ldots w_k

this provides quadtree leaf addresses of depth k, or an ordering on the k-th iteration of Lebesgue’s SFC
– again, start from the binary representation of length k and determine recursively the position on the k-th iteration of Hilbert’s SFC
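Both key computations can be sketched for integer coordinates (x, y) on a 2^k × 2^k grid; the Hilbert index uses the standard iterative quadrant-rotation scheme, and all names are illustrative:

```python
def morton_key(x, y, k):
    """Lebesgue/Morton ordering: interleave the k bits of x and y, so that
    quaternary digit w_i combines bit i of y (high) and bit i of x (low)."""
    key = 0
    for i in range(k):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key

def hilbert_key(x, y, k):
    """Position of (x, y) on the k-th iteration of Hilbert's SFC."""
    d = 0
    s = 1 << (k - 1)
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:              # rotate the subsquare if necessary
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s //= 2
    return d

# the four cells of the first iteration, visited in Hilbert order:
print(sorted([(0, 0), (0, 1), (1, 1), (1, 0)], key=lambda p: hilbert_key(*p, 1)))
```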
SFC for Load Distribution (cont’d)
• in both cases: keys already provide positions on I
• now to be done: (parallel) sorting of the keys (the main computational task of the algorithm)
– may be costly at the beginning
– however, later the inputs are already almost sorted (only small motions, only a few new grid points created per iterative step of the PDE solver)
• finally: update the partitioning (weighted or unweighted) and perform the migration step (a sketch follows below)
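A sketch of this final step: sort the items by their SFC keys and cut the sorted sequence into p contiguous, roughly equally weighted pieces (the names and the greedy cut rule are illustrative):

```python
def partition_by_keys(keys, weights, p):
    """Assign contiguous key ranges to p processors so that each
    receives roughly total_weight / p."""
    order = sorted(range(len(keys)), key=lambda i: keys[i])
    target = sum(weights) / p
    parts = [[] for _ in range(p)]
    proc, acc = 0, 0.0
    for i in order:
        # advance to the next processor once its weight share is filled
        if acc >= target * (proc + 1) and proc < p - 1:
            proc += 1
        parts[proc].append(i)
        acc += weights[i]
    return parts  # parts[q]: indices of the items owned by processor q
```

Items that change their processor between two consecutive partitionings are exactly the ones that have to migrate.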
Quality Considerations
• locality:
– continuity guarantees that originals that are close together are mapped to close image points (for self-similar SFCs we even have a form of Hölder continuity)
– more important would be a “continuity of the inverse mapping”; but of course this is impossible due to the missing injectivity
– numerous theoretical considerations exist
– at least: the originals are clustered again (a few clusters only)
– Hilbert’s SFC has the best properties
• load distribution:
– excellent parallelization properties, almost perfect balancing
– communication costs are comparatively small
– good efficiency of load distribution already for small problem sizes
Costs
• communication:
– the partitioning yields more complicated (and hence longer) subdomain boundaries than successive coordinate bisection in kd-trees (halve the load by a cut in the y-direction, then halve the load in both parts by a cut in the x-direction, and so on)
– communication takes place along the subdomain boundaries
– overall, slightly higher communication costs than with bisection
• load distribution:
– small costs even for large numbers p of processors (just one sort per step, in contrast to the log p sorting operations with coordinate bisection)
Relations to Fractals

• the notion of self-similarity shows close relations to fractals, whose definitions are also based upon recursively applied similarity transformations
• two examples: Koch’s snowflake (left) and Sierpinski’s triangle (right) (figure)
• both have a non-integer fractal dimension:

d_K \approx 1.2619, \quad d_S \approx 1.585

• SFCs, in contrast, are areas or volumes and hence have an integer fractal dimension!
The Fractal Dimension
• start from the generator and the similarity transformations applied to it
• parameters:
– n: the number of smaller copies of the generator in the next iteration step
– r: 0 < r < 1, the scaling factor by which the generator is reduced in each step
• definition:

d := \frac{\log(n)}{\log(r^{-1})}

• for “real” curves and surfaces: the same as the conventional (topological) dimension
• examples (checked numerically below):
– Koch’s snowflake: n = 4, r = 1/3
– Sierpinski’s triangle: n = 3, r = 1/2
– Hilbert’s curve: n = 4, r = 1/2
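Plugging these parameters into the formula confirms the values quoted above:

```python
from math import log

def fractal_dimension(n, r):
    """Similarity dimension d = log(n) / log(1/r)."""
    return log(n) / log(1 / r)

print(fractal_dimension(4, 1 / 3))  # Koch's snowflake:      ~1.2619
print(fractal_dimension(3, 1 / 2))  # Sierpinski's triangle: ~1.585
print(fractal_dimension(4, 1 / 2))  # Hilbert's curve:        2.0
```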