Parallel and High-Performance Computing

5. Dynamic Load Balancing

Hans-Joachim Bungartz

Outline: 5.1 Basics – 5.2 Selected Examples of Load Distribution Strategies – 5.3 Load Balancing with Space Filling Curves


5.1. Basics

Notions

• central topic in distributed environments (nets, clusters, loosely coupled parallel computers): distribution of the computational load among computers or processors

• difficulty: load situation is hard to predict or changes permanently

– example adaptive mesh refinement with partial differential equations: new grid points are created and thus change the current load situation

– example I/O interaction: time needed is hard to estimate

– example searching: can be successful earlier or later

• unequal load reduces parallel efficiency

• therefore: load distribution, load balancing, or scheduling

• one distinguishes

– global scheduling(where do which processes run?) and local scheduling(whena processor does deal with which processes?)

– static load balancing (a priori) and dynamic load balancing (during runtime)

• in the following: dynamic global scheduling

• important: no significant overhead of the measures taken (otherwise: bureaucracy wins)


Important Aspects for the Design of a Strategy

• Which is the objective of the strategy?

– optimization of the system load (computing centre's or system-oriented point of view) or optimization of the runtime of applications (users' application-oriented point of view; different for exclusive (dedicated) or shared use)?

– only placement of new processes, or also migration of running processes?

• On which level of integration is load distribution realized?

– tasks to be done: record load, select a strategy, apply a strategy, take the necessary measures

– Who does the job – the application program (a parallel data base, for example), the runtime system (the runtime system of PVM, e.g.), or the operating system?

• What is the structure of the parallel application?

– Are there any restrictions concerning the mapping of processes to processors (frequently true in numerical simulations – there are location-based relations such as geometric neighbourhood)?

• Which units shall be placed or distributed?

– processes (coarse-grain) or threads (fine-grain), parts of programs, objects, or data (simulations)?


Classification of Strategies (1)

• with respect to the system model:

– origin of the underlying idea (physics, graph theory, economics)

– original target topology (nets, bus topologies, ... )

– underlying data topology (grids, trees, sets, ...)

• with respect to distribution mechanisms:

– is load handed over only between neighbours, or also across large distances?

– just placement of new processes or real migration?

• with respect to the information flow :

– To whom does a processor communicate its load situation?

– From where does a processor get load-related information?


Classification of Strategies (2)

• with respect to coordination:

– central or decentralized decision on the actions to be started?

– How are decisions taken (autonomous, cooperative, or competitive)?

– Who are the participants of arrangements (neighbours, all)?

• with respect to the underlying algorithms:

– static or dynamic process of decision?

– Who takes the initiative (the idle node, the overloaded node, some master node, the clock)?

– fixed, adaptively adjustable, or even smart strategy?

– Do cost arguments play an important part?

– Are there any safeguards against excesses (load distribution dominating the runtime and ruining efficiency)?


Load Models

• For recording or estimating the load, we need reliable load models.

• Load models are based upon load indices (quantitative measures for the load of providers of computing time (processors)):

– simple and composite load indices (one or more characteristic quantities)

– can refer to different functional units (CPU, bus, memory)

– snapshot quantities (describe the situation at one point of time) or integrated or averaged quantities

– weightings may be fixed a priori or dynamically adjustable

– frequent use of stochastic quantities to take into account external influences

• properties of a good index:

– precisely reflects the target quantity at present

– allows for accurate predictions concerning the future

– smoothing behaviour (in order to compensate peaks)

– based upon some simple formula, easy to compute

• example: the UNIX load average (xload): provides the average number of processes in the CPU queue
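The smoothing behaviour asked of a good index can be sketched as an exponentially weighted average of queue-length samples; the weight `beta` and the function name are illustrative assumptions, not from the slides:

```python
def smoothed_load_index(samples, beta=0.8):
    """Exponentially averaged load index over queue-length samples.

    Each new sample only contributes with weight (1 - beta), so short
    load peaks are damped while sustained load still shows through.
    """
    index = 0.0
    for s in samples:
        index = beta * index + (1.0 - beta) * s
    return index
```

A single peak of 10 thus moves the index only to about 2, whereas ten consecutive samples of 5 push it close to 5.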


Principles of Migration

• Which units migrate?

– parts of programs, processes, threads, or data?

• How big are the migrating units?

• Are migrations executed in a delayed way (only at certain points of time, for example) or immediately after the necessity has been perceived?

• Are all units handed over together, or is there some lazy copying (passing of further units on demand only)?

• Where can units migrate to?

– to neighbouring nodes only, within a certain range, or to arbitrary nodes?

• in heterogeneous networks: Can migration only take place between nodes of the same type, or are there no restrictions?

– important in case of a limited functionality of certain nodes


5.2. Selected Examples of Load Distribution Strategies

Diffusion Model

• analogy to diffusion processes in physics (salt in water, colour in water, an ice cube in a drink)

• balancing of some (initially possibly heterogeneous) concentration

• grid-oriented, balancing only between a node i and its neighbours N(i)

• each pair of neighbours i, j records its local load difference and hands over a certain percentage of this difference

$$l_i^{(t+1)} := l_i^{(t)} + \sum_{j \in N(i)} \alpha_{ij} \left( l_j^{(t)} - l_i^{(t)} \right), \quad 1 \le i \le n, \quad 0 < \alpha_{ij} < 1$$

• the balancing can be

– Jacobi-like: differences are computed at the beginning of each balancing step, and all local migrations are realized according to these differences

– Gauß-Seidel-like: after each migration (also within the current balancing step), the local differences are computed again

• iterative method!

• For orthogonal d-dimensional grid structures, it can be shown that the choice $\alpha_{ij} = \frac{1}{2d}$ for all i, j is optimal concerning the speed of load balancing.


Example for the Diffusion Model

• two-dimensional 4 × 4 grid (thus 16 processors)

• Jacobi-like diffusion

• hand over 25% of the load difference (round down, if necessary)

• consider the first two steps of the iteration

• average load is 10

• maximum deviation from the average is 22 at the beginning, then 7, then 6
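The Jacobi-like diffusion step can be sketched in a few lines; the grid values below are made-up illustrations (the slide's own load figure is not reproduced here), with α = 1/(2d) = 1/4 for the 2D case and the hand-over rounded down as above:

```python
def jacobi_diffusion_step(load, alpha=0.25):
    """One Jacobi-like diffusion step on a 2D grid of integer loads.

    All pairwise differences are taken from the loads at the start of
    the step; each node then receives floor(alpha * (l_j - l_i)) from
    every more-loaded neighbour j (and sends symmetrically), so the
    total load is conserved.
    """
    rows, cols = len(load), len(load[0])
    new = [row[:] for row in load]
    for i in range(rows):
        for j in range(cols):
            for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ni, nj = i + di, j + dj
                if 0 <= ni < rows and 0 <= nj < cols:
                    diff = load[ni][nj] - load[i][j]
                    if diff > 0:
                        new[i][j] += int(alpha * diff)   # receive from heavier neighbour
                    else:
                        new[i][j] -= int(alpha * (-diff))  # send to lighter neighbour
    return new
```

Iterating this step drives all nodes towards the average load, exactly as in the slide's example.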


Bidding

• analogy: mechanisms of price fixing in markets

• supply and demand regulate the load:

– The processor without enough load looks for additional work (sends the information that it has free capacity).

– The overloaded processor looks for support (sends the information that it wants to get rid of some work).

– The arriving answers are compared.

– If it is possible, a balancing is done; otherwise the processors communicate again (extended range of recipients, other load or capacity packets, and so on).

• The analysis of the bidding model is quite complex.


Balanced Allocation

• objective: bidding variant which is easier to analyze

• principle:

– If there is some local overload, select (at random) r nodes.

– Hand over load to that node among the r with the smallest load.

• Both quality and costs increase with increasing r.
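The two-line principle above admits an equally short sketch (function name and the default r = 2 are illustrative assumptions):

```python
import random

def balanced_allocation_target(loads, overloaded, r=2, rng=random):
    """Balanced allocation: sample r candidate nodes at random and
    return the one with the smallest current load, as the target for
    offloading work from the overloaded node."""
    others = [n for n in range(len(loads)) if n != overloaded]
    candidates = rng.sample(others, r)
    return min(candidates, key=lambda n: loads[n])
```

Larger r inspects more candidates per decision, which is exactly the quality/cost trade-off the slide mentions.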


Broker System

• origin of the idea: brokers at the stock exchange

• designed and especially well-suited for hierarchical topologies (trees)

• principle:

– each processor has a broker (realized as a cooperating agent) with local (subtree) knowledge

– via an application server, tasks arrive at the local broker and are – depending on the available budget – processed locally in the subtree or handed over to the father node (recursion possible)

– on some level (at the latest in the root), some price-based decision and allocation are done

– a price has to be paid for using resources and for the broking itself (it is cheaper to stay in the subtree than to go to a remote broker)

• very flexible scheme for hierarchical or heterogeneous net topologies
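A toy version of the recursion can look as follows; the class name, the capacity budget, and the flat broking fee are illustrative assumptions, not the slides' cost model:

```python
class Broker:
    """Toy broker in a tree topology.

    A task is processed in the local subtree if its cost fits the free
    capacity there; otherwise it is handed to the father broker
    (recursively), with a broking fee added so that remote placement is
    more expensive than staying in the subtree.
    """

    def __init__(self, capacity, father=None, fee=1):
        self.capacity = capacity
        self.father = father
        self.fee = fee

    def submit(self, cost):
        if cost <= self.capacity:
            self.capacity -= cost
            return self                      # placed in this subtree
        if self.father is not None:
            return self.father.submit(cost + self.fee)  # recurse upwards, pay fee
        return None                          # not placeable anywhere
```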


Random Matching

• origin of the idea: graph theory

• principle:

– construct (by chance) a matching in the topology graph of the net

– topology graph: nodes are processors, edges are direct connections between nodes

– matching: subset of the edges such that each node occurs at most once

– perfect load balancing along all edges of the matching

• iterative method, several steps are necessary

• matching must be found in parallel

– start with an empty set of edges in each node

– local selection (by chance) of one incident edge in each node

– coordination with neighbouring nodes, solution of conflicts
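One step of the scheme can be sketched sequentially (the real algorithm finds the matching in parallel; the greedy construction and names below are simplifying assumptions):

```python
import random

def random_matching_step(adj, loads, rng=random):
    """One balancing step: greedily build a random matching in the
    topology graph (each node matched at most once), then balance
    perfectly along every matched edge."""
    nodes = list(adj)
    rng.shuffle(nodes)
    matched = set()
    for u in nodes:
        if u in matched:
            continue
        free = [v for v in adj[u] if v not in matched]
        if free:
            v = rng.choice(free)
            matched.update((u, v))
            total = loads[u] + loads[v]          # perfect balancing on edge (u, v)
            loads[u], loads[v] = total // 2, total - total // 2
    return loads
```

Repeating the step with fresh random matchings gives the iterative method described above.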


Precalculation of the Load

• all that has been said before is based upon local information and local actions

• often expensive (since, from a global point of view, balancing steps that are not really helpful may occur)

• sometimes better:

– global determination of the load, at the beginning or at certain points of time, and global determination of a suitable load distribution

– migrations with less communication

• developed and used especially for hierarchical topologies (load recording and load balancing from the son to the father and vice versa)


5.3. Load Balancing with Space Filling Curves

Space Filling Curves

• an unconventional load balancing strategy, a bit more in detail

• origin of the idea: analysis and topology (“topological monsters”)

• nice example of a construct from pure mathematics that gained practical relevance only decades later

• definition of a space filling curve (SFC), for reasons of simplicity only in 2 D:

– curve: image of a continuous mapping of the unit interval [0, 1] onto the unit square [0, 1]²

– space filling: the curve covers the whole unit square (the mapping is surjective) and hence covers an area greater than zero(!)

$f : [0,1] =: I \to Q := [0,1]^2$, $f$ surjective and continuous

• prominent representatives:

– Hilbert's curve: 1891, the most famous space filling curve

– Peano’s curve: 1890, oldest space filling curve

– Lebesgue's curve: quadtree principle, the most important SFC for computer science


Hilbert’s SFC

• the construction follows the geometric conception: if I can be mapped onto Q in the space filling sense, then each of the four congruent subintervals of I can be mapped to one of the four quadrants of Q in the space filling sense, too

• recursive application of this partitioning and allocation process preserving

– neighbourhood relations: neighbouring subintervals in I are mapped onto neighbouring subsquares of Q

– subset relations (inclusion): from I1 ⊆ I2 follows f(I1) ⊆ f(I2)

• limit case: Hilbert’s curve

– from the correspondence of nestings of intervals in I and nestings of squares in Q, we get pairs of points in I and of corresponding image points in Q

– of course, the iterative steps in this generation process are of practical relevance, not the limit case (the SFC) itself:

* start with a generator or leitmotiv (defines the order in which the subsquares are “visited”)

* apply the generator in each subsquare (with appropriate similarity transformations)

* connect the open ends


Generation Processes with Hilbert's Generator

• classical version of Hilbert: (figure)

• variant of Moore: (figure)

• modulo symmetry, these are the only two possibilities!


Some Remarks on the Injectivity

• all iterations, i.e. the longer and longer curves or their generating mappings from I, respectively, are injective, of course

• Hilbert's curve itself, however, is not injective: there are image points with more than one corresponding original point in I (look at the defining correlated nesting processes in I and Q)

• this is necessary, since:

– Cantor 1878: between two arbitrary, but finite-dimensional smooth manifolds, there exists a bijective mapping (injective and surjective)

– Netto 1879: if the dimensionalities of two such manifolds are different, the corresponding bijection can never be continuous (and, hence, defines no SFC)


Peano's SFC

• ancestor of all SFCs

• subdivision of I and Q into nine congruent subdomains

• definition of a leitmotiv, again, defines the order of visit

• now, there are 273 different (modulo symmetry) possibilities to recursively apply the generator preserving neighbourhood and inclusion

(figure: serpentine type, left and centre; meander type, right)


Lebesgue’s SFC

• many applications in computer science

• compared with the SFCs studied so far, there are several differences:

– both Hilbert's and Peano's SFC are nowhere differentiable, whereas Lebesgue's SFC is differentiable almost everywhere

– both Hilbert's and Peano's SFC are self-similar (if we apply the mapping to an arbitrary subinterval of I, the result is an SFC of the same type), Lebesgue's SFC is not self-similar

• continuity and differentiability can be shown

• missing self-similarity: easy to prove and understand, see exercises!


Definition of Lebesgue’s SFC – the Cantor Set

• Cantor set: remove from I the central third, and go on removing the inner third from the remaining subintervals

• binary and ternary numbers:

– $0_3.x_1x_2x_3\ldots = \sum_{i=1}^{\infty} x_i \cdot 3^{-i}, \quad x_i \in \{0,1,2\}$

– $0_2.x_1x_2x_3\ldots = \sum_{i=1}^{\infty} x_i \cdot 2^{-i}, \quad x_i \in \{0,1\}$

• the Cantor set is of Lebesgue measure zero and can be formally represented with the help of ternary numbers:

$C := \{\, 0_3.(2t_1)(2t_2)(2t_3)\ldots \;;\; t_i \in \{0,1\} \,\}$

• definition of the mapping on the Cantor set:

$f_l\bigl(0_3.(2t_1)(2t_2)(2t_3)\ldots\bigr) := \begin{pmatrix} 0_2.t_1t_3t_5\ldots \\ 0_2.t_2t_4t_6\ldots \end{pmatrix}$

• definition of the mapping between the points of the Cantor set (i.e., on the removed intervals): linear interpolation
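For a finite truncation, the mapping on the Cantor set is easy to evaluate: the odd-indexed bits t₁, t₃, … give x and the even-indexed bits give y. A minimal sketch (function name is an assumption):

```python
def lebesgue_point(t_bits):
    """Evaluate f_l on a Cantor-set point given by a finite prefix of its
    digit sequence (t_1, t_2, ...): odd-indexed bits form the binary
    fraction for x, even-indexed bits the one for y."""
    x = sum(b * 2.0 ** -(k + 1) for k, b in enumerate(t_bits[0::2]))
    y = sum(b * 2.0 ** -(k + 1) for k, b in enumerate(t_bits[1::2]))
    return x, y
```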


Generator of Lebesgue's SFC

• Lebesgue's SFC has a generator, too: (figure)

• this is just the lexicographic or Morton ordering, well-known from quadtrees and octrees


SFC – Applications in Computer Science

• sequentialization of multi- (i.e., especially high-) dimensional data

• example: search indices in data bases

– high-dimensional data (entry = dimension), but nevertheless essentially one-dimensional search indices (B-trees: 1 D primary index) and serial concatenation

– the drawback of this 1 D proceeding is obvious: find all male Germans with shoe size 58 (about 50% of the German population still has to be dealt with)

– complexity in case of a really high-dimensional index: $\prod_{i=1}^{d} p_i$ with hit ratio $p_i$ in dimension $i$

– ideal situation: locality-preserving (i.e. essentially continuous) sequentialization (multi-dimensionality is inherent, but a 1 D index can be used)

• locality:

– data are strung sequentially like pearls

– neighbouring points in the unit interval have neighbouring images in the unit square

– the other way round (more important): exceptions due to the missing injectivity (there may be several separated original regions of some subdomain in Q; however, originals are restricted to some clusters, in general)

• widespread in the field of data bases (cf. UB-trees) and of data mining


SFC – Applications in Numerical Simulation

• multi-particle or N -body problems:

– N bodies correlate via forces (gravitation, e.g.)

– examples: astrophysics, molecular dynamics

– N typically very large ($10^7$, $10^8$ and more)

– models lead to a system of N ordinary differential equations with a potential as their right-hand side (summarizing the influence of the N − 1 other bodies)

– global couplings, but the influence decreases with increasing distance (this effect allows for simplifications)

– bodies may be spread over space in an irregular way, their positions may change

• adaptive grids for partial differential equations:

– N grid points are spread over the domain of discretization, typically with no regular structure

– generally, only loose couplings and stationary positions

– adaptivity: new points are created during the computations, others may be removed

• in both cases: dynamic load balancing is important and nontrivial


SFC for Load Distribution

• idea:

1. to points in space, assign points on some iteration of an SFC

2. linear order of the respective SFC’s original points on I

3. simple partitioning (assign points to processors) based on this sequential order

• two techniques for the first step:

– change continuous coordinates (x, y) in Q into binary or quaternary codes of length k:

$\begin{pmatrix} x \\ y \end{pmatrix} \mapsto \begin{pmatrix} 0_2.x_1x_2x_3\ldots x_k \\ 0_2.y_1y_2y_3\ldots y_k \end{pmatrix} \mapsto 0_4.w_1w_2w_3\ldots w_k$

provides quadtree leaf addresses of depth k or an ordering on the k-th iteration of Lebesgue's SFC

– again, start from the binary representation of length k and determine recursively the position on the k-th iteration of Hilbert's SFC
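Both key computations can be sketched on integer cell coordinates of a 2ᵏ × 2ᵏ grid. The Morton key interleaves the bits (the quaternary digits above are w_i = 2x_i + y_i); the Hilbert index uses the standard bit-twiddling formulation, which is not taken from the slides and may differ from their orientation convention:

```python
def morton_key(x, y, k):
    """Lebesgue / Morton key: interleave the k bits of x and y,
    most significant bit first."""
    key = 0
    for i in reversed(range(k)):
        key = (key << 2) | (((x >> i) & 1) << 1) | ((y >> i) & 1)
    return key

def hilbert_index(k, x, y):
    """Position of cell (x, y) along the k-th iteration of Hilbert's SFC
    on a 2**k x 2**k grid (standard recursive quadrant rotation)."""
    d = 0
    s = 1 << (k - 1)
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:                        # rotate/reflect the quadrant
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        s >>= 1
    return d
```

Either key provides the position on I used in the partitioning step; both enumerate every cell exactly once.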


SFC for Load Distribution (cont’d)

• in both cases: keys already provide positions on I

• now to be done: (parallel) sorting of the keys (the main computational task of the algorithm)

– may be costly at the beginning

– however, later: inputs are already almost sorted (only small motions, only a few new grid points created per iterative step of the PDE solver)

• finally: update the partitioning (weighted or unweighted), step of migration


Quality Considerations

• locality:

– continuity guarantees that originals that are close together will be mapped to close image points (for self-similar SFCs, we even have some Lipschitz continuity)

– more important would be a “continuity of the inverse mapping”; but of course, this is not possible due to the missing injectivity

– numerous theoretical considerations exist

– at least: originals are clustered again (a few clusters only)

– Hilbert’s SFC has the best properties

• load distribution:

– excellent parallelization properties, almost perfect balancing

– communication costs are comparably small

– good efficiency of load distribution already for small problem sizes


Costs

• communication:

– more complicated (and, hence, longer) subdomain boundaries due to the partitioning than we get with successive coordinate bisection in kd-trees (halve the load by a cut in the y-direction, then halve the load in both parts by a cut in the x-direction, and so on)

– communication takes place along subdomain boundaries

– overall, a slightly higher communication than with bisection

• load distribution:

– even for large numbers p of processors small costs (just one sorting per step, in contrast to the log p sorting operations with coordinate bisection)


Relations to Fractals

• the notion of self-similarity shows close relations to fractals, whose definitions are also based upon recursively applied similarity transformations

• two examples: Koch’s snow-flake (left) and Sierpinski’s triangle (right)

• both have a non-integer fractal dimension:

$d_K \approx 1.2619, \quad d_S \approx 1.585$

• SFCs, in contrast to that, are areas or volumes, and hence have an integer fractal dimension!


The Fractal Dimension

• start from the generator and the similarity transformations applied to it

• parameters:

– n: number of smaller versions of the generator in the next step of iteration

– r: 0 < r < 1, scaling factor by which the generator is reduced in each step

• definition:

$d := \frac{\log n}{\log(r^{-1})}$

• for “real” curves and surfaces: the same as the conventional (topological) dimension

• examples:

– Koch's snow-flake: n = 4, r = 1/3

– Sierpinski's triangle: n = 3, r = 1/2

– Hilbert's curve: n = 4, r = 1/2
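The definition translates directly into code (a one-line sketch; the function name is an assumption):

```python
import math

def fractal_dimension(n, r):
    """Similarity dimension d = log(n) / log(1/r) for a generator that is
    replaced by n scaled copies (scaling factor r) in each iteration."""
    return math.log(n) / math.log(1.0 / r)
```

Plugging in the examples: Koch's snow-flake gives d ≈ 1.2619, Sierpinski's triangle d ≈ 1.585, and Hilbert's curve d = 2, the integer dimension of the filled unit square.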