Published on 06-Jun-2016

<ul><li><p>Compressing Cube-Connected Cycles and Butterfly Networks</p><p>Ralf Klasing,<sup>1</sup> Reinhard Lüling,<sup>2</sup> Burkhard Monien<sup>2</sup></p><p><sup>1</sup> Department of Computer Science, University of Warwick, Coventry CV4 7AL, England</p><p><sup>2</sup> Department of Mathematics and Computer Science, University of Paderborn, 33095 Paderborn, Germany</p><p>Received 10 April 1995; accepted 16 June 1997</p><p>Abstract: We consider the simulation of large cube-connected cycles (CCC) and large butterfly networks (BFN) on smaller ones, a problem that arises when algorithms designed for an architecture of an ideal size are to be executed on an existing architecture of a fixed size. We show that large CCCs and BFNs can be embedded into smaller networks of the same type with (a) dilation 2 and optimum load, (b) dilation 1 and optimum load in most cases, and (c) dilation 1 and nearly optimum load in all cases. Our results show that large CCCs and BFNs can be simulated very efficiently on smaller ones. Additionally, we implemented our algorithm for compressing CCCs and ran several experiments on a Transputer network, which showed that our technique also behaves very well from a practical point of view. © 1998 John Wiley & Sons, Inc. Networks 32: 47–65, 1998</p><p>1. INTRODUCTION</p><p>Over the past few years, much research has been done in the field of interconnection networks for parallel computer architectures (for a survey, cf. [12, 15, 20]), as most of these architectures can actually be realized in hardware (e.g., as a network of Transputers). Much of the work has been focused on the capability of certain networks to simulate other network or algorithm structures, in order to execute parallel algorithms of a special structure efficiently on different processor networks (see, e.g., [3, 14]). But the problem generally neglected is that most of the existing algorithms are designed for arbitrarily large networks (see, e.g., [18, 21, 22]), whereas, in practice, the processor network will be fixed and of smaller size. Thus, the larger network must be simulated in an efficient way (i.e., needing little simulation time) on the smaller target network.</p><p>Solutions to this problem, which is commonly modeled as a graph embedding problem, have been proposed so far for common network structures like hypercubes, binary trees, meshes, shuffle-exchange networks, and de Bruijn networks in [1, 2, 4–7, 9, 10, 16, 17]. So far, only partial results are known about two classes of networks which are very important for practical purposes, namely, the cube-connected cycles (CCC) as introduced in [18] and the butterfly network (BFN).</p><p>In [4, 9, 17], embeddings with optimum dilation and load are presented in the case of embedding CCCs and BFNs of dimension \(l\) into ones of dimension \(k\), where \(k \mid l\).</p><p>Correspondence to: R. Klasing; e-mail: rak@dcs.warwick.ac.uk</p><p>An extended abstract of this paper was presented at the 2nd IEEE Symposium on Parallel and Distributed Processing (1990).</p><p>Contract grant sponsor: German Research Association (DFG); contract grant numbers: Mo 285/9-1, Me 872/6-1</p><p>Contract grant sponsor: ESPRIT Basic Research Action; contract grant numbers: 7141 (ALCOM II) and No. 20244 (ALCOM IT).</p><p>© 1998 John Wiley & Sons, Inc. CCC 0028-3045/98/010047-19</p></li><li><p>48 KLASING, LÜLING, AND MONIEN</p><p>The authors also restrict themselves to special kinds of embeddings of a very regular structure, like coverings [4], homogeneous emulations [9], and homomorphisms [17]. Because of the very restricted nature, Bodlaender [4] and Peine [17] were also able to classify their embeddings completely.</p><p>In [2], a general procedure was described for mapping parallel algorithms into parallel architectures. This procedure was applied to the CCC network achieving dilation 1, but a very high load. Also, only special kinds of embeddings, so-called contractions, are considered.</p><p>This paper investigates the embedding problem for CCC and BFN taking into account general embedding functions and any possible network dimension. The central statement derived is:</p><p>Large CCCs and BFNs can be simulated very efficiently (almost optimally) on smaller ones.</p><p>In more detail, we prove for the cube-connected cycles network that \(\mathrm{CCC}(l)\) can be embedded into \(\mathrm{CCC}(k)\), \(l \ge k\), with</p><p>(a) Dilation 2 and optimum load \((l/k)\,2^{l-k}\).</p><p>(b) Dilation 1 and optimum load, if \(l/k \ge 2\).</p><p>(c) Dilation 1 and optimum load, for certain values of \(l, k\), if \(l/k \lt 2\).</p><p>(d) Dilation 1 and nearly optimum load, for all other values of \(l, k\), if \(l/k \lt 2\).</p><p>More precisely, the load in cases (c) and (d) is</p><p>\[ \frac{2p-1}{p}\,2^{l-k} \quad \text{for } p \in \{2, 3, \dots\} \text{ such that } \frac{2p-3}{p-1} \lt \frac{l}{k} \le \frac{2p-1}{p}. \]</p><p>This is optimal if \(l/k \in \{(2p-1)/p \mid p \in \{2, 3, \dots\}\}\) and very close to optimal in all other cases.</p><p>For the butterfly network, we show that \(\mathrm{BFN}(l)\) can be embedded into \(\mathrm{BFN}(k)\), \(l \ge k\), with (a)–(d) as above. Here, the load in cases (c) and (d) is specified by</p><p>\[ \frac{2p-2}{p}\,2^{l-k} \quad \text{for } p \in \{7, 8, \dots\} \text{ such that } \frac{2p-4}{p-1} \lt \frac{l}{k} \le \frac{2p-2}{p}, \qquad \frac{5}{3}\,2^{l-k} \quad \text{for } \frac{l}{k} \le \frac{5}{3}. \]</p><p>This is optimal if \(l/k \in \{(2p-2)/p \mid p \in \{6, 7, \dots\}\}\) and very close to optimal in all other cases.</p><p>The general strategy of the embeddings is to map \(2^{l-k}\) cycles in \(\mathrm{CCC}(l)/\mathrm{BFN}(l)\) of length \(l\) onto one cycle in \(\mathrm{CCC}(k)/\mathrm{BFN}(k)\) of length \(k\) and to distribute the nodes of the guest cycles as evenly as possible on the host cycle. A specification or a variation of this general idea will yield many of the results above. But in one important case, namely, the dilation 1 embedding of \(\mathrm{BFN}(l)\) into \(\mathrm{BFN}(k)\) for \(l/k \lt 2\), this construction is not powerful enough. (It only yields load \(2 \cdot 2^{l-k}\).) In this case, we introduce a method that allows local rearrangement of nodes between different host cycles. As an effect, the load is distributed more evenly in the corresponding part of the host network.</p><p>Our results have a major impact on many fields in parallel processing, as CCCs and BFNs have been generally accepted as two benchmark architectures for multicomputers because of their fixed degree and good routing capabilities. To show the practical applicability of our techniques, we have built a tool which allows mapping of any CCC of dimension \(l\) to a fixed CCC of dimension \(k \le l\). We present results for a distributed branch and bound algorithm solving the vertex cover problem [13] and for a program which simulates an arbitrary distributed algorithm. Using our mapping tool, many important algorithms for large CCCs and BFNs can be implemented very efficiently on a network of realistic size; for example, the simulation of a parallel random access machine (PRAM) on large BFNs as described in [19] can now easily be transferred to a given network of processors configured as a butterfly of a fixed size.</p><p>2. DEFINITIONS</p><p>(Most of the terminology is taken from [12, 15, 18, 20].) For any graph \(G = (V, E)\), let \(V(G) = V\) denote the set of vertices of \(G\), and \(E(G) = E\) denote the set of edges of \(G\). Let \(\bar{a}\) denote the binary complement of \(a \in \{0, 1\}\).</p><p>Networks</p><p>The (wrapped) cube-connected cycles network of dimension \(m\), denoted by \(\mathrm{CCC}(m)\), has vertex-set \(V_m = \{0, 1, \dots, m-1\} \times \{0, 1\}^m\), where \(\{0, 1\}^m\) denotes the set of length-\(m\) binary strings. For each vertex \((i, \alpha) \in V_m\), \(i \in \{0, 1, \dots, m-1\}\), \(\alpha \in \{0, 1\}^m\), we call \(i\) the level and \(\alpha\) the position-within-level (PWL) string of \((i, \alpha)\). The edges of \(\mathrm{CCC}(m)\) are of two types: For each \(i \in \{0, 1, \dots, m-1\}\) and each \(\alpha = a_0 a_1 \cdots a_{m-1} \in \{0, 1\}^m\), the vertex \((i, \alpha)\) on level \(i\) of \(\mathrm{CCC}(m)\) is connected</p><ul><li>by a cycle-edge with vertex \(((i+1) \bmod m, \alpha)\) on level \((i+1) \bmod m\) and</li><li>by a cross-edge with vertex \((i, \alpha(i))\) on level \(i\).</li></ul><p>Here, \(\alpha(i) = a_0 \cdots a_{i-1} \bar{a}_i a_{i+1} \cdots a_{m-1}\). For each \(\alpha \in \{0, 1\}^m\), the cycle</p><p>\[ (0, \alpha) - (1, \alpha) - \cdots - (m-1, \alpha) - (0, \alpha) \]</p><p>of length \(m\) will be denoted by \(C_\alpha(m)\) or \(C_\alpha\).</p></li><li><p>CUBE-CONNECTED CYCLES AND BUTTERFLY NETWORK 49</p><p>Fig. 1. The cube-connected cycles \(\mathrm{CCC}(3)\).</p><p>\(\mathrm{CCC}(m)\) has \(m 2^m\) nodes, \(3m 2^{m-1}\) edges, and degree 3. An illustration of \(\mathrm{CCC}(3)\) is shown in Figure 1.</p><p>The (wrapped) butterfly network of dimension \(m\), denoted by \(\mathrm{BFN}(m)\), has vertex-set \(V_m = \{0, 1, \dots, m-1\} \times \{0, 1\}^m\), where \(\{0, 1\}^m\) denotes the set of length-\(m\) binary strings. For each vertex \((i, \alpha) \in V_m\), \(i \in \{0, 1, \dots, m-1\}\), \(\alpha \in \{0, 1\}^m\), we call \(i\) the level and \(\alpha\) the position-within-level (PWL) string of \((i, \alpha)\). The edges of \(\mathrm{BFN}(m)\) are of two types: For each \(i \in \{0, 1, \dots, m-1\}\) and each \(\alpha = a_0 a_1 \cdots a_{m-1} \in \{0, 1\}^m\), the vertex \((i, \alpha)\) on level \(i\) of \(\mathrm{BFN}(m)\) is connected</p><ul><li>by a cycle-edge with vertex \(((i+1) \bmod m, \alpha)\) and</li><li>by a cross-edge with vertex \(((i+1) \bmod m, \alpha(i))\) on level \((i+1) \bmod m\).</li></ul><p>Again, \(\alpha(i) = a_0 \cdots a_{i-1} \bar{a}_i a_{i+1} \cdots a_{m-1}\). For each \(\alpha \in \{0, 1\}^m\), the cycle</p><p>\[ (0, \alpha) - (1, \alpha) - \cdots - (m-1, \alpha) - (0, \alpha) \]</p><p>of length \(m\) will be denoted by \(C_\alpha(m)\) or \(C_\alpha\).</p><p>\(\mathrm{BFN}(m)\) has \(m 2^m\) nodes, \(m 2^{m+1}\) edges, and degree 4. An illustration of \(\mathrm{BFN}(3)\) is shown in Figure 2. To obtain a clearer picture, level 0 has been replicated.</p><p>Lexicographical Orderings</p><p>For many of the proofs later on, we will need the notion of lexicographical ordering. For this purpose, let the lexicographical numbering \(\mathrm{Lex} : \{0, 1, \dots, m-1\} \times \{0, 1\}^n \to \mathbb{N}\) be defined as</p><p>\[ \mathrm{Lex}(i, a_0 a_1 \cdots a_{n-1}) = i \cdot 2^n + a_0 2^{n-1} + a_1 2^{n-2} + \cdots + a_{n-1} 2^0. \]</p><p>Then, the lexicographical order on \(\{0, 1, \dots, m-1\} \times \{0, 1\}^n\) is specified by</p><p>\[ (i, \alpha) \le (j, \beta) \iff \mathrm{Lex}(i, \alpha) \le \mathrm{Lex}(j, \beta), \]</p><p>and the lexicographical distance between \((i, \alpha)\) and \((j, \beta)\) is defined as \(|\mathrm{Lex}(i, \alpha) - \mathrm{Lex}(j, \beta)|\).</p><p>Even Distributions</p><p>Let \(a_1, b_1, a_2, b_2 \in \mathbb{N}_0\) such that \(b_1 \ge a_1\), \(b_2 \ge a_2\), \(b_1 - a_1 \ge b_2 - a_2\). Let \(r \in \mathbb{N}\). A function</p><p>\[ d : \{a_1, a_1 + 1, \dots, b_1\} \times \{0, 1\}^r \to \{a_2, a_2 + 1, \dots, b_2\} \]</p><p>is called an even distribution of \(\{a_1, \dots, b_1\} \times \{0, 1\}^r\) among the nodes of \(\{a_2, \dots, b_2\}\) according to the lexicographical order on \(\{a_1, \dots, b_1\} \times \{0, 1\}^r\) if \(d\) satisfies the following properties:</p><ul><li>\(d(a_1, 0^r) = a_2\), \(d(b_1, 1^r) = b_2\),</li><li>\(d(i, \beta) \le d(i^*, \beta^*)\) if \((i, \beta) \le (i^*, \beta^*)\) according to the lexicographical order on \(\{a_1, \dots, b_1\} \times \{0, 1\}^r\),</li><li>\(\left\lceil \frac{b_1 - a_1 + 1}{b_2 - a_2 + 1}\, 2^r \right\rceil - 1 \le |d^{-1}(j)| \le \left\lceil \frac{b_1 - a_1 + 1}{b_2 - a_2 + 1}\, 2^r \right\rceil\) for all \(j \in \{a_2, \dots, b_2\}\).</li></ul><p>[Note that such a distribution function \(d\) can always be constructed for the parameters \(a_1, b_1, a_2, b_2, r\) as above.]</p></li><li><p>Fig. 2. The butterfly graph \(\mathrm{BFN}(3)\).</p><p>Network Simulations</p><p>Let \(G\) and \(H\) be finite undirected graphs. An embedding of \(G\) into \(H\) is a mapping \(f\) from the nodes of \(G\) to the nodes of \(H\). \(G\) is called the guest graph and \(H\) is called the host graph of the embedding \(f\). The dilation of the embedding \(f\) is the maximum distance in the host between the images of adjacent guest nodes. Its load factor is the maximum number of vertices of the guest graph \(G\) that are mapped to the same host graph vertex. [The optimum load achievable is the ratio \(|V(G)|/|V(H)|\) of the number of nodes in \(G\) and \(H\).] Its edge congestion is the maximum number of edges that are routed through a single edge of \(H\). [A routing is a mapping \(\rho\) of \(G\)'s edges to paths in \(H\), with \(\rho(v_1, v_2)\) a path from \(f(v_1)\) to \(f(v_2)\) in \(H\).]</p><p>An embedding of \(G\) into \(H\) is an abstraction of a simulation of \(G\) by \(H\) as an interconnection network. The dilation and edge congestion are measures for the communication time, the load for the maximum work to be done by a processor. In this paper, we focus on dilation and load. Edge congestion will only play a minor role.</p><p>3. THEORETICAL RESULTS AND PROOFS</p><p>3.1. General Embedding Strategy</p><p>The basic idea of most of the embeddings presented here is to map \(2^{l-k}\) cycles \(C_{\alpha_1}, C_{\alpha_2}, \dots, C_{\alpha_{2^{l-k}}}\) in \(\mathrm{CCC}(l)/\mathrm{BFN}(l)\) of length \(l\) onto one cycle \(C_\beta\) in \(\mathrm{CCC}(k)/\mathrm{BFN}(k)\) of length \(k\) and to distribute the \(l \cdot 2^{l-k}\) nodes of \(C_{\alpha_1}, \dots, C_{\alpha_{2^{l-k}}}\) appropriately among the \(k\) nodes of \(C_\beta\). Two different kinds of such embeddings are distinguished:</p><p>First Construction</p><p>Let \(a_0, a_1, \dots, a_{k-1} \in \{0, 1\}\). The cycles</p><p>\[ \{C_{a_0 a_1 \cdots a_{l-1}} \mid a_k, a_{k+1}, \dots, a_{l-1} \in \{0, 1\}\} \]</p><p>of \(\mathrm{CCC}(l)/\mathrm{BFN}(l)\) are mapped onto the cycle \(C_{a_0 a_1 \cdots a_{k-1}}\) in \(\mathrm{CCC}(k)/\mathrm{BFN}(k)\) as follows: For each \(0 \le i \le k-1\), the node \(i\) of each \(C_{a_0 a_1 \cdots a_{l-1}}\), \(a_k, a_{k+1}, \dots, a_{l-1} \in \{0, 1\}\), is mapped onto the node \(i\) of \(C_{a_0 a_1 \cdots a_{k-1}}\). The nodes \(k, k+1, \dots, l-1\) of each \(C_{a_0 a_1 \cdots a_{l-1}}\) are distributed appropriately among the nodes of \(C_{a_0 a_1 \cdots a_{k-1}}\).</p><p>The exact distribution of the nodes of \(\{C_{a_0 a_1 \cdots a_{l-1}} \mid a_k, a_{k+1}, \dots, a_{l-1} \in \{0, 1\}\}\) on \(C_{a_0 a_1 \cdots a_{k-1}}\) is determined by a distribution function</p><p>\[ d : \{k, k+1, \dots, l-1\} \times \{0, 1\}^{l-k} \to \{0, 1, \dots, k-1\} \]</p><p>which specifies, for each node number in \(\{k, k+1, \dots, l-1\}\) on the guest cycle \(C_{a_0 a_1 \cdots a_{l-1}}\) and each cycle index \(a_k a_{k+1} \cdots a_{l-1}\), the position on the host cycle \(C_{a_0 a_1 \cdots a_{k-1}}\). (On each host cycle \(C_{a_0 a_1 \cdots a_{k-1}}\), \(a_0, a_1, \dots, a_{k-1} \in \{0, 1\}\), the same distribution function is used.) Formally, the embedding \(f : V(\mathrm{CCC}(l))/V(\mathrm{BFN}(l)) \to V(\mathrm{CCC}(k))/V(\mathrm{BFN}(k))\) is of the form</p><p>\[ f(i, a_0 a_1 \cdots a_{l-1}) = \begin{cases} (i,\ a_0 \cdots a_{k-1}) & \text{if } 0 \le i \le k-1, \\ (d(i,\ a_k a_{k+1} \cdots a_{l-1}),\ a_0 \cdots a_{k-1}) & \text{else.} \end{cases} \]</p><p>The load of \(f\) is determined by the distribution function \(d\). Therefore, \(d\) should distribute the guest nodes as evenly as possible on each host cycle. All the cross-edges</p><p>\[ (i, \alpha) - (i, \alpha(i)),\ 0 \le i \le k-1 \ \ (\mathrm{CCC}), \qquad (i, \alpha) - (i+1, \alpha(i)),\ 0 \le i \le k-2 \ \ (\mathrm{BFN}) \]</p><p>of \(\mathrm{CCC}(l)/\mathrm{BFN}(l)\) are mapped onto a corresponding cross-edge in \(\mathrm{CCC}(k)/\mathrm{BFN}(k)\). Likewise, all the cycle-edges</p><p>\[ (i, \alpha) - (i+1, \alpha),\ 0 \le i \le k-2 \]</p><p>of \(\mathrm{CCC}(l)/\mathrm{BFN}(l)\) are mapped onto a corresponding cycle-edge in \(\mathrm{CCC}(k)/\mathrm{BFN}(k)\). All the other edges of \(\mathrm{CCC}(l)/\mathrm{BFN}(l)\) are mapped onto a path on a single cycle \(C_\beta\) in \(\mathrm{CCC}(k)/\mathrm{BFN}(k)\). So, in this case, the dilation is directly dependent on the distribution of the guest nodes on the host cycle and stands partly in contrast to the desired evenness of the distribution as explained above. For low dilation, the nodes \((i, a_0 a_1 \cdots a_{l-1})\) and \((j, b_0 b_1 \cdots b_{l-1})\) of the cycles \(C_{\alpha_1}, C_{\alpha_2}, \dots, C_{\alpha_{2^{l-k}}}\) of \(\mathrm{CCC}(l)/\mathrm{BFN}(l)\) with a small lexicographical distance between \((i, a_k a_{k+1} \cdots a_{l-1})\) and \((j, b_k b_{k+1} \cdots b_{l-1})\) should be mapped close together on the cycle \(C_\beta\) in \(\mathrm{CCC}(k)/\mathrm{BFN}(k)\).</p></li><li><p>Second Construction</p><p>Let \(p(0), p(1), \dots, p(k-1) \in \{0, 1, \dots, l-1\}\), \(p(0) \lt p(1) \lt \cdots \lt p(k-1)\). Let \(\bar{p}(0), \bar{p}(1), \dots, \bar{p}(l-k-1) \in \{0, 1, \dots, l-1\} \setminus \{p(0), p(1), \dots, p(k-1)\}\), …</p><p>[…]</p><p>\[ (i, \alpha) - (i, \alpha(i)),\ i \in \{p(0), p(1), \dots, p(k-1)\} \ \ (\mathrm{CCC}), \qquad (i, \alpha) - ((i+1) \bmod l, \alpha(i)),\ i \in \{p(0), p(1), \dots, p(k-1)\} \ \ (\mathrm{BFN}) \]</p><p>of \(\mathrm{CCC}(l)/\mathrm{BFN}(l)\) are mapped onto a path consisting of one corresponding cross-edge in \(\mathrm{CCC}(k)/\mathrm{BFN}(k)\) and two (possibly empty) paths on two different cycles \(C_{\beta_1}, C_{\beta_2}\) in \(\mathrm{CCC}(k)/\mathrm{BFN}(k)\). All the other edges of \(\mathrm{CCC}(l)/\mathrm{BFN}(l)\) are mapped onto a path on a single cycle \(C_\beta\) in \(\mathrm{CCC}(k)/\mathrm{BFN}(k)\). In both cases, the dilation is directly dependent on the distribution \(d\) of the guest nodes on the host cycle and stands partly in contrast to the desired evenness of the distribution as explained above. For low dilation, the values of \(p(0), p(1), \dots, p(k-1)\) should be spread relatively evenly among \(0, 1, \dots, l-1\), and the nodes \((i, a_0 a_1 \cdots a_{l-1})\) and (j, …</p></li></ul>