
Proceedings of the Seventh International Parallel Processing Symposium, Newport, CA, USA, 13-16 April 1993. IEEE Computer Society Press, 1993.


On the Hierarchical Hypercube Interconnection Network

Q. M. Malluhi, M. A. Bayoumi and T. R. N. Rao

The Center for Advanced Computer Studies University of Southwestern Louisiana

Lafayette, LA 70504

Abstract

This paper explores the Hierarchical Hypercube (HHC) interconnection network, which is suitable for building massively parallel systems with thousands of processors. In this paper we show that the HHC is self-embedded, that is, an HHC can embed HHCs of lower dimensions. In addition, the paper illustrates that the HHC is a communication-efficient architecture. Two algorithms for data communication in the HHC are presented. The first algorithm is for one-to-one transfer and the second is for one-to-all broadcasting. Both algorithms take $O(\log k)$ time, where k is the total number of processors in the system. Moreover, the paper shows that the HHC VLSI layout has a relatively small area, which is $O((\log \log k) \cdot k^2 / \log k)$.

I. Introduction

Several topologies for interconnection networks have been proposed by researchers (see for example [SQUI63; STON71; PREP81; BHAT82; HWAN87; KATS88; YOUS90; DAND90; AMAW91; FIDU92; KUMA92]). The hypercube (HC) topology proposed by [SQUI63] is known to be a very powerful topology and has drawn considerable attention in the last two decades. The hypercube, however, when used in large systems, has some practical limitations. In an n-HC (a hypercube of degree n or, equivalently, a hypercube with $2^n$ nodes), each processor (node) is connected to n other processors. As the degree n increases, each node becomes more difficult to design and fabricate due to the larger fanout. In fact, this is the most serious drawback of hypercubes and it is often considered the main limiting factor for using hypercubes in large systems [PREP81; SAAD88]. In addition, a high-degree n-HC also has the problem of matching the internal processor speed with the available wide bandwidth [SAAD89].

The authors have proposed a new hierarchical structure for interconnection networks in massively parallel systems. This structure is referred to as the Hierarchical Hypercube (HHC) [MALL92]. The merits of this topology are numerous. Unlike the hierarchical topologies suggested recently for parallel systems [HWAN87; DAND90; KUMA92], the HHC is a homogeneous and symmetric structure in which no processor or link plays a special role. The HHC can efficiently execute the Divide and Conquer (D&Q) class of algorithms, which solves a large set of practical problems. A k-node HHC requires $O(\log \log k)$ connections per processor. Therefore, unlike the hypercube, the HHC implementation is feasible even when k is very large.

The authors would like to acknowledge the support of the grant SSFlLEQSF ADP-O?.

In this paper, we continue to explore the attractive properties of the HHC interconnection network. Section II describes the HHC structure. In Section III, we address the scalability of the HHC, that is, the ability to embed smaller HHCs into a larger HHC. Section IV describes two algorithms for data communication in the HHC. The first algorithm is for node-to-node communication and the second is for one-to-all broadcasting. Section V deals with the HHC layout for efficient VLSI realization. Finally, we conclude the paper in Section VI.

II. HHC structure

To simplify the description of the n-HHC structure (an HHC of $2^n$ nodes), let's assume for the time being that $n = 2^m + m$ (this condition is relaxed later on). The $2^n$ nodes are grouped into clusters of $2^m$ nodes each, and the nodes in each cluster are connected to form an m-HC called the Son-cube or Scube. A father cube, called the Fcube, connects the $2^{n-m} = 2^{2^m}$ Scubes in a hypercube fashion. Edges of the Scubes are called internal edges while edges of the Fcube are referred to as external edges. An Scube having $2^m$ nodes is connected to exactly $2^m$ external edges, each incident to one node of the Scube. Figure-1 shows two adjacent Scubes in an 11-HHC (an HHC with n = 11 and m = 3).

The sequence of binary bits $(b_{n-1} b_{n-2} \ldots b_0)$ will be used as the identifier or address of a node. The address of a node is divided into two parts, the S part and the P part, and is represented as a two-tuple (s, p). The S part is the (n - m)-bit binary number $(b_{n-1} b_{n-2} \ldots b_m)$ representing the address of the Scube in which the node is located. The P part is the m-bit binary number $(b_{m-1} b_{m-2} \ldots b_0)$ representing the address of the node (processor) within the Scube. We will denote the S and P parts of a node by the node name catenated with s and p respectively. For example, the S and P parts of node A are As and Ap, respectively.

1063-7133/93 \$3.00 © 1993 IEEE

Figure-1: Two Adjacent Scubes in an 11-HHC.

Figure-2: A 5-HHC.

Thus, an HHC node (s, p) is connected to:
(1) m nodes in the same Scube through internal edges. These are the nodes whose addresses are found by changing only one bit of the P part of the address.
(2) Exactly one node in a neighbor Scube through an external edge corresponding to the change of the p-th bit of the S part of the address.

To simplify the description of the HHC structure, it was assumed that $n = 2^m + m$. However, it is easy to generalize the structure for an arbitrary n by choosing m to be the smallest integer such that $n \le 2^m + m$. In this situation, some nodes will not possess an external edge. A 5-HHC is shown in Figure-2. Notice that even though the structure looks like that of a CCC, it is not a CCC. The nodes of an Scube are connected in a cube fashion, not in a cyclic fashion. When the condition $n = 2^m + m$ is satisfied, the HHC is referred to as a perfect HHC.
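The addressing rules above can be sketched in a few lines of Python. This is a minimal illustration, not code from the paper: the function names `scube_dim`, `is_perfect` and `neighbors` are hypothetical, and nodes are assumed to be represented as integer pairs (s, p) with bit 0 as the least significant bit.

```python
def scube_dim(n):
    """Smallest m with n <= 2**m + m, i.e. the Scube dimension of an n-HHC."""
    m = 0
    while (1 << m) + m < n:
        m += 1
    return m


def is_perfect(n):
    """True when n = 2**m + m, so every node owns an external edge."""
    m = scube_dim(n)
    return n == (1 << m) + m


def neighbors(n, s, p):
    """Neighbors of node (s, p) in an n-HHC, as a list of (s, p) pairs."""
    m = scube_dim(n)
    s_bits = n - m  # width of the S part of the address
    # Internal edges: flip one bit of the P part.
    result = [(s, p ^ (1 << j)) for j in range(m)]
    # External edge: flip the p-th bit of the S part, if that bit exists.
    if p < s_bits:
        result.append((s ^ (1 << p), p))
    return result


print(scube_dim(11), is_perfect(11))   # the 11-HHC of Figure-1
print(neighbors(5, 0b101, 3))          # a node of the 5-HHC without an external edge
```

In the 5-HHC the S part has only three bits, so the node with P part 3 has no S bit to flip, which is the situation described above for imperfect HHCs.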

The following two important properties of the n-HHC structure are provided in [MALL92]:

1. The degree of the n-HHC is $m + 1 = O(\log n)$.
2. The diameter D of the n-HHC is $\le 2^{m+1} = O(n)$.
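For small n, both properties can be checked by brute force. The sketch below is an illustration under the same assumptions as the addressing rules of this section (integer pairs (s, p), bit 0 least significant); the helper names are hypothetical and the breadth-first search is a generic technique, not something taken from [MALL92].

```python
from collections import deque


def scube_dim(n):
    """Smallest m with n <= 2**m + m."""
    m = 0
    while (1 << m) + m < n:
        m += 1
    return m


def neighbors(n, node):
    """Neighbors of node = (s, p) in an n-HHC."""
    m = scube_dim(n)
    s, p = node
    out = [(s, p ^ (1 << j)) for j in range(m)]   # internal edges
    if p < n - m:                                 # external edge, if present
        out.append((s ^ (1 << p), p))
    return out


def degree_and_diameter(n):
    """Maximum degree and diameter of an n-HHC, found by BFS from every node."""
    m = scube_dim(n)
    nodes = [(s, p) for s in range(1 << (n - m)) for p in range(1 << m)]
    max_degree = max(len(neighbors(n, v)) for v in nodes)
    diameter = 0
    for src in nodes:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in neighbors(n, u):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        diameter = max(diameter, max(dist.values()))
    return max_degree, diameter


print(degree_and_diameter(3))   # (2, 4): degree m + 1 = 2, diameter 2**(m+1) = 4
print(degree_and_diameter(6))   # (3, 8): degree m + 1 = 3, diameter 2**(m+1) = 8
```

For the perfect 3-HHC and 6-HHC the diameter bound is met with equality; the exhaustive search is only practical for small n.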

III. Recursive definition of HHC structure

This section addresses the scalability feature of the HHC structure, which has a number of attractive consequences. By scalability we mean the ability to define an (n+1)-HHC in terms of an n-HHC. Scalability implies that the HHC performance does not decline when the input size does not match the number of processors in the system. Another consequence of being able to construct an (n+1)-HHC from lower dimension HHCs is the increased fault tolerance arising from the ability to use the smaller dimension HHCs when a fault affects the links or the nodes of the original HHC. In building up an (n+1)-HHC from an n-HHC we differentiate between two cases: (1) the n-HHC is perfect and (2) the n-HHC is not perfect. Below, it is shown what to do in each of these cases. We use || to refer to the concatenation operator and NBits(s) to refer to the number of bits in the binary number s.

if $n = 2^m + m$ then {n-HHC is perfect}
    1. Duplicate each Scube and give the same node names to the newly generated cubes.
    2. Rename each node (As, Ap) of the original n-HHC as (As, 0||Ap).
    3. Rename each newly generated node (As, Ap) as (As, 1||Ap).
    4. Connect (As, 0||Ap) with (As, 1||Ap).
else {n-HHC is not perfect}
    1. Duplicate the n-HHC.
    2. Rename each node (As, Ap) of the original n-HHC as (0||As, Ap).
    3. Rename each node (As, Ap) of the newly generated n-HHC as (1||As, Ap).
    4. Connect (0||As, NBits(As)) with (1||As, NBits(As)).
end. {if}

Figure-3 illustrates the construction of the 4-HHC from the 3-HHC and then the 5-HHC from the 4-HHC. A simple application of the principle of mathematical induction, together with the construction procedure above, can show the following lemma.

Lemma 1: A t-HHC can be embedded into the n-HHC graph for any $t \le n$.
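Lemma 1 can be illustrated for small dimensions with a short check. The sketch below assumes the (s, p) integer addressing of Section II; prefixing a 0 bit to either part of an address, as the construction procedure does for the original copy, leaves its integer value unchanged, so the embedding reduces to an edge-set inclusion. The function names are hypothetical.

```python
def scube_dim(n):
    """Smallest m with n <= 2**m + m."""
    m = 0
    while (1 << m) + m < n:
        m += 1
    return m


def hhc_edges(n):
    """Edge set of the n-HHC; each edge is a frozenset of two (s, p) nodes."""
    m = scube_dim(n)
    edges = set()
    for s in range(1 << (n - m)):
        for p in range(1 << m):
            for j in range(m):                       # internal edges
                edges.add(frozenset({(s, p), (s, p ^ (1 << j))}))
            if p < n - m:                            # external edge, if present
                edges.add(frozenset({(s, p), (s ^ (1 << p), p)}))
    return edges


def embeds(t, n):
    """True if every edge of the t-HHC is also an edge of the n-HHC."""
    return hhc_edges(t) <= hhc_edges(n)


print(all(embeds(t, n) for n in range(1, 9) for t in range(1, n + 1)))
```

The check covers both cases of the construction: going from the perfect 3-HHC to the 4-HHC enlarges the Scubes, while going from the 4-HHC to the 5-HHC enlarges the Fcube.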

IV. Data communication in HHC

Effective data communication is crucial in parallel systems. In this section, we introduce two efficient algorithms for data transfer in the HHC. The first is for one-to-one communication and the second is for one-to-all communication (broadcasting). Efficient one-to-one transfer is essential because it is the most basic type of communication. Likewise, fast broadcasting is important because it is employed in a large number of well-known algorithms such as Gaussian elimination, the Conjugate Gradient algorithm [SAAD89], Matrix-Vector multiplication, Matrix-Matrix multiplication, LU-factorization, and the Householder transformation [JOHN89], as well as various image processing applications.

Figure-3: Construction of 4-HHC from 3-HHC and 5-HHC from 4-HHC.

It is natural to try to find optimal communication algorithms that incur the smallest possible number of time units. However, the hierarchical structure prescribes that we deal with the problem at various levels of hierarchy. Thus, instead of finding the globally optimal communication procedure, we will divide the problem logically into two parts. The first part is concerned with data transfer within the Fcube (between Scubes) and the second part is for data communication within the Scube. An optimal communication procedure will be used for each of these two parts. The two locally optimal procedures will be merged together in a way that produces a near-optimal communication algorithm for the HHC. The division of the communication algorithms into two parts is rather conceptual. In the actual algorithms presented, these two parts are molded together, ending up with algorithms which route through the entire HHC and which mix and interleave communication steps of these two conceptual parts.

IV.A. Definitions and terminology:

Before presenting the HHC communication algorithms we need to introduce some definitions. A Hamiltonian path of a graph is a path that passes over each node of the graph exactly once. The problem of finding a Hamiltonian path in a k-HC is the problem of finding a sequence of the $2^k$ distinct k-bit binary numbers such that any two consecutive numbers differ in only one bit. Such a sequence of binary numbers exists and is called a Gray code. For example, the sequence of binary numbers (000, 001, 011, 010, 110, 111, 101, 100) is a 3-bit Gray code representing a Hamiltonian path in a 3-HC starting at node 000 and ending at node 100.

Definition 1: $G_m$ is defined to be the m-bit Gray code obtained by the recursion:

$G_1 = (0, 1)$

$G_{i+1} = (0G_i, 1G_i^R)$

where $G_i^R$ is the sequence obtained by reversing the order of the numbers of $G_i$, and $0G_i$ / $1G_i$ is the sequence obtained by concatenating 0/1 to each element in the sequence $G_i$.

Definition 2: $G_m^k$, where $0 \le k \le 2^m - 1$, is the rotation of $G_m$ until k is the first element in the sequence.

Definition 3: The partial order $\le^k$ ($G^k$-less) is defined over the set of integers less than $2^m$ as follows: $A \le^k B$ (read as A is $G^k$-less than B) if and only if A precedes B in $G_m^k$. Conversely, $B \ge^k A$ (read as B is $G^k$-greater than A) if and only if $A \le^k B$.

Example: Using the recursion in Definition 1:

$G_2 = (00, 01, 11, 10)$

$G_3 = (000, 001, 011, 010, 110, 111, 101, 100) = (0, 1, 3, 2, 6, 7, 5, 4)$

By Definition 2,

$G_3^0 = G_3 = (0, 1, 3, 2, 6, 7, 5, 4)$

$G_3^2 = (2, 6, 7, 5, 4, 0, 1, 3)$

$G_3^7 = (7, 5, 4, 0, 1, 3, 2, 6)$

Notice that any $G_m^k$ represents a Hamiltonian path in an m-cube. From Definition 3, we have the relations $3 \le^0 2$, $2 \le^2 3$, and $5 \le^7 1$, and hence $2 \ge^0 3$, $3 \ge^2 2$, and $1 \ge^7 5$. In $G_3^0$, 0 is the $G^0$-smallest and 4 is the $G^0$-greatest. □
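Definitions 1 to 3 translate directly into code. The following is a minimal sketch with hypothetical function names (`gray_code`, `rotated_gray`, `g_less`); elements are represented as integers, and `g_less` implements the strict "precedes" test.

```python
def gray_code(m):
    """G_m from Definition 1: G_1 = (0, 1), G_{i+1} = (0 G_i, 1 G_i^R)."""
    g = [0, 1]
    for i in range(1, m):
        # Prefixing 0 leaves values unchanged; prefixing 1 sets bit i.
        g = g + [(1 << i) | x for x in reversed(g)]
    return g


def rotated_gray(m, k):
    """G_m^k from Definition 2: rotate G_m until k is the first element."""
    g = gray_code(m)
    start = g.index(k)
    return g[start:] + g[:start]


def g_less(m, k, a, b):
    """True if a precedes b in G_m^k (Definition 3, strict version)."""
    order = rotated_gray(m, k)
    return order.index(a) < order.index(b)


def is_hamiltonian_path(seq):
    """True if consecutive elements differ in exactly one bit."""
    return all(bin(x ^ y).count("1") == 1 for x, y in zip(seq, seq[1:]))


print(gray_code(3))          # [0, 1, 3, 2, 6, 7, 5, 4]
print(rotated_gray(3, 2))    # [2, 6, 7, 5, 4, 0, 1, 3]
print(rotated_gray(3, 7))    # [7, 5, 4, 0, 1, 3, 2, 6]
```

Every rotation remains a Hamiltonian path because the first and last elements of the reflected Gray code also differ in a single bit.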


IV.B. One-to-one communication:

Let A=(As, Ap) be the source node and B=(Bs, Bp) be the destination node. The following algorithm is a routing procedure executed by the source node A and by every node C=(Cs, Cp) in the path to the destination. The algorithm uses the procedure Scube-routing, which is no more than a typical routing algorithm for an ordinary hypercube.

procedure node-to-node-routing(C, B, M)
{Route message M at node C to destination B}
begin
    if C ≠ B then
        if Cs = Bs then
            Scube-routing(C, B, M);
        else
            D := C ⊕ B;
            I := the set of indices of 1's in Ds;
            if Cp ∈ I then
                Send M on external edge;
            else
                i := G^{Cp}-smallest element in I;
                Scube-routing(C, (Cs, i), M);
            end; {if}
        end; {if}
    end; {if}
end; {node-to-node-routing}

procedure Scube-routing(A, B, M)
{Route message M from node A to node B within the same Scube}
begin
    Dp := Ap ⊕ Bp;
    j := index of first 1 in Dp;
    Send M on internal edge along the j-th dimension;
end; {Scube-routing}

Example: If the source A is (0000,00) and the destination B is (0111,01), the path produced by procedure node-to-node-routing is (0000,00), (0001,00), (0001,01), (0011,01), (0011,11), (0011,10), (0111,10), (0111,11), (0111,01). □
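The routing procedure can be sketched as a loop that applies the per-node rule until the destination is reached. This is an illustration, not the authors' implementation: the function names are hypothetical, a perfect HHC is assumed, and Scube-routing is assumed to flip the lowest-order differing bit. That tie-breaking choice is an assumption, so intermediate nodes inside an Scube may differ from the example path while the number of hops is the same. The hop limit is a safety guard added for the sketch.

```python
def gray_cycle(m):
    """Binary-reflected Gray code G_m as a list of integers."""
    return [i ^ (i >> 1) for i in range(1 << m)]


def lowest_set_bit(x):
    """Index of the lowest-order 1 bit of x (x > 0)."""
    return (x & -x).bit_length() - 1


def route(m, a, b):
    """Path from a = (As, Ap) to b = (Bs, Bp) in a perfect HHC (n = 2**m + m)."""
    cycle = gray_cycle(m)
    (cs, cp), (bs, bp) = a, b
    path = [(cs, cp)]
    limit = 8 << m                                  # guard against runaway loops
    while (cs, cp) != (bs, bp):
        if len(path) > limit:
            raise RuntimeError("hop limit exceeded")
        if cs == bs:
            target = bp                             # Scube-routing(C, B, M)
        else:
            diff = cs ^ bs                          # Ds; its 1 bits form the set I
            if (diff >> cp) & 1:                    # Cp in I: take the external edge
                cs ^= 1 << cp
                path.append((cs, cp))
                continue
            start = cycle.index(cp)                 # scan G_m^{Cp} for the first index in I
            rotated = cycle[start:] + cycle[:start]
            target = next(d for d in rotated if (diff >> d) & 1)
        cp ^= 1 << lowest_set_bit(cp ^ target)      # one internal hop toward target
        path.append((cs, cp))
    return path


path = route(2, (0b0000, 0b00), (0b0111, 0b01))
print([(format(s, "04b"), format(p, "02b")) for s, p in path])
print(len(path) - 1)   # 8 hops, the same length as the example path
```

For the example pair, H(As, Bs) = 3 and m = 2, so the 8 hops meet the bound H(As, Bs) + 2^m + m - 1 = 8 discussed below. The sketch has been reasoned through only for m = 2.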

The algorithm uses a shortest path between the source Scube As and the destination Scube Bs in the Fcube. The shortest path used has the property that edges of the Fcube are traversed in ascending order (sorted with respect to the relation $\le^{Ap}$). The rationale behind this choice is to make the P part values on the path to the destination Scube change in one direction and to prohibit their fluctuation. This puts an upper limit on the number of internal links that are traversed in order to change the S part (by moving on external links) to the required value Bs. This upper limit is equal to $|G_m^{Ap}| - 1 = 2^m - 1$.

Figure-4: Broadcasting in a 3-cube.

Theorem 1: The length of the path produced by the node-to-node-routing algorithm is $\le H(As, Bs) + 2^m + m - 1$, where H(As, Bs) is the Hamming distance between the S parts of nodes A and B.

Proof: Let $\Pi$ be the path from A to B produced by the algorithm node-to-node-routing. Let $\Pi = \Pi_1 \Pi_2$ where $\Pi_1$ is the path from node A to the entry node (Bs, e) of the destination Scube Bs (i.e., the first node of Scube Bs encountered in $\Pi$), and $\Pi_2$ is the path from (Bs, e) to B within the destination Scube Bs. Obviously, $\Pi_1$ contains exactly H(As, Bs) external links. Moreover, the number of internal links in $\Pi_1$ is at most $|G_m^{Ap}| - 1 = 2^m - 1$. The path $\Pi_2$ has no external edges but at most m internal edges. Summing up, the length of path $\Pi$ is $\le H(As, Bs) + 2^m + m - 1$. □

Since the S part of a node address consists of at most $2^m$ bits, we have $H(As, Bs) \le 2^m$. Thus, $|\Pi| \le 2^{m+1} + m - 1 \le 2(2^m + m) = 2n = O(n)$.

IV.C. One-to-all communication:

One-to-all transfer (broadcasting) can also be examined at two levels of hierarchy. At the higher level, broadcast is performed within the Fcube using external edges to address the message to every Scube. At the lower level, broadcast is performed within each Scube using internal edges. Since the Scubes and the Fcube are all hypercubes, we first introduce a broadcast algorithm for the hypercube. This algorithm is a generalized version of the known one-to-all communication procedure in hypercubes [KATS88; SAAD89; JOHN89].

Let $\alpha$ be any partial order on the set of integers $Z_k = \{a \mid 0 \le a < k\}$. Procedure HC-Broadcast communicates the message M from node A to every other node in a k-cube.

procedure HC-Broadcast(A, M)
{Broadcasts message M from node A to every other node in the HC}
begin
    A sends M to all of its neighbors;
    for any node B receiving M on dimension i do
        for every dimension j such that i <α j do
            B sends M on j;
end; {HC-Broadcast}
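A small simulation makes the behaviour of HC-Broadcast concrete. This sketch is illustrative only: the function name `hc_broadcast` is hypothetical, the order $\alpha$ is assumed to be given as a list of all k dimensions from smallest to greatest, and every send is assumed to take one hop.

```python
from collections import deque


def hc_broadcast(k, source, order):
    """Simulate HC-Broadcast in a k-cube.

    order lists the k dimensions from alpha-smallest to alpha-greatest.
    Returns (receipts, depth): how often each node received M, and the
    number of hops M travelled to reach each node.
    """
    rank = {dim: r for r, dim in enumerate(order)}
    receipts = {v: 0 for v in range(1 << k)}
    depth = {source: 0}
    queue = deque()
    for j in range(k):                        # A sends M to all of its neighbors
        queue.append((source ^ (1 << j), j, 1))
    while queue:
        node, i, d = queue.popleft()          # node received M on dimension i
        receipts[node] += 1
        depth.setdefault(node, d)
        for j in range(k):
            if rank[i] < rank[j]:             # forward only on alpha-greater dimensions
                queue.append((node ^ (1 << j), j, d + 1))
    return receipts, depth


receipts, depth = hc_broadcast(3, 0b101, [0, 1, 2])
print(receipts)
print(depth)
```

With the ordinary order on dimensions this is the familiar binomial-tree broadcast; any other total order on the dimensions, such as a rotated Gray code, gives a different spanning tree with the same depth.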

HC-Broadcast has the following properties:

1. The message M reaches every node in the HC once and exactly once. This is ensured by the use of the partial order $\alpha$.
2. For all nodes B, M is routed from A to B along a shortest path $\Pi$. That is, $|\Pi| = H(A, B)$.
3. For all nodes B, the links on the path $\Pi$ (i.e., the path followed from A to B) are sorted with respect to the partial order $\alpha$.

Figure-4 shows the broadcast operation in a 3-HC where $\alpha$ is the ordinary less than or equal relation ($\le$).

An efficient HHC broadcast algorithm based on procedure HC-Broadcast is obtained by using the partial order $\le^{Ap}$ for Fcube broadcasting and the partial order $\le$ for broadcasting in the Scubes. Broadcasting within an Scube starts as soon as the message is received. A transfer on an external edge (broadcasting in the Fcube) from an Scube waits until the proper exit node in the Scube has received the message. This is the node whose P part matches the dimension of the external link in the Fcube.

procedure HHC-Broadcast(A, M)
{Broadcasts message M from node A to every other node in the HHC}
begin
    A sends M to all of its neighbors;
    for any node B receiving M on an external edge do
        B sends M on every adjacent internal edge;
    for any node B receiving M on an internal edge on dimension i do
    begin
        for every internal edge on dimension j such that i < j do
            B sends M on j;
        if As = Bs then
            B sends M on external edge;
        else
            I := the set of indices of 1's in Cs, where C = A ⊕ B;
            k := G^{Ap}-greatest element in I;
            if k ≤^{Ap} Bp then
                B sends M on external edge;
        end; {if}
    end; {for}
end; {HHC-Broadcast}

An Scube is entered through an external edge once and only once. This is ensured by the use of the partial order $\le^{Ap}$ for Fcube broadcasting. Once the Scube is entered, broadcasting starts within this Scube. Therefore, in procedure HHC-Broadcast, a node receiving M on an external link sends M on every adjacent internal link. A node receiving M on an internal link does two things: (1) it continues the broadcast operation already started within the Scube, and (2) it continues the Fcube broadcasting by forwarding M on the external link if one of the two conditions is met:

a. The current Scube Bs is the source for Fcube broadcasting (i.e. Bs=As), therefore, M should be sent on every external link adjacent to Scube Bs (examine procedure HC-Broadcast).

b. $k \le^{Ap} Bp$ for every Fcube dimension k traversed earlier. This is equivalent to checking that the Fcube dimension on which the Scube Bs has been entered is $G^{Ap}$-less than the dimension of the external link incident to node B (reexamine procedure HC-Broadcast).

Figure-5 illustrates the broadcast tree produced by the HHC-Broadcast algorithm in a 6-HHC.

Let B be an arbitrary node in the HHC and let $\Pi$ be the path produced by procedure HHC-Broadcast along which M is routed from the source node A to B. The path $\Pi$ has the following characteristics:

1. The number of external links in $\Pi$ is H(As, Bs) (by property 2 of HC-Broadcast).

2. A portion of $\Pi$ containing internal links only is of minimum length. That is, if this portion is between nodes (Cs, p) and (Cs, p') then its length is equal to H(p, p') (by property 2 of HC-Broadcast).

3. The external links in $\Pi$ appear in $G^{Ap}$-ascending order of their dimensions (by property 3 of HC-Broadcast).

4. The links in a portion of $\Pi$ containing internal links only appear in ascending order of their dimensions (by property 3 of HC-Broadcast).
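The forwarding rules of HHC-Broadcast can be simulated for small perfect HHCs. The sketch below is one reading of the procedure and not the authors' code: the function name is hypothetical, the set of Fcube dimensions traversed earlier is assumed to be the set of 1 bits in As ⊕ Bs, the comparison against the $G^{Ap}$-greatest traversed dimension is taken as strict, and each send is assumed to take one hop.

```python
from collections import deque


def hhc_broadcast(m, source):
    """Simulate HHC-Broadcast on a perfect HHC with n = 2**m + m.

    Returns (receipts, depth): receipt counts and hop counts per node.
    """
    a_s, a_p = source
    size = 1 << m
    gray = [i ^ (i >> 1) for i in range(size)]
    start = gray.index(a_p)
    # rank[d] is the position of dimension d in G_m^{Ap}.
    rank = {gray[(start + r) % size]: r for r in range(size)}
    receipts, depth = {}, {source: 0}
    queue = deque()

    def send_internal(node, j, d):
        s, p = node
        queue.append(((s, p ^ (1 << j)), j, d + 1))

    def send_external(node, d):
        s, p = node
        queue.append(((s ^ (1 << p), p), None, d + 1))

    for j in range(m):                       # A sends M to all of its neighbors
        send_internal(source, j, 0)
    send_external(source, 0)
    while queue:
        node, i, d = queue.popleft()
        receipts[node] = receipts.get(node, 0) + 1
        depth.setdefault(node, d)
        s, p = node
        if i is None:                        # received on an external edge
            for j in range(m):
                send_internal(node, j, d)
            continue
        for j in range(i + 1, m):            # continue the Scube broadcast
            send_internal(node, j, d)
        if s == a_s:                         # condition a: source Scube
            send_external(node, d)
        else:                                # condition b
            traversed = [b for b in range(size) if ((s ^ a_s) >> b) & 1]
            k = max(traversed, key=rank.get)
            if rank[k] < rank[p]:
                send_external(node, d)
    return receipts, depth


receipts, depth = hhc_broadcast(2, (0b0101, 0b10))
print(len(receipts), max(receipts.values()), max(depth.values()))
```

In the 6-HHC (m = 2) the simulation delivers the message exactly once to each of the other 63 nodes, and the deepest node lies within the n + 2^m - 1 = 9 steps stated in Theorem 2.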

Theorem 2: Algorithm HHC-Broadcast requires at most $n + 2^m - 1$ time units (communication steps) for completion.

Proof: Let $\Pi$ be as given above and write $\Pi = \Pi_1 \Pi_2$ where $\Pi_1$ and $\Pi_2$ are as in the proof of Theorem 1. By property 1 of the path $\Pi$, $\Pi_1$ contains exactly H(As, Bs) external edges. By properties 3 and 4, the number of internal edges in $\Pi_1$ is not greater than the length of a Hamiltonian path in an m-cube (corresponding to $G_m^{Ap}$). Therefore, the number of internal links in $\Pi_1$ is less than or equal to $2^m - 1$. Moreover, we have $|\Pi_2| \le m$. Hence, $|\Pi| \le H(As, Bs) + 2^m + m - 1$. Because $H(As, Bs) \le n - m$, we can write $|\Pi| \le n + 2^m - 1$. Therefore, the depth of the broadcast tree generated by HHC-Broadcast is less than or equal to $n + 2^m - 1$ and thus, algorithm HHC-Broadcast can be completed in $n + 2^m - 1$ communication steps. □

We know that $2^m$ is O(n). Thus by Theorem 2, HHC-Broadcast requires O(n) time units. This is the same as the time complexity for broadcasting a message in a hypercube. We should not leave this section without comparing the time required by the near-optimal solution produced by HHC-Broadcast with the time of an optimal solution. The time of an optimal solution is specified by the diameter D of the topology. From Section II we know that $D \le 2^{m+1}$. Thus, one can easily verify that $D - (n + 2^m - 1) \le 2^m - n + 1 \le m + 1$. Therefore, the maximum departure of HHC-Broadcast from the optimal solution is by m + 1 time units. As a final comment, we note that the route produced by HHC-Broadcast along which M is forwarded to a node B is identical to the route generated by procedure node-to-node-routing(A, B, M).

V. Layout of the HHC

The area of a VLSI layout is a very fundamental measure of how good the circuit is. The area of a circuit is important because the larger the area, the greater the probability that there is a flaw in a fabricated chip, and the smaller the yield of chips of that area. In fact, the cost of fabricating a chip is an exponential function of the area.

There are various models for VLSI circuits that help us estimate the area of the circuit. In this paper, we are going to use the grid VLSI model of computation (see [ULLM84]).

Figure-6: (a) Efficient Layout of a 5-HHC, (b) Efficient Layout of a 6-HHC.

Figure-6 shows two efficient (of less area) layouts for the 5-HHC and the 6-HHC. The idea is to lay the Scubes along the vertical axis and to connect the external edges on the horizontal axis. Along the same lines, a layout for an n-HHC with an arbitrary value of n can be easily developed.

Lemma 2: The area $A_{HHC}$ of an n-HHC, having $k = 2^n$ processors, has an asymptotic complexity of $O((\log \log k) \cdot k^2 / \log k)$.

Proof: The width of the HHC circuit is $2^{n-m}(2^m - 2)$ because we have $2^{n-m}$ Scubes, each having a width of $(2^m - 2)$. The height is at maximum $(m/2 + 1)(2^{n-m} - 1)$. The $(m/2 + 1)$ factor comes from the fact that the degree of each node is m + 1. The maximum height occurs when the HHC is perfect. Thus, for $m \ge 2$,

$A_{HHC} \le (m/2 + 1) \cdot (2^{n-m} - 1) \cdot (2^{n-m}) \cdot (2^m - 2)$

$< m \cdot 2^{2(n-m)} \cdot 2^m$

$\le m \cdot k^2 / 2^m$

Noting that $\log k = n = \Theta(2^m)$ and that m is $O(\log \log k)$, we get $A_{HHC} = O((\log \log k) \cdot k^2 / \log k)$. □

A VLSI layout of an n-cube requires chip area $A_{HC} = O(\log k \cdot k^2)$, which is larger than $A_{HHC}$. From another perspective, $A_{HHC}$ can be compared to the lower bound of the circuit area. It has been proved that $A T^2 \ge c k^2$ where A is the circuit area, T is the computation time of the circuit, c is a constant dependent on the technology, and k is the problem size [PREP81]. According to this, the theoretical minimum area of the HHC is $O(k^2 / \log^2 k)$. The HHC relative chip area compared to the theoretical minimum is $O((\log \log k) \cdot \log k)$.

VI. Conclusions

This paper investigates the Hierarchical Hypercube interconnection network, which is appropriate for implementing massively parallel systems. The hierarchical hypercube is an ensemble of hypercubes connected in a hypercube fashion. The attraction of this interconnection network emerges from the fact that it is feasible to implement with thousands of processors while retaining good performance. The number of I/O ports required for a processor is approximately the logarithm of that required in the counterpart hypercube topology.

We have shown that an HHC can embed HHCs of lower dimensions. This prevents performance degradation of algorithms when there is a mismatch between the input size and the topology size (as happens in the CCC). Moreover, this characteristic gives the HHC structure a degree of fault tolerance. In addition, we have demonstrated that the HHC layout is more compact than that of the hypercube. This reduces the HHC fabrication cost and increases the yield of its chips. It is shown that the HHC is a communication-efficient architecture. Two efficient communication algorithms are presented. The first is for node-to-node transfer and the second is for one-to-all broadcasting.

References

[AMAW91] A. El-Amawy and S. Latifi, "Properties and Performance of Folded Hypercubes," IEEE Transactions on Parallel and Distributed Systems, vol. 2, No. 1, pp. 31-42, January 1991.

[BHAT82] K. V. Bhat, "On Properties of Arbitrary Hypercubes," Comp. & Maths with Appls., vol. 8, No. 5, pp. 339-342, 1982.

[DAND90] S. P. Dandamudi and D. L. Eager, "Hierarchical Interconnection Networks for Multicomputer Systems," IEEE Transactions on Parallel and Distributed Systems, vol. 39, No. 6, pp. 786-797, June 1990.

[FIDU92] C. M. Fiduccia, "Bussed Hypercube and Other Pin-Optimal Networks," IEEE Transactions on Parallel and Distributed Systems, vol. 3, No. 1, pp. 14-24, January 1992.

[HWAN87] K. Hwang and J. Ghosh, "Hypernet: A Communication-Efficient Architecture for Constructing Massively Parallel Computers," IEEE Trans. Compt., vol. C-36, No. 12, pp. 1450-1466, December 1987.

[IBAR90] O. H. Ibarra and S. M. Sohn, "On Mapping Systolic Algorithms onto the Hypercube," IEEE Transactions on Parallel and Distributed Systems, vol. 1, No. 1, pp. 48-63, January 1990.

[JOHN89] S. L. Johnsson and C. T. Ho, "Optimum Broadcasting and Personalized Communication in Hypercubes," IEEE Trans. Compt., vol. 38, No. 9, pp. 1249-1268, September 1989.

[KATS88] H. P. Katseff, "Incomplete Hypercubes," IEEE Trans. Compt., vol. 37, No. 5, pp. 604-608, April 1988.

[KUMA92] J. M. Kumar and L. M. Patnaik, "Extended Hypercube: A Hierarchical Interconnection Network of Hypercubes," IEEE Transactions on Parallel and Distributed Systems, vol. 3, No. 1, pp. 45-57, January 1992.

[MALL92] Q. M. Malluhi, M. A. Bayoumi and T. R. Rao, "The Hierarchical Hypercube: An Interconnection Network for Integrated Parallel Systems," 21st Int. Conf. on Par. Processing, August 1992.

[PREP81] F. P. Preparata and J. E. Vuillemin, "The Cube-Connected Cycles: A Versatile Network For Parallel Computation," Commun. ACM, vol. 24, No. 5, pp. 300-309, May 1981.

[SAAD88] Y. Saad and M. H. Schultz, "Topological Properties of Hypercubes," IEEE Trans. Compt., vol. 37, No. 7, pp. 867-872, July 1988.

[SAAD89] Y. Saad and M. H. Schultz, "Data Communication in Hypercubes," Journal of Parallel and Distributed Computing 6, pp. 115-135, 1989.

[SQUI63] J. Squire and S. M. Palais, "Programming and Design Considerations of a Highly Parallel Computer," Proc. AFIP Spring Joint Compt. Conf., vol. 23, pp. 395-400, 1963.

[ULLM84] J. D. Ullman, Computational Aspects of VLSI, Computer Science Press, MD, 1984.

[YOUS90] Abdou S. Yousef and Bhagirath Narahari, "The Banyan-Hypercube Networks," IEEE Trans. on Parallel & Distributed Systems, vol. 1, No. 2, pp. 160-169, April 1990.
