
1172 IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 4, NO. 10, OCTOBER 1993

Optimal Routing Algorithm and the Diameter of the Cube-Connected Cycles

Dikran S. Meliksetian and C. Y. Roger Chen

Abstract-Communication between processors is one of the most important issues in parallel and distributed systems. In this paper we study the communication aspects of a well known multiprocessor structure, the Cube-Connected Cycles (CCC). Only nonoptimal routing algorithms and bounds on the diameter of restricted subclasses of the CCC have been presented in earlier work. In this paper, we present an optimal routing algorithm for the general CCC, with a formal proof of its optimality. Based on this routing algorithm, we derive the exact network diameter for the general CCC.

Index Terms-Cube-connected cycles, diameter, hypercube, multiprocessor, routing.

I. INTRODUCTION

The implementation of parallel and distributed algorithms on a multiprocessor system requires intensive communication between processors. Although the details of implementation may vary, this is true for a multiprocessor system using a shared memory as a means of communication as well as one communicating through a static or dynamic interconnection network.

The cube-connected cycles (CCC) was proposed by F. P. Preparata and J. Vuillemin in [7] as a general-purpose parallel system. The authors discussed and analyzed the implementation of only ASCEND/DESCEND type algorithms, and showed that the CCC can perform this class of algorithms with the same time complexity as the hypercube or the shuffle-exchange network, while requiring much less hardware and a smaller layout area.

More recently, there have been various investigations of the CCC, either on its VLSI implementation [4], [5], [8], on its equivalence with the circular shuffle network [3], or on its adaptation as a systolic array [2]. However, in order for the CCC to be considered as a general-purpose parallel system, other communication issues must be addressed, including an efficient optimal routing algorithm which is a prerequisite for efficient communication and the network diameter which is a parameter influencing many applications such as single node broadcast. The point-to-point optimal routing is relevant when the communication traffic is light to moderate, where congestion would be minimal, and the communication delays would be due mainly to the number of links traversed. Under the same conditions, the network diameter is a measure of the maximum communication delay between nodes.

Although nonoptimal routing algorithms and bounds on the diameter of restricted subclasses of a CCC have been presented, we present in this paper for the first time an optimal routing algorithm and an exact expression for the diameter of the general CCC. Earlier work in this respect includes the work of L. D. Wittie in [11], where he proposes a nonoptimal routing algorithm and determines the maximum path length based on that algorithm. An example is given in Section III, where the path determined by this algorithm is 60% longer than the shortest path. Moreover, the routing algorithm and the analysis presented in [11] are applied only to a restricted subclass of the CCC, i.e., the class of CCC(d, d) in our notation (see next section for the definitions). Although a generalization of this algorithm to the general case is possible, the paths determined for this case would be considerably longer than the shortest paths. The difference between the path length determined by this algorithm and the path length determined by our optimal algorithm would increase with the number of nodes in the cycles.

In [9] A. M. Schwartz and M. C. Loui present a bound on the diameter of another subclass of the CCC, i.e., the class of CCC(d, $2^r$). This bound is more than twice the exact expression for the diameter of a CCC that we derive in this paper.

The routing algorithm and the analysis we present in this paper are applicable to the general formulation of the CCC. We consider the general formulation of the CCC for a number of reasons. First, the original formulation of the CCC in [7] is of the form CCC(d, $2^r$); this formulation is necessary if the CCC has to emulate a hypercube with an identical number of nodes. Second, by having extra nodes in the cycles, it is possible to reconfigure a system with faulty processors [1], [10]. Moreover, we present an optimal routing algorithm and derive an exact expression for the diameter of the CCC. The knowledge of the exact expression for the diameter is required to judge whether previous bounds are good or not.

Manuscript received July 15, 1991; revised July 24, 1992.
D. S. Meliksetian is with the Department of Electrical Engineering, South Dakota School of Mines & Technology, Rapid City, SD 57701-3995.
C. Y. R. Chen is with the Department of Electrical and Computer Engineering, Syracuse University, Syracuse, NY 13244-1240.
IEEE Log Number 9213481.

The paper is organized as follows. In the next section the CCC is reviewed. In Section III we construct the optimal routing algorithm, proving its optimality as we proceed. In Section IV we study the topological properties of the CCC. Finally, in Section V, we summarize our results.

II. THE CUBE-CONNECTED CYCLES

The CCC [7] is a modification of the hypercube (also called binary cube) obtained by replacing each vertex of the hypercube by a cycle. Each cycle must have at least as many processors as the dimension of the hypercube.

In the most general case the CCC is described by two parameters: d, the dimension of the underlying hypercube, and k, the number of processors per cycle. Of course k must be greater than or equal to d. We will denote this CCC as CCC(d, k). For $d \le 2$, the corresponding network is trivial; hence only $d \ge 3$ is considered.

Since a CCC(d, k) has $2^d$ cycles each having k processors, there are $k \times 2^d$ processors in all. Each processor is assigned an address consisting of a pair of numbers (c, p), where c represents the cycle and satisfies

$$0 \le c \le 2^d - 1,$$

and p represents the position of the processor within the cycle and satisfies

$$0 \le p \le k - 1.$$

Let bin(c, p) denote the pth bit of the binary representation of c. Then the interconnections between processors can be defined formally as follows. A processor (c, p) is connected to the following processors:

processor (c, (p + 1) mod k): cycle connection or cycle edge;

processor (c, (p - 1) mod k): cycle connection or cycle edge;

1045-9219/93$03.00 © 1993 IEEE


and

processor $(c + \epsilon \times 2^p, p)$: hypercube connection or hypercube edge along the pth dimension, if $0 \le p \le d - 1$,

where

$$\epsilon = \begin{cases} +1 & \text{if } \mathrm{bin}(c,p) = 0 \\ -1 & \text{if } \mathrm{bin}(c,p) = 1. \end{cases}$$

Hence, the degree of a node is at most three.

In the literature, two special cases of CCC(d, k) are considered, i.e., $k = 2^r$ where $r$ is the least integer such that $2^r \ge d$, and $k = d$.

The first case is the original formulation of the CCC in [7], where the total number of processors is $2^n$ with $n = d + r$, thus making it possible for the CCC to emulate a hypercube. It is worthwhile to note that in each cycle there exist $2^r - d$ processors without a hypercube connection; thus this network is not vertex symmetric in general.

The network obtained in the second formulation is vertex symmetric. This special form, CCC(d, d), although not very useful in practice, has been considered in the literature (such as in [11]) because it is easier to analyze. For example, in the derivation of the optimal routing algorithm, we first consider the case of a CCC(d, d), then generalize the result to the case of a CCC(d, k).

Fig. 1 represents a CCC with d = 3 and k = 3, while Fig. 2 represents another CCC with d = 4 and k = 5. We have represented the CCC in Fig. 2 with the most economical layout as described in [7].
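The interconnection rule of this section can be sketched in a few lines of Python. This is only an illustrative sketch: the names `neighbors` and `distance` are hypothetical, the breadth-first search is a generic brute-force check and not the routing algorithm developed in this paper, and the code assumes $0 \le c \le 2^d - 1$, $0 \le p \le k - 1$, and $k \ge d \ge 3$.

```python
from collections import deque

def neighbors(c, p, d, k):
    """Nodes adjacent to processor (c, p) in CCC(d, k)."""
    result = [(c, (p + 1) % k), (c, (p - 1) % k)]  # the two cycle edges
    if p <= d - 1:
        # Adding +2**p when bit p is 0, or -2**p when it is 1,
        # is the same as flipping bit p of c.
        result.append((c ^ (1 << p), p))
    return result

def distance(src, dst, d, k):
    """Number of edges on a shortest path, found by breadth-first search."""
    seen = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return seen[node]
        for nxt in neighbors(node[0], node[1], d, k):
            if nxt not in seen:
                seen[nxt] = seen[node] + 1
                queue.append(nxt)
    raise ValueError("destination not reachable")
```

For instance, `neighbors(0, 4, 4, 5)` has only two entries, because position 4 of a CCC(4, 5) cycle carries no hypercube edge. Exhaustive search of this kind is practical only for small networks, which is part of the motivation for a closed-form routing rule.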

III. CONSTRUCTION OF THE OPTIMAL ROUTING ALGORITHM

In this section, we establish the optimal routing algorithm for the case of a CCC(d, d) and prove its optimality. Then we generalize it to the case of a CCC(d, k).

For the first part of this section we are considering only a CCC(d, d). Since a CCC(d, d) is vertex symmetric, we can always relabel the nodes so as to have the source coincide with (0,0). Hence, our purpose, for this section, is to find a path of shortest length from node (0,0) to node (c, p). Note that this path need not be unique.

We start by stressing the point that routing on the CCC is not as simple as routing on the hypercube. To illustrate this, consider the following example on a CCC(10,10).

Assume that we want to determine a path from (0,0) to (771,2). The binary representation of 771 is 1100000011. As we will prove later in this section, since there are four 1's in the binary representation of 771, the shortest path will include only four hypercube edges. Unlike the routing in the hypercube, the order in which these dimensions are traversed is important, since different orders yield different path lengths.

Fig. 1. CCC with d = 3 and k = 3.

Fig. 2. CCC with d = 4 and k = 5.

The path which is determined by our algorithm is the following:

(0000000000,0) * (0000000001,0) + (0000000001,9) * (1000000001,9) +
(1000000001,8) * (1100000001,8) + (1100000001,9) + (1100000001,0) +
(1100000001,1) * (1100000011,1) + (1100000011,2)

where each * represents a hypercube edge and each + represents a cycle edge. It has length 10, and is a shortest path between (0,0) and (771,2). An inappropriate choice of the order in which the hypercube edges are traversed will yield longer paths. For example, if it is decided to move in only one direction, clockwise or counterclockwise, until the destination is reached, paths of lengths 16 and 14 will be obtained. This example shows that an inappropriate choice of the order of traversals will yield paths which are much longer than the shortest path. The routing algorithm proposed by L. D. Wittie in [11] would produce a path of length 16 for this example, thus resulting in a path 60% longer than the shortest path.
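The example path can be checked mechanically. The sketch below lists the eleven nodes of the path from (0,0) to (771,2) and confirms that every step is either a cycle edge or a hypercube edge of CCC(10,10). The names `PATH`, `classify_edge`, and `edge_counts` are hypothetical and chosen only for this illustration.

```python
# The example path in CCC(10,10), written as (cycle address in binary, position).
PATH = [
    ("0000000000", 0), ("0000000001", 0), ("0000000001", 9),
    ("1000000001", 9), ("1000000001", 8), ("1100000001", 8),
    ("1100000001", 9), ("1100000001", 0), ("1100000001", 1),
    ("1100000011", 1), ("1100000011", 2),
]

def classify_edge(u, v, d):
    """Return 'cycle' or 'hypercube' if u-v is an edge of CCC(d, d), else None."""
    (cu, pu), (cv, pv) = u, v
    if cu == cv and (pu - pv) % d in (1, d - 1):
        return "cycle"
    if pu == pv and cu ^ cv == 1 << pu:
        return "hypercube"
    return None

def edge_counts(path, d):
    """Return (number of cycle edges, number of hypercube edges) of a path."""
    nodes = [(int(bits, 2), p) for bits, p in path]
    kinds = [classify_edge(u, v, d) for u, v in zip(nodes, nodes[1:])]
    if None in kinds:
        raise ValueError("path uses a non-existent edge")
    return kinds.count("cycle"), kinds.count("hypercube")
```

Here `edge_counts(PATH, 10)` gives six cycle edges and four hypercube edges, which matches the stated length of 10.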

We now present some of the notation used in this section for developing the optimal routing algorithm. We will denote by path(c, p) a path from node (0,0) to node (c, p). The length of this path is equal to $l_c + l_h$, where the first term represents the number of cycle edges in the path, and the second term represents the number of hypercube edges. We denote by $|c|$ the number of bit positions with a value of 1 in the binary representation of the cycle address c. For $c \ge 2$, we denote by $d_1$ and $d_2$, respectively, the first and last bit position from the right which have a value of 1, in the binary representation of c if c is even, and in the binary representation of c - 1 if c is odd. With this definition $d_1$ and $d_2$ are the same for a cycle c and the cycle $c \oplus 1$. If $|c| = 1$, then $d_1$ and $d_2$ are equal. We denote by $d_3$ and $d_4$ the bit positions which have a value of 1 and which limit the longest subsequence of 0's between $d_1$ and $d_2$. For example, in a CCC(10,10), if c = 276 or in binary c = 0100010100, then $d_1 = 2$, $d_2 = 8$, $d_3 = 4$, and $d_4 = 8$. We have the same values for $d_1$, $d_2$, $d_3$, and $d_4$ for c = 277 or in binary c = 0100010101. Note that the longest subsequence of 0's has a length of $d_4 - d_3 - 1 = 3$. If there are no 0's between $d_1$ and $d_2$, we let $d_4 = d_3 = d_1$. If there exists more than one subsequence of 0's with longest length, we choose the rightmost of these subsequences. This choice is arbitrary, and we will prove later that it does not affect the result.
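As a concrete reading of these definitions, the following sketch computes $|c|$ and $d_1$ through $d_4$ for a cycle address. The function name `cycle_parameters` is hypothetical, and the code assumes $2 \le c \le 2^d - 1$.

```python
def cycle_parameters(c, d):
    """Return (|c|, d1, d2, d3, d4) for a cycle address c >= 2 in CCC(d, d)."""
    if c < 2:
        raise ValueError("d1..d4 are only defined for c >= 2")
    weight = bin(c).count("1")  # |c|, the number of 1 bits
    # Bit 0 is ignored (c - 1 is used when c is odd), so start at position 1.
    ones = [i for i in range(1, d) if (c >> i) & 1]
    d1, d2 = ones[0], ones[-1]
    d3 = d4 = d1  # convention when no 0's lie between d1 and d2
    for lo, hi in zip(ones, ones[1:]):
        # The strict '>' keeps the rightmost run among equally long ones.
        if hi - lo > 1 and hi - lo > d4 - d3:
            d3, d4 = lo, hi
    return weight, d1, d2, d3, d4
```

With the example from the text, `cycle_parameters(276, 10)` evaluates to `(3, 2, 8, 4, 8)`, and c = 277 differs only in $|c|$.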

Fig. 3. Base cycle representation.

Furthermore, if the processor address p is between $d_1$ and $d_2$, i.e., $d_1 \le p \le d_2$, then we denote by $d_5$, $d_6$ ($d_7$, $d_8$) the bit positions which limit the longest subsequence of 0's between $d_1$ and p (p and $d_2$). In the previous example, if p = 5, then $d_5 = 2$ and $d_6 = 4$, while $d_7 = 5$ and $d_8 = 8$. The two subsequences have lengths of 1 and 2, respectively. If there are no 0's between $d_1$ and p (p and $d_2$), we let $d_6 = d_5 = d_1$ ($d_8 = d_7 = p$). If there exists more than one subsequence of 0's with longest length, we choose the rightmost of these subsequences. This choice is arbitrary, and we will prove later that it does not affect the result.
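The parameters $d_5$ through $d_8$ can be sketched in the same way. One point is an interpretation on our part: the example with p = 5 gives $d_7 = 5$ although bit 5 of c is 0, so the sketch treats p itself as a limit of both segments. The function name `split_parameters` and the inner helper `widest` are hypothetical.

```python
def split_parameters(c, p, d):
    """Return (d5, d6, d7, d8) for a position p with d1 <= p <= d2 in CCC(d, d)."""
    ones = [i for i in range(1, d) if (c >> i) & 1]  # bit 0 is ignored
    if not ones or not ones[0] <= p <= ones[-1]:
        raise ValueError("requires d1 <= p <= d2")

    def widest(limits, default):
        best = (default, default)  # convention when the segment has no 0's
        for lo, hi in zip(limits, limits[1:]):
            if hi - lo > 1 and hi - lo > best[1] - best[0]:
                best = (lo, hi)
        return best

    # Assumption: p acts as a limit even when bit p is 0 (d7 = 5 in the example).
    left = sorted({i for i in ones if i <= p} | {p})
    right = sorted({i for i in ones if i >= p} | {p})
    return widest(left, ones[0]) + widest(right, p)
```

For c = 276 and p = 5 in a CCC(10,10), this reproduces the values quoted above: `(2, 4, 5, 8)`.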

The representation we adopt for a cycle is shown in Fig. 3. In this representation the position numbers increase in the clockwise direction from 0 to d - 1. This representation is valid both for a physical cycle of a CCC(d,d) in which case position i represents processor i and for the base cycle representation which will be defined formally later.

We start the construction of the algorithm, by considering the number of hypercube edges in a shortest path from the source node to the destination node.

Lemma 1: The number of hypercube edges $l_h$ in a shortest path(c, p) is equal to $|c|$, i.e., the number of bits with a value of 1 in the binary representation of c. The dimensions of these edges correspond to the hypercube dimensions specified by the bit positions with a value of 1.

Proof: We note first that the number of hypercube edges in any path from (0,0) to (c, p) is at least $|c|$. This can be seen by considering the binary representation of c, which has $|c|$ bit positions with a value of 1, and the fact that to change the value of a bit in the address of a cycle, a hypercube edge corresponding to the dimension of the bit position must be traversed.

We remark that any path from node 0 to node c of a hypercube has to traverse the hypercube dimensions specified by the bit positions with a value of 1 in the binary representation of c. Similarly, any path from cycle 0 to cycle c of a CCC(d, d) must traverse the hypercube dimensions specified by the bit positions with a value of 1 in the binary representation of c.

We prove that a shortest path has exactly $|c|$ hypercube edges, by contradiction. Assuming that there exists a shortest path(c, p) which traverses more hypercube edges than $|c|$, we can always build a shorter path(c, p) by deleting a pair of hypercube edges, thus resulting in a contradiction. The details of this proof can be found in [6]. □

Since the number of hypercube edges to be traversed by a shortest path is fixed, to find this path we only need to minimize the number of cycle edges. Accordingly, we change our representation of the path by deleting the information on the hypercube edges traversed. We represent the path as a cycle on which the positions where a hypercube edge must be traversed are denoted by a "1" and all other positions are denoted by a "0." If we consider the sequence of "1"s and "0"s as a binary number, we will get the address of the destination cycle. We will denote this representation as the base cycle representation (bcr).

In Fig. 3 we show a bcr in a CCC(24,24). It is a one-dimensional representation of a path from node (0,0) to node (16453639,21). The binary representation of 16453639 is 111110110001000000000111, which also corresponds to the sequence of 1's and 0's on the bcr.

Since a bit position with a value of 1 on the bcr corresponds to a hypercube dimension to be crossed by the path, a path on the bcr is a valid path, only if it passes through each bit position with a value of 1 at least once. The length of this path is equal to the number of cycle edges crossed by the actual path. We summarize these observations in the following lemma.

Lemma 2: The problem of finding the shortest path in a CCC(d, d) is equivalent to that of finding a shortest path from position 0 to position p on the bcr which passes through all bit positions with a value of 1.

Hence we will denote by path(c, p) the following sequence of positions:

$$p_0, p_1, \ldots, p_{l_c}$$

where $p_0 = 0$ and $p_{l_c} = p$.

For the example of Fig. 3 considered earlier, the corresponding path is

0, 1, 2, 1, 0, 23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21.

The length of the path on the bcr is 25, and this corresponds to the number of cycle edges crossed by the path(16453639,21). The number of hypercube edges crossed by this path is equal to the number of bit positions with a value of 1 in the binary representation of 16453639, or equivalently the number of positions with a value of 1 on the bcr. In our example this is equal to 11. The total length of the path(16453639,21) depicted in Fig. 3 is therefore 36.
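The bookkeeping in this example can be reproduced with a short sketch that checks the validity condition of Lemma 2 (the walk starts at position 0, moves only between adjacent positions, and visits every position with a value of 1) and then returns $l_c$ and $l_h$. The names `bcr_walk_lengths` and `FIG3_WALK` are hypothetical.

```python
def bcr_walk_lengths(c, d, positions):
    """Return (l_c, l_h) for a walk on the bcr of cycle address c in CCC(d, d)."""
    if positions[0] != 0:
        raise ValueError("a walk on the bcr starts at position 0")
    for a, b in zip(positions, positions[1:]):
        if (a - b) % d not in (1, d - 1):
            raise ValueError(f"positions {a} and {b} are not adjacent")
    required = {i for i in range(d) if (c >> i) & 1}
    if not required <= set(positions):
        raise ValueError("walk misses a position with a value of 1")
    # One cycle edge per step of the walk, one hypercube edge per 1 bit.
    return len(positions) - 1, len(required)

# The walk of Fig. 3: 0,1,2,1,0, then 23 down to 12, then 13 up to 21.
FIG3_WALK = [0, 1, 2, 1, 0] + list(range(23, 11, -1)) + list(range(13, 22))
```

Calling `bcr_walk_lengths(16453639, 24, FIG3_WALK)` yields 25 cycle edges and 11 hypercube edges, for the total of 36 stated above.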

We will denote a position where a reversal of direction occurs as an inflection position or an inflection point. In Fig. 3 positions 2 and 12 are inflection points. In order to characterize the shortest path, we introduce the following two lemmas.

Lemma 3: Consider a position k on a bcr. If a shortest path contains position k as an inflection point, then this path does not pass through this position a second time.

Proof: Let $p_0 = 0, p_1, \ldots, p_{i-1}, p_i = k, p_{i+1}, p_{i+2}, \ldots, p_{l_c} = p$ be a shortest path satisfying the conditions in Lemma 2. Assume that k occurs at $p_i$ as an inflection point. Consequently $p_{i-1} = p_{i+1} = (k + \epsilon) \bmod d$ where $\epsilon = \pm 1$. Assume by contradiction that k also occurs elsewhere on this path. Let j be the position of occurrence of k such that k does not occur between i and j. We consider the case where j > i. The case where j < i can be handled similarly. We consider specifically the segment of path between i - 1 and j; its length is j - i + 1. We will replace this segment by a shorter one, passing through all positions passed by the original segment at least once, thus proving a contradiction. The following segment

$$p_{i-1}, p_{i+2}, \ldots, p_{j-1}, k$$

is possible since $p_{i-1} = p_{i+1}$ is adjacent to $p_{i+2}$. It has a length j - i - 1, which is less than that of the original segment. It passes through all positions which the original segment passed through, it starts at $p_{i-1}$ and ends at k, and hence can replace the original segment in the given path, producing a shorter path; thus a contradiction occurs. □

Lemma 4: Consider a position k on a bcr. Any shortest path passes through position k at most twice. □

The proof of this lemma is similar in spirit to the proof of the previous one. A detailed proof can be found in [6].

We conclude from the previous lemma that in order to find a shortest path, we need to consider only paths which pass through a given position at most twice. This condition limits to two the number of inflection points that a shortest path can have, as described in the following lemma. In order to explain our idea, first we classify the inflection points into two classes, left inflection points and right inflection points. If the direction of the path before an inflection point is clockwise, we call that inflection point a left inflection point. Otherwise, it is a right inflection point. We immediately note that any two consecutive inflection points on a path must be of different classes.

Fig. 4. Types of possible shortest paths (types a, a', b, b', c, c', d, and d').

Lemma 5: A shortest path can have at most two inflection points. □

The proof of this lemma is similar in spirit to the proof of Lemma 3. A detailed proof can be found in [6].

In Fig. 4 we show eight types of paths that satisfy the above conditions. In these figures, and the subsequent discussion, x and y refer to inflection points and their relative position is as shown in these figures. It can be shown by an exhaustive search that these are the only types that satisfy these conditions.

Lemma 6: Any path from position 0 to position p satisfying the conditions of Lemmas 3, 4, and 5 must be one of the eight types shown in Fig. 4. □

A formal proof of this lemma is rather tedious and can be found in [6].

Lemma 7: If c = 0 or c = 1, the only possible paths are of types a and a' of Fig. 4.

Proof: If c = 0, the destination processor is in the same cycle as the origin. The number of hypercube edges to be traversed is 0. For a given processor address p, a path of type a is shorter than paths of types b, c', and d'; and a path of type a' is shorter than paths of types b', c, and d. Hence, the shortest path is either of type a or of type a'.

If c = 1, from Lemma 1 the number of hypercube edges to be traversed is 1 and it corresponds to dimension 0. Consequently the first edge on the path is the hypercube edge between (0,0) and (1,0), and the remainder of the path from (1,0) to (1,p) lies completely in cycle 1. The problem of finding the shortest path between (1,0) and (1,p) lying completely in cycle 1 is equivalent to the problem discussed in the first part of this proof; hence the shortest path is either of type a or of type a'. □

Since the cases where c = 0 and c = 1 have been taken care of, in the subsequent lemmas we assume that there exists an $i$, $0 < i \le d - 1$, such that bin(c, i) = 1.

Lemma 8: If $p > d_2$, the only possible paths are of types a, b', c, and d of Fig. 4. Moreover, for a path of type b', x must be equal to $d_1$; for a path of type c, x must be equal to $d_2$; and for a path of type d, x must be equal to $d_3$ and y to $d_4$.

Proof: We will first eliminate the remaining types as possible shortest paths, and then we will reduce the candidates for each possible type to the ones described in the lemma.

Type a': The hypothesis implies that there exists an $i$, $0 < i < p$, such that bin(c, i) = 1. A path of type a' does not pass through position i, and hence cannot be a valid path.

Types b, c', and d': Since there exists no $i$, $p \le i \le d - 1$, such that bin(c, i) = 1, we can reduce a path of type b, c', or d' into a path of type a, obtaining a shorter path. Hence a path of type b, c', or d' cannot be a shortest path.

We now consider the types which are possible, and eliminate options which would not give a shortest path. Given the processor address p, there exists only one path of type a and its length is p.

A path of type b' can be a valid path only if $x \le d_1$, since otherwise it would miss a bit position with a value of 1. The length of such a path is $d - 2x + p$; hence among all the valid paths of type b' the shortest will always be the one with $x = d_1$ and length $d - 2d_1 + p$.

A path of type c can be a valid path only if $x \ge d_2$, since otherwise it would miss a bit position with a value of 1. The length of such a path is $2x + d - p$; hence among all the valid paths of type c the shortest will always be the one with $x = d_2$ and length $2d_2 + d - p$.

A path of type d can be a valid path only if all the bit positions between x and y have a value of 0. The length of such a path is $2x + d - 2y + p = d + p - 2(y - x)$; hence among all the valid paths of type d the shortest path will always be one with a maximum $y - x$. As defined earlier, $d_3$ and $d_4$ satisfy both conditions, i.e., all bit positions between them have a value of 0 and $d_4 - d_3$ is the maximum; hence we can set x equal to $d_3$ and y equal to $d_4$, which results in a path of length $d + p - 2(d_4 - d_3)$.

We note that our arbitrary choice for $d_3$ and $d_4$ among all possible candidates (in case more than one subsequence of 0's of longest length exists) does not affect the result, since only their difference matters. Also, if there are no 0's between $d_1$ and $d_2$, since we set $d_4 = d_3 = d_1$, the length of the path of type d will be $d + p$. Since $d + p > p$, $d + p > d - 2d_1 + p$, and $d + p > 2d_2 + d - p$, the path of type d will not affect the choice of the shortest path, which must be one of the three other possibilities (i.e., types a, b', or c). □


Lemma 9: If $p < d_1$, the only possible paths are of the types a', b, c' and d' of Fig. 4. Moreover, for a path of type b, x must be equal to $d_2$; for a path of type c', x must be equal to $d_1$; and for a path of type d', x must be equal to $d_4$ and y to $d_3$. □

The proof of this lemma is similar to the proof of the previous one. A formal proof can be found in [6].

Lemma 10: If $d_1 \le p \le d_2$, the only possible paths are of type b, b', d, and d' of Fig. 4. Moreover, for a path of type b, x must be equal to $d_2$; for a path of type b', x must be equal to $d_1$; for a path of type d, x must be equal to $d_5$ and y to $d_6$; and for a path of type d', x must be equal to $d_8$ and y to $d_7$. □

The proof of this lemma is similar to the proof of Lemma 8. A formal proof can be found in [6].

We combine all the above in the next theorem, which forms the backbone of the optimal routing algorithm.

Theorem 1: Let (c, p) be the address of a node in CCC(d, d). The shortest path from node (0, 0) to node (c, p) can be determined as follows:

if c = 0 or c = 1 {
    the shortest path is the shorter of paths of type a and a' with lengths $p$ and $d - p$, respectively.
    if c = 1 {
        the first edge crossed on the path must be the hypercube edge in the 0th dimension.
    }
} else {
    let $d_1, d_2, d_3, d_4, d_5, d_6, d_7$ and $d_8$ be the parameters as defined earlier.
    if $p > d_2$ {
        the path is selected by comparing the lengths $p$, $d - 2d_1 + p$, $2d_2 + d - p$ and $d + p - 2(d_4 - d_3)$, of types a, b', c and d, respectively, and choosing the one with shortest length. At the first occurrence of a bit position with a value of 1 on the path, the corresponding hypercube edge is traversed.
    }
    if $p < d_1$ {
        the path is selected by comparing the lengths $d - p$, $2d_2 - p$, $2d - 2d_1 + p$ and $2d - p - 2(d_4 - d_3)$, of types a', b, c' and d', respectively, and choosing the one with shortest length. At the first occurrence of a bit position with a value of 1 on the path, the corresponding hypercube edge is traversed.
    }
    if $d_1 \le p \le d_2$ {
        the path is selected by comparing the lengths $2d_2 - p$, $d - 2d_1 + p$, $d + p - 2(d_6 - d_5)$ and $2d - p - 2(d_8 - d_7)$, of types b, b', d and d', respectively, and choosing the one with shortest length. At the first occurrence of a bit position with a value of 1 on the path, the corresponding hypercube edge is traversed.
    }
}
□
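The selection rule of Theorem 1 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `select_path` and the tuple layout of `params` are assumptions made here, and the parameters $d_1, \ldots, d_8$ are taken as inputs because their definitions appear earlier in the paper. The candidate lengths are copied from the theorem statement; ties are broken in favor of the first candidate listed, which is an arbitrary choice.

```python
def select_path(c, d, p, params=None):
    """Pick the path type with the smallest length listed in Theorem 1.

    c, p   : relative address of the destination in CCC(d, d)
    params : hypothetical tuple (d1, d2, d3, d4, d5, d6, d7, d8), assumed to be
             computed from c as defined earlier in the paper; unused if c is 0 or 1
    Returns a pair (path type, length as listed in Theorem 1).
    """
    if c in (0, 1):
        candidates = [("a", p), ("a'", d - p)]
    else:
        d1, d2, d3, d4, d5, d6, d7, d8 = params
        if p > d2:
            candidates = [("a", p),
                          ("b'", d - 2 * d1 + p),
                          ("c", 2 * d2 + d - p),
                          ("d", d + p - 2 * (d4 - d3))]
        elif p < d1:
            candidates = [("a'", d - p),
                          ("b", 2 * d2 - p),
                          ("c'", 2 * d - 2 * d1 + p),
                          ("d'", 2 * d - p - 2 * (d4 - d3))]
        else:  # d1 <= p <= d2
            candidates = [("b", 2 * d2 - p),
                          ("b'", d - 2 * d1 + p),
                          ("d", d + p - 2 * (d6 - d5)),
                          ("d'", 2 * d - p - 2 * (d8 - d7))]
    # min() keeps the first candidate in case of a tie.
    return min(candidates, key=lambda item: item[1])
```

For instance, `select_path(0, 10, 7)` compares the lengths 7 (type a) and 3 (type a') and returns the latter. The parameter values passed for the other branches must come from the definitions given earlier; any values used to exercise the sketch are hypothetical.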

As remarked at the beginning of this section, if we have to find the shortest path from node $(c_1, p_1)$ to node $(c_2, p_2)$ of a CCC(d, d), we can relabel the nodes in such a way that the node $(c_1, p_1)$ is labeled (0, 0). We call the new label (c, p) of the node $(c_2, p_2)$ the relative address of $(c_2, p_2)$ with respect to $(c_1, p_1)$. In view of this, we state the following corollary.

Corollary 1: The shortest path from $(c_1, p_1)$ to $(c_2, p_2)$ can be found by first determining the relative address (c, p) of $(c_2, p_2)$ with respect to $(c_1, p_1)$, then applying Theorem 1 to (c, p). □

The relative cycle address c is determined by taking the bitwise XOR of $c_2$ and $c_1$, and then rotating the result by $p_1$ positions to the right. The relative processor address p is equal to $(p_2 - p_1) \bmod d$.

For example, if $(c_1, p_1)$ and $(c_2, p_2)$ are equal to (121, 8) and (867, 2) in a CCC(10, 10), then $c_1 \oplus c_2 = 0001111001 \oplus 1101100011 = 1100011010$. c is obtained by rotating 1100011010 by 8 positions to the right; thus c = 0001101011 = 107, while $p = (2 - 8) \bmod 10 = 4$.
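The relative-address computation can be sketched as follows. The function names `rotate_right` and `relative_address` are chosen here for illustration, and the sketch assumes that bit position q of a cycle address has weight $2^q$, which is consistent with the worked example above.

```python
def rotate_right(x: int, shift: int, width: int) -> int:
    """Rotate the width-bit value x right by shift positions."""
    shift %= width
    mask = (1 << width) - 1
    return ((x >> shift) | (x << (width - shift))) & mask


def relative_address(src, dst, d):
    """Relative address (c, p) of dst with respect to src in CCC(d, d)."""
    c1, p1 = src
    c2, p2 = dst
    c = rotate_right(c1 ^ c2, p1, d)  # XOR, then rotate right by p1
    p = (p2 - p1) % d                 # Python's % is non-negative here
    return c, p


if __name__ == "__main__":
    # Example from the text: (121, 8) and (867, 2) in a CCC(10, 10).
    print(relative_address((121, 8), (867, 2), 10))  # (107, 4)
```

The printed pair matches the values c = 107 and p = 4 derived in the example.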

By applying Corollary 1 we get a path(c, p) in which all addresses are relative with respect to $(c_1, p_1)$. We can find the original addresses of the intermediate nodes by performing the inverse of the operation used above to find the relative address with respect to $(c_1, p_1)$.

Formally, if $(c_i, p_i)$ is a node on path(c, p), then its absolute address can be found as follows. The cycle address is determined by rotating $c_i$ by $p_1$ positions to the left, then taking the bitwise XOR of the result and $c_1$. The processor address is equal to $(p_i + p_1) \bmod d$.

We note that, by this construction, the absolute cycle address of a node on path(c, p) can differ from $c_1$ in a bit position only if the corresponding bit position of $c_1 \oplus c_2$ has a value of 1.
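The inverse mapping, from a relative address back to an absolute one, can be sketched in the same way. The names `rotate_left` and `absolute_address` are illustrative, and the bit-ordering assumption is the same as in the worked example (bit position q has weight $2^q$).

```python
def rotate_left(x: int, shift: int, width: int) -> int:
    """Rotate the width-bit value x left by shift positions."""
    shift %= width
    mask = (1 << width) - 1
    return ((x << shift) | (x >> (width - shift))) & mask


def absolute_address(rel, src, d):
    """Absolute address of a node given its address relative to src in CCC(d, d)."""
    c_i, p_i = rel
    c1, p1 = src
    c_abs = rotate_left(c_i, p1, d) ^ c1  # rotate left by p1, then XOR with c1
    p_abs = (p_i + p1) % d
    return c_abs, p_abs


if __name__ == "__main__":
    # Undo the example: relative address (107, 4) with respect to (121, 8).
    print(absolute_address((107, 4), (121, 8), 10))  # (867, 2)
```

Applied to the relative address (107, 4) of the earlier example, the sketch recovers the destination (867, 2); the relative address (0, 0) maps back to the source itself.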

These considerations allow us to generalize Lemma 1 to the following.

Lemma 11: A shortest path from $(c_1, p_1)$ to $(c_2, p_2)$ does not contain a hypercube edge corresponding to a hypercube dimension q such that bit position q of $c_1 \oplus c_2$ has a value of 0. □

We can now generalize the optimal routing algorithm to the case of a CCC(d, k) with $k \ge d$. We start by observing that a CCC(d, k) is isomorphic to the subgraph of a CCC(k, k) induced by the cycles with $c \le 2^d - 1$. Consider two nodes $(c_1, p_1)$ and $(c_2, p_2)$ of a CCC(d, k) and the corresponding nodes in a CCC(k, k). Using our algorithm, we can determine the shortest path between these two nodes in CCC(k, k). We note that, since the bit positions between d and k - 1 have a value of 0 for both $c_1$ and $c_2$, the bit positions between d and k - 1 in the binary representation of $c_1 \oplus c_2$ have a value of 0. From Lemma 11, we know that none of these dimensions is traversed; hence the shortest path between $(c_1, p_1)$ and $(c_2, p_2)$ in a CCC(k, k) lies entirely in the subgraph isomorphic to a CCC(d, k).

The isomorphic image of this path in a CCC(d, k) must also be a shortest path between $(c_1, p_1)$ and $(c_2, p_2)$ in a CCC(d, k). This is due to the fact that if a shorter path existed in CCC(d, k) between $(c_1, p_1)$ and $(c_2, p_2)$, then the isomorphic image of this path in the subgraph of CCC(k, k) would be shorter than the previous one, contradicting the optimality of the latter.

As an illustration of these ideas, consider the CCC(5,5) of Fig. 5. The subgraph in the frame in the lower left corner of that figure is isomorphic to CCC(3,5). The highlighted edges between nodes (0,1) and (7,4) form the shortest path between these two nodes in CCC(5,5), as determined by the algorithm of the previous section. All these edges are contained in the CCC(3,5) subgraph; therefore the shortest path between nodes (0,1) and (7,4) of a CCC(3,5) is isomorphic to this path.

Hence, we can use our optimal routing algorithm for a CCC(k, k) to find the shortest path in any CCC(d, k) for $k \ge d + 1$. The only change is that the cycle addresses have to be interpreted as k-bit addresses with the most significant k - d bits equal to 0. We formalize all of the above in the next lemma.

Lemma 12: The shortest path between two nodes $(c_1, p_1)$ and $(c_2, p_2)$ of a CCC(d, k) is isomorphic to the shortest path between these two nodes in a CCC(k, k). The determination of the shortest path can be accomplished by using the shortest path algorithm in a CCC(k, k). □

The number of edges in a shortest path between any two nodes is bounded by the network diameter. As proven in the next section,

the diameter is O(k). An interesting issue is the complexity of the procedure to determine the shortest path between two nodes in a multiprocessor. We address that issue briefly here; a detailed analysis of the routing algorithm can be found in [6]. The first step of the algorithm is the determination of the relative address (c, p) between the two nodes. The relative cycle address c is determined by taking the bitwise XOR of $c_2$ and $c_1$, and then rotating the result by $p_1$ positions to the right. This step can be accomplished in a time complexity of order k. The parameters $d_1, d_2, d_3, d_4, d_5, d_6, d_7, d_8$ can be determined by scanning the relative address at most twice; hence this step can also be accomplished in a time complexity of order k. It remains to make a constant number of comparisons; consequently the overall procedure to determine the shortest path can be accomplished in a time complexity of order k. In the general case, since $N = k \times 2^d$, the complexity of the algorithm is O(N). In the more common cases where k = d or $k = 2^r$, where r is the least integer such that $2^r \ge d$, the order of complexity reduces to O(log N).

Fig. 5. A CCC(5,5) and the isomorphic image of a CCC(3,5).
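The embedding argument behind Lemma 12 can be checked by brute force on the example of Fig. 5. The sketch below is not the routing algorithm of this paper; it uses plain breadth-first search. It assumes the usual adjacency of a CCC(d, k): node (c, p) is joined to its two cycle neighbors, and, when p < d, to the node in the cycle whose address differs from c in bit p. The function names are illustrative.

```python
from collections import deque


def neighbors(c, p, d, k):
    """Assumed adjacency of CCC(d, k): two cycle edges, plus a hypercube edge if p < d."""
    out = [(c, (p + 1) % k), (c, (p - 1) % k)]
    if p < d:
        out.append((c ^ (1 << p), p))
    return out


def distance(src, dst, d, k):
    """Breadth-first-search distance between two nodes of CCC(d, k)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return dist[node]
        for nxt in neighbors(node[0], node[1], d, k):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    raise ValueError("destination not reachable")


if __name__ == "__main__":
    # Nodes (0,1) and (7,4) of Fig. 5, measured in CCC(3,5) and in CCC(5,5).
    print(distance((0, 1), (7, 4), d=3, k=5), distance((0, 1), (7, 4), d=5, k=5))
```

Both calls return the same distance, as Lemma 12 predicts for nodes whose cycle addresses use only the d least significant bits.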

IV. NETWORK DIAMETER OF THE CUBE-CONNECTED CYCLES

We can use the optimal routing algorithm developed in the previous section to derive an exact expression for the diameter of a CCC(d, k).

This derivation has the nature of an exhaustive search; we effectively consider all possible cases, and determine the longest of the shortest paths.

We first introduce the notation used, then introduce a technical lemma, and finally state and formally prove our results about the diameter of a CCC.

We consider two nodes $(c_1, p_1)$ and $(c_2, p_2)$ of a CCC(d, k). In view of Lemma 12 we interpret $c_1$ and $c_2$ as k-bit numbers with the most significant k - d bits having a value of 0. We denote by (c, p) the relative address of $(c_2, p_2)$ with respect to $(c_1, p_1)$.

We first note that the longest of the shortest paths must occur between two cycles $c_1$ and $c_2$ which are at opposite ends of a CCC(d, k); more formally, $c_1 \oplus c_2 = 2^d - 1$. This observation is based on the following two points. First, from Lemma 1, $l_h = |c|$; hence $l_h$ is maximum when $|c|$ is maximum. Since $|c| = |c_1 \oplus c_2|$, in the case under consideration $|c| \le d$, and the maximum occurs when $c_1 \oplus c_2 = 2^d - 1$. Second, the whole purpose of the algorithm established in the previous section is to minimize $l_c$ by avoiding as much as possible the inclusion of bit positions with a value of 0 in the bcr. Consequently, if c and c' are such that all bit positions which have a value of 1 in c also have a value of 1 in c', the number of cycle edges in the path from 0 to c' is more than or equal to the number of cycle edges in the path from 0 to c. Since $2^d - 1$ has the most 1's, the largest $l_c$ must occur for $c = 2^d - 1$.

Since the cycles in a CCC(d, k) are symmetric, for simplicity we choose $c_1 = 0$ and $c_2 = 2^d - 1$. Hence, to find the diameter we have to find the maximum of the shortest paths between $(0, p_1)$ and $(2^d - 1, p_2)$, for $0 \le p_1 \le k - 1$ and $0 \le p_2 \le k - 1$.

Lemma 13: The optimal routing algorithm finds a path with

$$
l_c \le
\begin{cases}
d + \lfloor d/2 \rfloor - 2 & \text{if } k = d > 3, \\
d + \lfloor k/2 \rfloor - 1 & \text{if } d + 1 \le k \le 2d - 2, \\
k & \text{if } k \ge 2d - 1.
\end{cases}
$$
□

The proof of this lemma is exhaustive in nature. We consider all possible cases and look at the maximum of the shortest paths. A detailed proof of this lemma can be found in [6].

After these preliminaries, we state and prove the following theorem on the diameter of a CCC.

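Theorem 2: The network diameter of a CCC(d, k), $d \ge 3$, is

$$
\begin{cases}
6 & \text{if } k = d = 3, \\
2d + \lfloor d/2 \rfloor - 2 & \text{if } k = d > 3, \\
2d + \lfloor k/2 \rfloor - 1 & \text{if } d + 1 \le k \le 2d - 2, \\
d + k & \text{if } k \ge 2d - 1.
\end{cases}
$$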
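The closed form of Theorem 2 can be compared against an exhaustive search on small instances. The sketch below is only a sanity check, not part of the proof: it assumes the usual adjacency of a CCC(d, k) (two cycle neighbors per node, plus a hypercube edge along dimension p for nodes with p < d), and the function names are illustrative.

```python
from collections import deque


def theorem2_diameter(d: int, k: int) -> int:
    """Closed-form diameter of CCC(d, k) as stated in Theorem 2."""
    if d < 3 or k < d:
        raise ValueError("Theorem 2 assumes k >= d >= 3")
    if k == d:
        return 6 if d == 3 else 2 * d + d // 2 - 2
    if k <= 2 * d - 2:          # d + 1 <= k <= 2d - 2
        return 2 * d + k // 2 - 1
    return d + k                # k >= 2d - 1


def neighbors(c: int, p: int, d: int, k: int):
    """Assumed adjacency: two cycle edges, plus a hypercube edge when p < d."""
    yield c, (p + 1) % k
    yield c, (p - 1) % k
    if p < d:
        yield c ^ (1 << p), p


def eccentricity(src, d: int, k: int) -> int:
    """Largest breadth-first-search distance from src to any node of CCC(d, k)."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        for nxt in neighbors(node[0], node[1], d, k):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())


def brute_force_diameter(d: int, k: int) -> int:
    # The cycles are symmetric, so it suffices to start from cycle 0.
    return max(eccentricity((0, p1), d, k) for p1 in range(k))


if __name__ == "__main__":
    for d, k in [(3, 3), (3, 4), (3, 5), (4, 4)]:
        print((d, k), theorem2_diameter(d, k), brute_force_diameter(d, k))
```

The four instances exercise each branch of the theorem once; the exhaustive search grows with $k \times 2^d$ nodes, so it is practical only for small d and k.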

Proof: The diameter for the case where d = k = 3 can be determined by an inspection of Fig. 1. For the case where d = k > 3, from Lemma 13 the diameter of a CCC is less than or equal to $2d + \lfloor d/2 \rfloor - 2$. However, the distance from node (0, 0) to $(2^d - 1, \lfloor d/2 \rfloor)$ is exactly $2d + \lfloor d/2 \rfloor - 2$; hence the diameter is equal to this quantity.

For the case $d + 1 \le k \le 2d - 2$, from Lemma 13 the diameter must be less than or equal to $2d + \lfloor k/2 \rfloor - 1$. However, the distance from node $(0, d - \lceil k/2 \rceil)$ to node $(2^d - 1, k - 1)$ is $2d + \lfloor k/2 \rfloor - 1$; hence the diameter is equal to this quantity.

In the case where $k \ge 2d - 1$, from Lemma 13 the diameter must be less than or equal to d + k. However, the distance from node $(0, \lceil k/2 \rceil)$ to $(2^d - 1, \lceil k/2 \rceil)$ is k + d; hence the diameter is equal to this quantity. □

We now consider the special case of a CCC(d, k), where $k = 2^r$ with r being the least integer such that $2^r \ge d$, and derive an expression for its diameter. The best bound on the diameter of such a CCC found in the literature is 6 log N, derived by A. M. Schwartz and M. C. Loui in [9]. We have the exact value, which is about half of this bound.

In the case under consideration, the total number of nodes in the CCC is a power of two, namely $N = 2^r \times 2^d = 2^n$, where n = d + r. The diameter for this case can be derived by substituting d = n - r and $k = 2^r$ in the expressions of Theorem 2.

Lemma 14: The diameter of a CCC(n - r, $2^r$), where r is the least integer such that $r + 2^r \ge n$, is

$$
\begin{cases}
3(n - r) & \text{if } 2^r = 2n - 2r, \\
2(n - r) + 2^{r-1} - 1 & \text{if } n - r + 1 \le 2^r \le 2n - 2r - 2, \\
\frac{5}{2}(n - r) - 2 & \text{if } 2^r = n - r.
\end{cases}
$$

□

In the second case, since $n - r + 1 \le 2^r \le 2n - 2r - 2$, we have that $\lfloor (n - r)/2 \rfloor + 1 \le 2^{r-1} \le n - r - 1$. Consequently, in all the cases considered in Lemma 14 the diameter is less than or equal to $3(n - r)$, and since $n = \log N$, the diameter is bounded by $3 \log N$.
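The substitution behind Lemma 14 and the resulting bound can be checked numerically. The sketch below assumes n is at least 5, so that d = n - r is at least 3 as Theorem 2 requires; the function names are illustrative, and `theorem2_diameter` simply restates the closed form of Theorem 2.

```python
def theorem2_diameter(d: int, k: int) -> int:
    """Closed-form diameter of CCC(d, k) as stated in Theorem 2."""
    if d < 3 or k < d:
        raise ValueError("Theorem 2 assumes k >= d >= 3")
    if k == d:
        return 6 if d == 3 else 2 * d + d // 2 - 2
    if k <= 2 * d - 2:
        return 2 * d + k // 2 - 1
    return d + k


def lemma14_parameters(n: int):
    """Return (r, d, k): r is the least integer with r + 2**r >= n, d = n - r, k = 2**r."""
    r = 0
    while r + 2 ** r < n:
        r += 1
    return r, n - r, 2 ** r


def lemma14_diameter(n: int) -> int:
    """Diameter of CCC(n - r, 2**r) using the three cases of Lemma 14."""
    r, d, k = lemma14_parameters(n)
    if k == 2 * d:
        return 3 * d
    if k == d:
        return 5 * d // 2 - 2      # d is a power of two here, hence even
    return 2 * d + k // 2 - 1      # d + 1 <= k <= 2d - 2


if __name__ == "__main__":
    for n in range(5, 13):
        r, d, k = lemma14_parameters(n)
        # Columns: n, (d, k), diameter, and the bounds 3(n - r) and 3n = 3 log2 N.
        print(n, (d, k), lemma14_diameter(n), 3 * (n - r), 3 * n)
```

For every n in the printed range the Lemma 14 value agrees with Theorem 2 and stays at or below 3(n - r), which is itself below 3n.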

V. SUMMARY

In summary, in this paper we have presented for the first time an optimal routing algorithm for the general CCC. All previous work on this topic provides only nonoptimal algorithms valid for restricted subclasses of the CCC. We started by constructing the routing algorithm for the special case of a CCC(d, d), then extended it to the general case of a CCC(d, k). Based on this optimal algorithm we have derived an exact expression for the diameter of a CCC. It has been shown that the diameter derived in this paper is smaller than the previously known bounds. An extension of this work would be the analysis of random permutation routing on the CCC.

REFERENCES

[1] J. Bruck, R. Cypher, and C.-T. Ho, “On the construction of fault-tolerant cube-connected cycles networks,” in Proc. 1990 Int. Conf. Parallel Processing, vol. I, 1990, pp. 692-693.

[2] D. A. Carlson and B. Sugla, “Adapting shuffle-exchange like parallel processing organizations to work as systolic arrays,” Parallel Comput., vol. 11, pp. 93-106, 1989.

[3] B. N. Jain and S. K. Tripathi, “Equivalence between cube-connected cycles networks and circular shuffle networks,” in Proc. 1986 Int. Conf. Parallel Processing, 1986, pp. 8-11.

[4] P. Mazumder, “Evaluation of three interconnection networks for CMOS VLSI implementation,” in Proc. 1986 Int. Conf. Parallel Processing, 1986, pp. 200-207.

[5] P. Mazumder, “Evaluation of on-chip static interconnection networks,” IEEE Trans. Comput., vol. C-36, no. 3, pp. 365-369, 1987.

[6] D. Meliksetian and C. Y. R. Chen, “Optimal routing algorithm and other communication issues of the cube-connected cycles,” Tech. Rep. #TR 90-4, Dep. Elec. & Comput. Eng., Syracuse Univ., Apr. 1990.

[7] F. P. Preparata and J. Vuillemin, “The cube-connected cycles: A versatile network for parallel computations,” Commun. ACM, vol. 25, pp. 300-309, 1981.

[8] A. G. Ranade and S. L. Johnson, “The communication efficiency of meshes, boolean cubes and cube-connected cycles for wafer scale integration,” in Proc. 1987 Int. Conf. Parallel Processing, 1987, pp. 479-482.

[9] A. M. Schwartz and M. C. Loui, “Dictionary machines on cube-class networks,” IEEE Trans. Comput., vol. C-36, no. 1, pp. 100-105, 1987.

[10] N.-F. Tzeng, S. Battacharya, and P.-J. Chuang, “Fault-tolerant cube-connected cycles structures through dimensional substitution,” in Proc. 1990 Int. Conf. Parallel Processing, vol. I, 1990, pp. 433-440.

[11] L. D. Wittie, “Communication structures for large networks of microcomputers,” IEEE Trans. Comput., vol. C-30, no. 4, pp. 264-273, 1981.