

A Comparative Study of X-Tree, Pyramid and Related Machines

Department of Electrical Engineering and Computer Science, The Johns Hopkins University

Baltimore, Maryland 21218

Introduction

t the most suitable, modular and

the worst-case com- iAK82, NS80, TK77, high susceptibility of

to investigate other networks

ISDP78, ORS82, St831, the 3, St82, St83, Uh721 and the , Lei81, CS811 ax. some such

trees has been considered exten- ei81, Lei831 and by Nath et. al. ourselves to X-tree and pyramid machines, we develop optimal ing certain functions (such as addition) and for solving some

s (such as sorting, merging, and a few graph problems). The

times for these problems are listed in table 1

bounds are shown to be optimal, ibution of this paper is to

technique for obtaining nd pyramid machines. In

techniques have either f the network (as in the

mesh) or have relied on the sequences across the wires of the

M T. J. Watson Center, Yorktown

bisection width [Th79, Ja81, Ya80]. Formerly, the number of different crossing sequences over these wires has been obtained by considering an alternate configuration in which two RAMs are connected to each other through a single communication link and then calculating the number of crossing sequences for this configuration. (For a detailed discussion, see the paper by Aho et al. [AUY83].) Both techniques have a serious drawback: they do not take into account any but the simplest aspects of the network topology (such as the diameter and the bisection width). Fortunately, these techniques work well for most known parallel networks (such as the shuffle-exchange network, the Cube Connected Cycles, and meshes), but they yield only trivial bounds for the X-tree and the pyramid machines. Our lower bound technique incorporates the network topology and yields non-trivial bounds for these networks. However, it works only for conservative flow algorithms (described later), and its generalization to the most general algorithms remains unresolved.

Models of various networks are described in section I, and our main contributions are summarized below. For the sake of brevity, we list most propositions and theorems without proofs; the proofs of the remaining propositions will appear in the final version of the paper.

[1] Assuming that a processor in these machines can store a constant number of words (of $O(\log n)$ bits each), we develop $O(n/\log n)$ time algorithms for computing some transitive functions and also for sorting and merging on an X-tree machine. For $d \ge 2$, we also develop algorithms for a d-dimensional pyramid computer for solving these problems. In section II, we consider the problem of selecting the k-th largest element in a set of n numbers, where the index k can vary between 1 and n and is also provided as an input. Fredrickson [Fr83] has shown that the k-th largest element can be selected in $O(\log^3 n)$ time on a binary tree machine, and Stout [St84] has recently improved this to $O(\log^2 n)$. We show that if the numbers are integers taken from a field of size $O(n^{1+\epsilon})$ for some constant $\epsilon \ge 0$, then the k-th largest element can be selected in $O(\log^2 n)$ time.

0272-5428/84/0000/0089$01.00 © 1984 IEEE


[2] The separators of an X-tree and of a d-dimensional pyramid machine of size S are $\Theta(\log S)$ and $\Theta(S^{(d-1)/d})$ respectively. Using this and a variation of the information flow technique [BK81, Th79], it can be shown that any S-node X-tree machine requires $\Omega(\max[n/\log S ; \sqrt{n}])$ worst-case time to compute any transitive function of order n. Consequently, if S equals a bounded polynomial of n, then the X-tree machine requires $\Omega(n/\log n)$ worst-case time. However, the information flow technique yields a time bound of only $\Omega(\sqrt{n})$ for the computation of a transitive function on an arbitrarily large X-tree machine. [See also the remarks in Appendix I and those following theorem 2.3.] Though demonstrating optimal bounds for the most general kinds of algorithms remains open, we establish the optimality of the previously mentioned upper bound, in section III, for conservative flow algorithms. An algorithm has conservative flow if, when implemented on a parallel architecture, it does not transmit encoded inputs, but rather only the atomic elements of the inputs. A processing element (PE) in this architecture may modify data for its own use. All known parallel algorithms that perform sorting, merging and permutations on various networks have conservative flow when the words are taken from an $O(n^{1+\epsilon})$ size field with $\epsilon > 0$. We show that any conservative flow algorithm which computes a transitive function of order n takes $\Omega(n/\log n)$ worst-case time even on an $O(2^{\sqrt{n}})$-sized X-tree machine. This lower bound technique is described in section III. It is also shown to yield optimal bounds [within a multiplicative constant] for a transitive function on a d-dimensional pyramid machine. In contrast, the information flow technique yields non-optimal bounds for an $O(n^{d/(d-1)})$-sized d-dimensional pyramid machine. In addition, the technique yields close bounds for graph problems, whereas the information flow technique yields trivial bounds. (See sections III and IV.)

Table 1 (fragment; column headings include "X-tree"):

transitive functions: $\Theta(n)$, follows from [Br70, Ma78]; $\Omega(n/\log n)$ [Cons. Flow]
k-th largest: $O(\log^2 n)$ [St84]; $O(\log^2 n)$, follows from [St84]; $O(\log^2 n)$, this paper
addition: $\Theta(\log n)$; $\Theta(\log n)$
connected components and minimum spanning forest: $O(n\log^3 n)$; $O(n\log^2 n)$; $\Omega(n\log n)$; $\Omega(n)$ [Cons. Flow]

[3] In section IV, we consider the problem of addition of two n-bit numbers and two graph problems when every processor in these machines is a finite state automaton. It is shown that an n-node binary tree machine can add two n-bit numbers in $O(\log n)$ time. Chazelle et al. [CM81] have shown that if the input bits are fed at the periphery of a mesh automaton, then it takes $\Omega(\sqrt{n})$ time to add these numbers. We strengthen their result by demonstrating an $\Omega(\sqrt{n})$ bound for addition even if the bits are fed anywhere in the two-dimensional mesh, and we also generalize this result to a d-dimensional mesh. Note that the input bits for addition can be pipelined and hence an $\Omega(\sqrt{n})$ time bound is not straightforward. In fact, computing the logical OR of n bits on a two-dimensional mesh requires only $\Theta(n^{1/3})$ time.

[4] The problems of finding the connected components and the minimum spanning forest of a graph of n vertices are also considered in section IV. We assume that the input is provided as an adjacency matrix and develop an $O((n\log n)^{1/2})$

Table 1 (continued, fragment): Cons. Flow; $O(\log^2 n)$; $O(\log^2 n)$; follows ...; for a 2-PC; $\Omega((n\log n)^{1/d})$ for a d-PC [Cons. Flow]


... developed for the binary tree machine, and these are shown to be optimal if the conservative flow assumption or Conjecture 3.6 holds.

se machines from

results are s l m l m a m d ' i n in section VI with some

binary tree two-dimensional machine mesh

Binary I @(I) 0(Js / l o g s ) I I Anding minimum I

I. Machine Models

... at level ..., where ... is either ... Except for the processors along the periphery, each PE is connected to $(2^d + 2d + 1)$ other processors. Figures 1(a) and 1(b) show a 1-PC and a 2-PC respectively, and these have been referred to previously as the X-tree and the pyramid machine. Throughout this paper, we assume that d is at least two for a d-dimensional pyramid machine (d-PC).

We assume that the unit of information in sections II and III is a word, whereas that in sections IV and V is a bit. If the unit is a word and if the machine computes a function of n variables, then a variable can assume a value from $Z = \{0, \ldots, L-1\}$ for some $L \ge n^{1+\epsilon}$ with $\epsilon > 0$. A PE consists of a constant number of registers and a processing unit that performs simple arithmetic and comparison operations in unit time. If the unit is a bit, then a PE is assumed to consist of a finite state automaton and bits are exchanged amongst adjacent PEs. In either case, the delay along a communication link and its capacity are taken to be unity. For these machines, the three important resources are: S, the number of PEs in the machine; A, the area of the minimum bounding rectangle when the machine is embedded in a VLSI grid; and T, the worst-case time interval between the feeding of the first input and the emergence of the last output for the most time consuming data instance. Note that, in general, we assume that the number of inputs, n, differs appreciably from S, the number of PEs of the machine.

Some PEs are designated as input/output ports and these are used to communicate with the external world. Every input is read once. The inputs and outputs must be delivered according to a predetermined sequence of time instants and at prespecified locations that do not depend on any data instance.

II. Algorithms for Transitive Functions, Sorting and Selection

(a) Sorting and Transitive Functions

Recall that two lists of (n/2) words can be merged using Batcher's odd-even merge algorithm. The odd-even merge algorithm can, in turn, be executed by a constant number of repetitions of a DESCEND algorithm, which was introduced by Preparata and Vuillemin [PV81]. Similarly, the Fast Fourier Transform, permutations, convolution and matrix transposition can also be executed as a sequence of three algorithms that belong to the Descend class. Using the arguments of Thompson et al. [TK77], we can execute an algorithm of the Descend class on a d-MCC of size $m^d$ ($m^d = n$) in $O(n^{1/d})$ time steps. Consequently, we can concentrate on implementing a Descend algorithm on an X-tree only. In [Ag84], a strategy, OP-X-Tree, is demonstrated and this is employed to implement a Descend algorithm on an n-leaf X-tree in $O(n/\log n)$ time steps. Thus, we have the following theorems:

Theorem 2.1 : Let Q be one of the following transitive functions : Fast Fourier Transform, cyclic shift, permutation, convolution, matrix transposition. Then, Q can be computed on a d-PC in $O(n^{1/d})$ time steps. Similarly, two lists of (n/2) words can be merged on a d-PC in $O(n^{1/d})$ time steps.

Theorem 2.2 : Let Q be one of the functions given in theorem 2.1. Then, Q can be computed on an X-tree machine in $O(n/\log n)$ time steps. Similarly, two lists of (n/2) words can be merged on an X-tree machine in $O(n/\log n)$ time steps.

Theorem 2.3 : n words can be sorted on an n-leaf X-tree machine [d-dimensional pyramid machine] in $O(n/\log n)$ [in $O(n^{1/d})$, resp.] time steps.
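The merging step behind theorems 2.1 to 2.3 is Batcher's odd-even merge. As a rough illustration only, the following sequential Python sketch shows the recursive structure of that merge and the divide and conquer sort built on it. It does not model the X-tree or pyramid data movement, and the names `odd_even_merge` and `merge_sort` are ours, not the paper's; list lengths are assumed to be powers of two.

```python
def odd_even_merge(a, b):
    """Merge two sorted lists of equal power-of-two length (Batcher's odd-even merge).

    Sequential sketch; on a parallel machine the two recursive merges and
    the final compare-exchange step would each run in parallel.
    """
    n = len(a)
    assert n == len(b) and n > 0 and n & (n - 1) == 0
    if n == 1:
        return [min(a[0], b[0]), max(a[0], b[0])]
    # Merge the even-indexed and the odd-indexed subsequences separately.
    even = odd_even_merge(a[0::2], b[0::2])
    odd = odd_even_merge(a[1::2], b[1::2])
    # Interleave with one compare-exchange between odd[i] and even[i + 1].
    out = [even[0]]
    for i in range(len(odd) - 1):
        lo, hi = sorted((odd[i], even[i + 1]))
        out.extend([lo, hi])
    out.append(odd[-1])
    return out


def merge_sort(xs):
    """Divide and conquer sort: sort both halves, then odd-even merge them."""
    if len(xs) == 1:
        return list(xs)
    half = len(xs) // 2
    return odd_even_merge(merge_sort(xs[:half]), merge_sort(xs[half:]))
```

For example, `odd_even_merge([1, 4, 6, 7], [2, 3, 5, 8])` merges the two sorted halves into a single sorted list of eight words.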

Theorem 2.3 follows from theorems 2.1 and 2.2 when a simple divide and conquer merge sort algorithm is executed on an X-tree machine and on a d-dimensional pyramid computer. For the X-tree machine, we use the two sons of the apex to sort (n/2) words in parallel and then merge the resulting lists to obtain a complete sorted list. Consequently, $S(n) = S(n/2) + M(n)$ and $S(n) = O(n/\log n)$ can be easily obtained using theorems 2.1 and 2.2.

In [Ag84], it is shown that the "reversal permutation" can be executed on an $O(2^{\sqrt{n}})$ sized X-tree in $O(\sqrt{n})$ time steps. This led us to believe that it might be possible to merge two lists in $O(\sqrt{n})$ time steps on some large X-tree machine. However, in section III, we show that this is not the case, at least when conservative flow algorithms are executed on X-tree machines.

(b) Selecting the k-th largest element on a binary tree machine

If k equals one (or n) then the largest (or the smallest, resp.) item can be found in $O(\log n)$ time steps on a binary tree machine. Also, the largest, the second largest, ..., the k-th largest can be determined in $O(\log n + k)$ time steps by pipelining the operations. This motivated Tanimoto [Ta75] to give an $O(\log n + k)$ time algorithm for finding the k-th largest (or the k-th smallest) element, in which the apex continuously discards the largest (or the smallest) element. Unfortunately, for large k, this algorithm may require $\Omega(n)$ time steps. Stout [St84] has demonstrated an $O(\log^2 n)$ algorithm for selecting the k-th largest element on a binary tree machine of size n.

Theorem 2.4 : The k-th largest element can be determined in $O(\log^2 n)$ time steps on a binary tree machine.

We conjecture that the bound of theorem 2.4 is optimal, within a multiplicative constant, for the binary tree machine model described in section I. From theorem 2.4, it also follows that the k-th largest element can be determined on an X-tree and a d-PC in $O(\log^2 n)$ time. It remains open whether the additional edges of the X-tree and the pyramid machines can be used to reduce the worst-case time for computing the k-th largest element.
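The discard-the-largest idea attributed to Tanimoto above can be mimicked sequentially. The sketch below is only a stand-in under our own assumptions: a binary heap plays the role of the tree machine, each pop corresponds to the apex discarding the current largest element, and the pipelining that gives the $O(\log n + k)$ bound on a real tree machine is not modelled. The function name is hypothetical.

```python
import heapq


def kth_largest_by_discarding(values, k):
    """Return the k-th largest value by discarding the largest k - 1 times.

    Sequential stand-in for the apex-discard scheme: the heap root plays
    the apex, and every pop is one discarded element.
    """
    if not 1 <= k <= len(values):
        raise ValueError("k must lie between 1 and the number of values")
    heap = [-v for v in values]   # negate to get a max-heap from heapq
    heapq.heapify(heap)
    for _ in range(k - 1):        # k - 1 discards: linear in k for large k
        heapq.heappop(heap)
    return -heap[0]
```

The loop makes the weakness noted in the text visible: the number of discards grows linearly with k, which is why a different strategy is needed to reach $O(\log^2 n)$ for every k.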

III. Lower Bound for Conservative Flow Algorithms

Theorem 3.1 : Any conservative flow algorithm that merges two lists of (n/2) words on an arbitrarily large X-tree machine [arbitrarily large d-PC] takes $\Omega(n/\log n)$ [$\Omega(n^{1/d})$] time steps.

Sketch of Proof : Let the words of the two input lists be indexed from 1 through (n/2) and from (n/2)+1 through n, let the output words be indexed from 1 through n in ascending order, and let no input port [output port] input [output] more than $(n/(500\log n))$ words. In lemma 3.2 we show that $\Omega(m/\log m)$ time steps are required by any conservative flow algorithm to transmit m words across a complete X-tree. In lemma 3.3 we show that at least $\lceil 2|I||O|/n \rceil$ words have to be transmitted from any set of input ports $P_I$ to any set of output ports $P_O$, where $|I|$ denotes the number of words that enter the ports of $P_I$ and have subscripts between 1 and (n/4), and $|O|$ denotes the number of words that emerge from the ports of $P_O$ and have subscripts between (n/4) and (3n/4). Finally, in lemma 3.4, we show that there exist $P_I$ and $P_O$, with $|I|, |O| \ge (n/28)$, which are reasonably distant, so that transmission of $(2n/(28)^2)$ words across some complete X-tree is required. Combining these lemmas, the given theorem can be easily established. ∎

Before proceeding with these lemmas, some notational conventions are clarified:

An X-tree is said to be complete if the number of nodes at any level h of the X-tree is $2^h$, the root being taken at level zero. In the following section, we differentiate between complete, almost complete and incomplete X-trees. We continue to assume that the underlying graph of an X-tree machine is a complete X-tree. An almost complete X-tree is a maximally connected subgraph of a complete X-tree that satisfies one of the following conditions:

(i) The removal of the horizontal edges of this tree results in a binary tree that has at least $(2^{(h-1)}+1)$ nodes on any level h for $h \ge 1$.

(ii) The removal of the horizontal edges results in two trees of equal height, with one of these trees having at least $(2^{(h-1)}+1)$ nodes on any level h for $h \ge 1$.

If an almost complete tree satisfies condition (i) then it has one root, say x, and we denote this tree as $\Phi'(x)$; if it satisfies condition (ii) then it has a pair of nodes as roots, say x and u, and we denote this tree as $\Phi'(x,u)$. A complete X-tree with x as the root is denoted by $\Phi(x)$, and if the tree is neither complete nor almost complete, then it is incomplete and is denoted by $\Phi''(x)$ or $\Phi''(x,u)$ depending on whether it has one root x, or two roots x and u. If a node r belongs to $\Phi(x)$, then the subtree $\Phi_r(x)$ is also complete. Also, G - H represents the graph obtained after deleting the nodes and edges of H from the complete tree G.
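To make the definition of a complete X-tree concrete, the following Python sketch builds one as two edge sets: the binary tree edges and the horizontal edges joining neighbouring nodes on the same level. The node naming `(level, index)` and the function names are our own assumptions for illustration.

```python
def complete_x_tree(height):
    """Build a complete X-tree with levels 0..height.

    Nodes are (level, index) pairs with 2**level nodes on each level.
    Returns (nodes, tree_edges, horizontal_edges).
    """
    nodes = [(h, i) for h in range(height + 1) for i in range(2 ** h)]
    # Binary tree edges: node (h, i) is the father of (h+1, 2i) and (h+1, 2i+1).
    tree_edges = [((h, i), (h + 1, 2 * i + s))
                  for h in range(height) for i in range(2 ** h) for s in (0, 1)]
    # Horizontal edges: neighbouring nodes on the same level.
    horizontal_edges = [((h, i), (h, i + 1))
                        for h in range(height + 1) for i in range(2 ** h - 1)]
    return nodes, tree_edges, horizontal_edges


def max_degree(nodes, edges):
    """Largest number of edges incident on any node."""
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return max(degree.values())
```

Dropping `horizontal_edges` leaves an ordinary complete binary tree, which is the operation used in conditions (i) and (ii) above.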


nts the shortest

right son Note any root x, @,(XI has exactly htmost edge and similarly one

node for every level h with Lr(x ) [the rightmost path composed of atl left edges

formed of a sequence of nonconsecut edges. The s u b t ~ induced

subgraph of @(XI which is that are either enclosed by the

@I [Pys,,Ly,(x),L, depending on it incomplete, re! induced by pat also be defined. conventions.

being almost complete, complete or xtively. Simildy, the subtrees

igure 2 summarizes these notational (~yl,~y~),(~yl,~y*) and (Ly,,Ry2) can

Lemma 3.2 : Let G be a graph such that the nodes of G can be partitioned into three sets A, B and C and the graph induced by the vertex set of B is a complete X-tree. In G, let every leftmost node of the graph induced by B be connected to some other node of A and similarly let every rightmost node of the graph induced by B be connected to some other node of C. Finally, let m words be transmitted from the nodes of A to those of C by a conservative flow algorithm such that these words enter a leftmost node of the graph induced by B and emerge from some of its rightmost nodes. Then, the minimum time elapsed between the transmission of the first word from A and the receipt of the last word by some vertex in C is $\Omega(m/\log m)$.

Proof : We mark the movement of these m words from their origin in A to their destination in C using the labeling technique discussed below. Then, we show that either some node in B transmits $\Omega(m/\log m)$ words or the distance travelled by some word is $\Omega(m/\log m)$.

The Labeling Technique : We replace every bidirectional link of G, (w,u), by a pair of oppositely directed arcs (w,u) and (u,w). Let the i-th word be transmitted from some node $p_i$ in A to some $q_i$ in C for i between 1 and m. Then, this movement is marked by labeling an arc with a label i if the arc is used by the i-th word to travel from $p_i$ to $q_i$. This results in a labeled digraph G' with m distinct labels. Take a directed path from $p_i$ to $q_i$ and trim all cycles and loops, if present. More formally, let the path be $((p_i,u_1)_a, (u_1,u_2)_a, \ldots, (u_m,u_1)_a, (u_1,y)_a, (y,u_1)_a, \ldots)$; then erase the labels $\{(u_1,u_2)_a, \ldots, (u_m,u_1)_a\}$ and $\{(u_1,y)_a, (y,u_1)_a\}$, which form a directed cycle and a loop, respectively. It is easy to see that the number of words transmitted by any node in B equals at least half the sum of the number of labels on the arcs incident on the corresponding node in G'.
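The labeling technique can be sketched in a few lines of Python. This is an illustration under our own assumptions: each word's route is given as a list of visited nodes, cycles and loops are erased by cutting the route back to the first visit of a repeated node, and the helper names are hypothetical.

```python
from collections import defaultdict


def trim_cycles(walk):
    """Erase cycles and loops from a route, leaving a simple directed path."""
    path = []
    position = {}                      # node -> index in the current path
    for node in walk:
        if node in position:
            cut = position[node]       # node seen before: erase the cycle
            for removed in path[cut + 1:]:
                del position[removed]
            del path[cut + 1:]
        else:
            position[node] = len(path)
            path.append(node)
    return path


def label_arcs(walks):
    """Label every arc (u, v) with the indices of the words whose trimmed path uses it."""
    labels = defaultdict(set)
    for i, walk in enumerate(walks, start=1):
        path = trim_cycles(walk)
        for arc in zip(path, path[1:]):
            labels[arc].add(i)
    return labels


def incident_label_sums(labels):
    """For each node, sum the number of labels on the arcs incident on it."""
    sums = defaultdict(int)
    for (u, v), marks in labels.items():
        sums[u] += len(marks)
        sums[v] += len(marks)
    return sums
```

Every word passing through an intermediate node contributes two labels to that node's sum (one arc in, one arc out), which is where the "at least half the sum" statement comes from.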

Let $s_u$ denote the sum of the number of labels on the arcs incident on the vertex u in G'. And let $2k_i$ be the maximum number of labels on the arcs incident on any node at the i-th level of the complete tree (i.e. $2k_i = \max_u s_u$ over the nodes u at the i-th level). Furthermore, let J be the minimum such that $\sum_{i \le J} k_i \ge m$. Then, it is easy to see that some word travels through ... nodes of G'. Also, the maximum number of words transmitted by any node of B is at least $\max\{k_i\}$. Consequently, if the minimum time interval to complete transmission is T, then ... and hence $T = \Omega(m/\log m)$ can be easily shown. ∎

Lemma 3.3 : Let N be a network that merges two lists of (n/2) words such that the output words are indexed from $O_0$ through $O_{n-1}$ in ascending order. Furthermore, for any given set of input ports $P_I$ and output ports $P_O$, let $|I|$ be the number of words entering the ports of $P_I$ with subscripts between 1 and (n/4) and let $|O|$ be the number of words emerging from the ports of $P_O$ with subscripts between (n/4) and (3n/4). Then, there exists a problem instance for which at least $\lceil (2|I||O|)/n \rceil$ words have to be transmitted from $P_I$ to $P_O$.

Proof : Though this lemma holds for nonconservative flow algorithms also, we prove it for conservative flow algorithms only. Let the words from the first and the second lists be indexed from $i_1$ through $i_{n/2}$ and $i_{(n/2)+1}$ through $i_n$ respectively. Then, (n/2) instances of merging are provided to the network so that for the k-th instance ($0 \le k \le 0.5n-1$), the j-th word of the first list emerges as the (k+j)-th output word. Thus an input word entering a port of $P_I$ is output exactly $|O|$ times from the ports of $P_O$ for these (n/2) problem instances. Consequently, at least $\lceil (2|I||O|)/n \rceil$ words have to be transmitted from $P_I$ to $P_O$ for some problem instance. ∎
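The family of (n/2) merging instances used in the proof can be written down explicitly. The sketch below is one possible construction, chosen by us for illustration: in the k-th instance the second list holds k words smaller than every word of the first list and (n/2) - k words larger than all of them, so the j-th word of the first list lands at output position k + j (positions counted from 1). The function names are hypothetical.

```python
def merge_instance(n, k):
    """Return (first, second), two sorted lists of n/2 words, for instance k.

    Assumes n is even and 0 <= k <= n/2 - 1.
    """
    half = n // 2
    first = [n + j for j in range(1, half + 1)]            # n+1 .. n+half
    small = list(range(1, k + 1))                          # k words below `first`
    large = [3 * n + t for t in range(1, half - k + 1)]    # the rest above `first`
    return first, small + large


def output_position(first, second, j):
    """1-based position of the j-th word of `first` in the merged output."""
    merged = sorted(first + second)
    return merged.index(first[j - 1]) + 1
```

Sweeping k from 0 to (n/2) - 1 moves each word of the first list across n/2 consecutive output positions, which is the averaging argument that forces many words from $P_I$ to $P_O$ in at least one instance.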

Lemma 3.4 : If an X-tree merges two lists of (n/2) words then there exists a set of input ports $P_I$ and a set of output ports $P_O$, with $|I|, |O| \ge (n/28)$, such that at least $(2n/(28)^2)$ words from $P_I$ have to be transmitted to $P_O$ across a complete X-tree.

Proof : To each node j in the underlying tree, we assign a weight $W_j$ equal to the number of input words indexed between 1 and (n/4) that enter the corresponding processor of the X-tree machine and divide $W_j$ by (1/4) to obtain $w_j$. Similarly, we assign a weight $W_j'$ to each node j equal to the number of words that are indexed between (n/4) and (3n/4) and emerge from the corresponding processor in the X-tree machine. Again, we divide $W_j'$ by (1/2) to obtain $w_j'$. Now, it is easy to see that there exists a subtree with some root y' that has an unprimed weight (obtained by summing the unprimed weights associated with the nodes of this subtree) between $[(n/3)-(n/(500\log n))]$ and $(2n/3)$. Extending this observation, we note that there exist two node disjoint X-subtrees such that the unprimed weight of one and the primed weight of the other are both between $[(n/9)-(2n/(500\log n))]$ and $[2n/3]$. Let $y_1$ and $y_2$ be the roots of these subtrees and let $y_1$ be to the left of $y_2$. Furthermore, let $P_{y_1 y_2}$ be the shortest path between $y_1$ and $y_2$ and let $\Phi''_{r,s}(P_{y_1 y_2}, L_{y_1}, R_{y_2})$ be the X-tree induced by $P_{y_1 y_2}$, $L_{y_1}$, $R_{y_2}$. Then, we examine the following two cases:

Case 1 : Neither $y_1$ nor $y_2$ equals r or s. Then, both r and s are at a level higher than $y_1$ and $y_2$, $\Phi'_{r,s}(P_{y_1 y_2}, L_{y_1}, R_{y_2})$ is an "almost" complete tree, and either both the sons of r or those of s belong to this subtree. Furthermore, if $z_2$ is the right son of r then we consider the following two subcases that depend on whether r is the left son or the right son in the complete X-tree:

Cuse I(a) : r is a left-son in the complete X-tree. Then, r and s have the same father, say w 1 and let 4,,(x> be the left ancestral path from r to the root of this uee as shown in figure 3. Figures %a) and %b) depict the two cases when w , is the left and right son respectively. These cases aft? symmetricat md we only consider the case when w 1 is the left son. Consider the path P formed by the union of AZZrR(x) and L,,(x) and label the directed paths taken by the words input in the sub- tree with root yl to reach the output ports of the sub- tree with root y2 as in lemma 3.2. Lemma 3.3 guaran- tees that the 2numb"r of such paths is at least

[2n[+j&] 1. It is easy to see that each of these

paths intersects the path P in at leest one node and for every node mark the last" MDde where the directed

th intersects P. Let A be the nutnbr of such paths

A words ane transmitted ~tcross the complete X-tree, arZ(x). Since the number of paths intersecting P ILf

at most tn/SO(Wogn) at least ] w o e are transmitted

across the complete X-subtree with root w p Conse- quently, at least (2n/(28)2' WOKIS are transmitted BGIOSS a complete X - e with root z2 or with mot w p

Cuse I&) : r is the right son in the complete X-tree. Let P be the union of 4 , ( x ) and LZ(x) as shown in figure 4 Then, the arguments of case l(a) can also be applied here so that at least [2n/(28l2I words are transmitted across a complete X-tree with z2 or with w 2 as the mot.

Cuse 2 ; y l - r and y2 lies on the leftmost path of s. Then, the= exists y,' and yi such that the sub- [@, ,$xb@ 4x)I and QylS(y,) have unprimed weights of

at least flL2)) and similarly, the subtrees

aylb2) and [ayz(xF-@y~(x) l have at least

{ {-$-&)] primed weight. If y{ does not belong

to RyI and if y2 and y; do not belong to L,(x), then the arguments of case l(a) can be applied. Consequently, we can assume that y,' belongs to RyI; y; belongs to L,(x); y,' is at a level no less than that of y2'; and wl is the node adjacent to yl and belongs to ay.,. See figure x Note that the proof remains unchanged when the degenerate case of y2( equal to w1 occurs. Let, P be obtained as the union of path A,,(x), edge b,',wl) and Z,,lv(yl) and let w2 be the right brother of yi. Then, applying the arguments similar to those of case l(a), it can be easily shown that at least 12n/(28)21 words have to flow from [#y,(xF-@ylt(x) 1 across ayY,.(x) or BCMSS

aW2(x), thereby completing the proof.

Though theorem 3.1 demonstrates the optimality of the odd-even merge algorithm (of section II) on an X-tree and on d-dimensional pyramid machines, it remains to establish similar bounds for non-conservative flow algorithms. Similarly, theorem 3.5 demonstrates the optimality of the conservative flow algorithms which compute the transitive functions of section II. Fortunately, all known parallel algorithms that merge two lists of (n/2) words and have their input words taken from a range of size $O(n^{1+\epsilon})$, with $\epsilon > 0$, have conservative flow. However, it is hard to justify the conservative flow assumption for the computation of some transitive functions like the Fast Fourier Transform. It is worth noting that, with some additional effort, theorems 3.1 and 3.5 can be established for nonconservative flow algorithms if lemma 3.2 can be established for them. [See also Section VI.] In fact, theorems 3.1 and 3.5 can be established for nonconservative flow algorithms if Conjecture 3.6 holds.

Theorem 3.5 : Let Q be any function given in theorem 2.1. Then, any conservative flow algorithm takes $\Omega(n/\log n)$ [$\Omega(n^{1/d})$] time to compute Q on an X-tree machine [a d-PC].


graph such that the nodes three sets A, B andC and

vertex set of B is a complete pyramid machine]. In G, let

machines with d

Theorem 4.1 : A binary tree machine of n leaves adds two n-bit numbers in $O(\log n)$ time and this is optimal,

theorem 4.2 for larger values of d

we show tha

usxi for the com

then th8m exbu a d u a

(m/2) units apart. Consequently, $T \ge (m/2) \ge (n_0/(144c))^{1/2}$ and the induction holds.

Case 2 : If $m < (n_0/(36c))^{1/2}$ then let $t_k$ denote the first time when at least $(3cm^2k)$ bits have emerged from the mesh. Since at most $(m^2)$ bits can emerge at $t_k$ and since at most $(m^2)$ bits can be stored in the mesh automata, at least $(cm^2)$ emerging bits depend on the bits input during the time interval between $t_k$ and $t_{k+1}$. Now, using induction and the independence of the input/output schedule of the mesh, we obtain:

$t_{k+1} \ge t_k + \max[(cm^2/(144c))^{1/2} ; 1]$, that is, $t_k \ge k \max[m/12 ; 1]$.

Let $k_0$ be the largest k such that $3ckm^2 < n_0$. Then, $\lceil (n_0/(3cm^2)) - 1 \rceil \le k_0 \le \lfloor n_0/(3cm^2) \rfloor$. Since $m < (n_0/(36c))^{1/2}$, $k_0 \ge (11n_0/(36cm^2))$. Now, if $m < 6$, the induction holds trivially. Otherwise, $T \ge t_{k_0} \ge (11n_0/(36cm^2))(m/12)$ and using the above inequalities, this completes the induction step. ∎

Theorem 4.3 : A binary tree machine computes the minimum spanning forest (and the connected components) of an n-node graph in $O(n\log^3 n)$ time. An X-tree machine, a 2-PC, and a d-PC solve these problems in $O(n\log^2 n)$, $O((n\log n)^{1/2})$ and $O((n\log^2 n)^{1/d})$ worst-case times, respectively.

Theorem 4.4 : Let Conjecture 3.6 be true when ... Then, any algorithm which computes the minimum spanning forest (or the connected components) of an n-node graph on an arbitrarily large binary tree machine [X-tree machine, d-PC] requires $\Omega(n\log n)$ [$\Omega(n)$, $\Omega((n\log n)^{1/d})$] worst-case time.

The lower bound of theorem 4.1 and the upper bound of theorem 4.2, which are straightforward, are omitted for the sake of brevity. From theorem 4.1, it follows that an X-tree and a pyramid machine can also add two n-bit numbers in $\Theta(\log n)$ time. Theorem 4.4 holds for conservative flow algorithms even if Conjecture 3.6 is false. However, as with transitive functions, it is hard to justify the conservative flow assumption for algorithms that solve these graph problems on these machines. The algorithms in theorem 4.3 rely on moving data from the base of the pyramid to a suitable higher level and then using the structure of the d-dimensional mesh at that level to route this data properly. The lower bound in theorem 4.4 is established by using arguments similar to those of theorem 3.1 and by showing that the value of m (see Conjecture 3.6 or lemma 3.2) equals $\Omega(n\log n)$ for both these graph problems.
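The paper omits its addition algorithm, so the following Python sketch is not the author's construction. It shows one standard way a logarithmic-depth addition can be organised on a binary tree: carry generate/propagate pairs are combined on the way up the tree, and carries are distributed on the way down. Bits are given least significant first, and all names are our own.

```python
def add_on_tree(a, b):
    """Add two equal-length bit lists (least significant bit first).

    An up-sweep combines (generate, propagate) pairs for each subtree; a
    down-sweep hands every leaf its incoming carry. Both sweeps follow a
    balanced binary tree, so the depth is logarithmic in the bit count.
    """
    n = len(a)
    assert n == len(b) and n > 0
    g = [x & y for x, y in zip(a, b)]   # position generates a carry
    p = [x ^ y for x, y in zip(a, b)]   # position propagates a carry
    summary = {}                        # (lo, hi) -> (G, P) of that segment

    def up(lo, hi):
        if hi - lo == 1:
            summary[lo, hi] = (g[lo], p[lo])
        else:
            mid = (lo + hi) // 2
            gl, pl = up(lo, mid)
            gr, pr = up(mid, hi)
            summary[lo, hi] = (gr | (pr & gl), pr & pl)
        return summary[lo, hi]

    carry = [0] * n

    def down(lo, hi, carry_in):
        if hi - lo == 1:
            carry[lo] = carry_in
        else:
            mid = (lo + hi) // 2
            gl, pl = summary[lo, mid]
            down(lo, mid, carry_in)
            down(mid, hi, gl | (pl & carry_in))

    up(0, n)
    down(0, n, 0)
    carry_out = summary[0, n][0]        # carry-in is 0, so carry-out equals G
    return [p[i] ^ carry[i] for i in range(n)] + [carry_out]
```

With `a = [1, 0, 1, 1]` (13) and `b = [1, 1, 1, 0]` (7) the result is `[0, 0, 1, 0, 1]`, i.e. 20.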


embed a d-MCC, a d-PC, and the mesh of trees with S PEs in $\Theta(S^{2(d-1)/d})$, $\Theta(S^{2(d-1)/d})$, and $\Theta(S\log^2 S)$ areas such that these embeddings have

respectively. Similarly, the binary tree machine of size S can be embedded in $\Theta(S)$ area, without any crossovers [the H-layout, BK81, HZ82], and an X-tree machine can be laid out in $\Theta(S)$ area. In theorem 5.1, we show that any X-tree of size S can be embedded without any crossovers in a VLSI grid of $\Theta(S)$ area.

Theorem 5.1 : An X-tree with S nodes can be embedded, without any crossovers, in a VLSI grid of $\Theta(S)$ area.

(b) Stepwise Simulations of Various Machines

Let an X-tree machine with S processors compute a function, f, on n variables in T time steps. Then, a binary tree machine exists that simulates the X-tree machine, stepwise, and computes f in $O(T\log S)$ time steps. The proof of this statement is given in Appendix IV and various stepwise simulations are listed in table II. Note that if machines $M_1$ and $M_2$, with O(S) processors each, compute some function f in $O(T_1)$ and $O(T_2)$ time steps, then $M_2$ cannot simulate $M_1$ in $o(T_2/T_1)$ time steps. This gives a lower bound on the time of stepwise simulation of $M_1$ by $M_2$. We use sorting of S words and finding the minimum of S words to establish the optimality of various stepwise simulations. Except for the simulation of the mesh of trees by a two dimensional mesh, these simulation times are shown to be optimal, and the functions for which the bounds are achieved are listed in table II. It is worth noting that if the pipeline period [Eh81, Vui80] is considered as a resource, then all these simulation times can be shown to be optimal.

VI. Conclusion

The intent of this paper was to investigate data movement techniques for some special networks which are derived from the binary tree and the mesh machines. We presented optimal bounds for some problems and close bounds for others. A new lower bound technique which incorporates the entire network topology was introduced. We believe that this technique is quite powerful and can be exploited to yield good lower bounds for conservative flow algorithms on other networks. However, it seems to be difficult to generalize it for nonconservative flow algorithms. Though we have obtained close bounds, the following problems remain open:

[1] Prove [or disprove] Conjecture 3.6. Conjecture 3.6 seems to be the key in establishing optimal bounds for non-conservative flow algorithms on X-tree and pyramid machines.

@(S"+')'d 1, @(S'2'+-')'d 1, and @(Slog5 ) Crossin@,

ple cuts" and show that either the number of crossing sequences across some "cut" is large or some word travels through many such cuts. [b] Establish that in any transmission of m words, there are at least m disjoint paths that are used by these words to reach their destination. Then, the labeling technique can be used appropriately.

[2] We presented two models of computation - the word model and the bit model - and established bounds for these models. Undoubtedly, the bit model is more appropriate for VLSI theory and the word model was introduced primarily to illustrate our lower bound technique. Most of the given bounds can be easily extended to the bit model and some of these can be shown to be optimal for conservative flow algorithms. However, obtaining tight bounds for computing the cyclic shift of n bits on an X-tree [or a d-PC] machine where the PEs are assumed to be finite state automata remains open. It is possible that the idea of using Clerks in parallel processing [St83] may yield optimal bounds.

[3] Obtain optimal bounds for finding the k-th largest element on the binary tree machine, the X-tree machine, and the d-dimensional pyramid machine. We conjecture that O(log^2 n) is a tight bound for these machine models. However, proving (or disproving) this remains open.

[4] Binary tree machines have small diameters and are used extensively in performing dictionary operations [AK84, ORS82, EK79, Lei79]. Consequently, it is useful to have a machine that has a complete binary tree as one of its spanning trees. In view of this, we consider the following problem:

Let the underlying graph of a parallel machine be planar and let one of its spanning trees be a complete binary tree. Determine the minimum worst-case time for sorting n words on any such machine. We conjecture that O(n/log n) is an optimal bound on time for any such machine; proving (or disproving) this conjecture also remains open.

[5] Obtain optimal bounds for the graph problems considered in section IV. We believe that the upper bounds can be improved and that some elegant data movement techniques can help to achieve these bounds.

Acknowledgements : The author thanks Quentin Stout for drawing the author's attention to the literature on X-tree and pyramid machines. In [St83], Stout demonstrates an algorithm to sort n words on an X-tree machine in O(n/log n) time. However, his algorithm does not extend to transitive functions, nor does he prove the optimality of the sorting algorithm for large X-tree machines. The author is grateful to S. Rao Kosaraju for his advice and for funding this research


MCS 820-5167, and to Joseph O'Domll for carefully reading

The X-tree machine is more tree machine," Tech. Report,

. Ullman and M. Yannakakis, Transfer in VLSI Circuits," Symp. on Theory of Com-

the area required by VLSI Conf. on VLSI systems and

. T. Kung, 3. Sproull and G.

puting, pp. 133-140.

and H. T. Kung, The area-time multiplication," JACM, 28, 3, July

1981, pp. 521-534.

L. Monier, "A Model of Com- Related Complexity Results,"

mposium on Theory of Comput-

quadtree Machine for Parallel

kson, "Distributed Algorithms for

[Jo80] R. B. Johnson, "The Complexity of a VLSI Adder," Info. Proc. Letters, Vol. 11, No. 2, Oct. 1980.

[Lei79] C. E. Leiserson, "Systolic Priority Queues," Dept. of Comp. Science, Carnegie Mellon Univ., Tech. Report CMU-CS-79-115, 1979.

[Le80] C. E. Leiserson, "Area-efficient graph layouts (for VLSI)," Proc. of the 21st Annual IEEE Symp. on Foundations of Comp. Science, Oct. 1980, pp. 270-281.

[Lei81] F. T. Leighton, "New Lower bound techniques for VLSI," Proc. of the 22nd Annual IEEE Symposium on Foundations of Comp. Science, Oct. 1981, pp. 1-12.

[Lei83] F. T. Leighton, "Parallel Computation Using Meshes of Trees," Proc. of 1983 International Workshop on Graph Theoretic Concepts in Computer Science, 1983.

[LS81] R. J. Lipton and R. Sedgewick, "Lower bounds for VLSI," Proceedings of 13th Annual Symposium on the Theory of Computing, May 1981.

[NMB81] D. Nath, S. N. Maheshwari and P. C. P. Bhatt, "Efficient VLSI Networks for Parallel Processing based on Orthogonal Trees," unpublished manuscript.

[NS80] D. Nassimi and S. Sahni, "Data Broadcasting in SIMD Computers," IEEE Trans. on Computers, C-30, 1981, pp. 101-107.

[Ma79] G. A. Mago, "A network of microprocessors to execute reduction languages," two parts, Int. J. of Comp. Inf. Sci., 8(5), 1979 and 8(6), 1979.

[ORS82] T. A. Ottmann, A. L. Rosenberg and L. J. Stockmeyer, "A Dictionary machine (for VLSI)," IEEE Trans. on Comp., Vol. C-31, No. 9, Sept. 1982.

[PV81] F. Preparata and J. E. Vuillemin, "The cube-connected cycles, a versatile network for parallel computation," Proc. of the 20th Annual Symp. on the Foundations of Comp. Sci., Oct. 1979, pp. 140-147.

[SDP78] C. A. Sequin, A. M. Despain and D. A. Patterson, "Communication in X-tree, a modular multiprocessor system," Proc. of the 1978 Annual Conf. of ACM, pp. 194-203.

[St82] Q. F. Stout, "Using Clerks in Parallel Processing," Proc. of 23rd Annual Symposium on Foundations of Computer Science, 1982, pp. 272-280.

[St83] Q. F. Stout, "Sorting, Merging, Selecting and Filtering on Tree and Pyramid Machines," Proc. of International Conf. on Parallel Processing, 1983, pp. 214-221.

[St84] Q. F. Stout, Private Communication.

[Ta75] S. L. Tanimoto, "Sorting, Histogramming and Other Statistical Operations on a Pyramid Machine," Dept. of Computer Science, Univ. of Washington, Tech. Report 82-08-02, 1982.


[Th79] C. D. Thompson, "Area-time complexity for VLSI," Proceedings of the 11th Symposium on the Theory of Computing, May 1979, pp. 81-88.

[Th80] C. D. Thompson, "A Complexity theory for VLSI," PhD dissertation, Department of Computer Science, Carnegie Mellon University, 1980.

[TK77] C. D. Thompson and H. T. Kung, "Sorting on a Mesh-Connected Parallel Computer," Comm. ACM, Vol. 20, 1977, pp. 263-271.

[TS81] D. M. Tolle and W. E. Siddall, "On the complexity of vector computations in binary tree machines," Inf. Proc. Letters, Vol. 13, No. 3, Dec. 1981.

[Uh72] L. Uhr, "Layered "Recognition Cone" Networks that Preprocess, Classify and Describe," IEEE Trans. on Computers, 1972, pp. 758-768.

[Vu80] J. E. Vuillemin, "A combinatorial limit to the computing power of VLSI circuits," Proc. of the 21st Annual Symposium on the Foundations of Comp. Sci., Nov. 1980, pp. 294-300.

[Ya81] A. Yao, "The Entropic Limitations on VLSI Computations," Proc. of the 13th Annual Symposium on Theory of Computing, May 1981, pp. 308-311.

FIGURE 1

FIGURE 2


FIGURE 3

FIGURE 4

FIGURE 5
