JOURNAL OF PARALLEL AND DISTRIBUTED COMPUTING 8, 267-273 (1990)

RESEARCH NOTES

Performance of Multiple-Bus Interconnections for Multiprocessors*

QING YANG

Department of Electrical Engineering, The University of Rhode Island, Kingston, Rhode Island 02881

AND

LAXMI N. BHUYAN†

The Center for Advanced Computer Studies, University of Southwestern Louisiana, Lafayette, Louisiana 70504

Bus structures, in general, are easily understood and therefore preferred by manufacturers for implementation. Multiple-bus systems can be viewed as an incremental expansion of single-bus architectures that can provide high bandwidth and reliability. This research note provides a brief overview of various analytical techniques suitable for performance evaluation of multiple-bus multiprocessors. Some results and comparisons based on these techniques are also provided. © 1990 Academic Press, Inc.

1. INTRODUCTION

A computer system designer can build a multiprocessor system using several inexpensive microprocessors to match or exceed the performance of a high-cost uniprocessor system. In addition, a multiprocessor offers other advantages such as good modularity, reliability, and expandability. However, the most crucial issue in a multiprocessor design is the design of its interconnection network (IN). The IN must offer high communication bandwidth between the processors and the memories while keeping the cost low. It should also have alternate paths in the network to allow communication to proceed in case of a fault in the network.

A single-bus interconnection is the simplest and least costly among all the INs. At the same time it has the lowest bandwidth and almost no fault-tolerance. A crossbar connection [1], on the other hand, has the maximum bandwidth at a prohibitively high cost. Fault-tolerance is possible if one allows graceful degradation [2]. However, the O(N²)

* This research was supported by NSF Grant DMC-5813041 and a grant from the Louisiana Board of Regents, and was carried out while Yang and Bhuyan were with the University of Southwestern Louisiana.

† Present address: Department of Computer Science, Texas A&M University, College Station, TX 77843-3112.

cost for an N×N crossbar does not allow a big system to be designed around this IN. The cost and performance of a multistage interconnection network (MIN) lie between the above two extremes [3]. As with the crossbar, graceful degradation is possible, and higher reliability than a shared bus can be obtained [4]. Also, it is possible to design MINs that are fault-tolerant to start with [5]. As a result, a lot of industrial and research multiprocessor projects are based on these MINs. Although the MINs are very suitable for large scale systems, they are still complex to design, and there is an O(log N) delay in each communication. For a low to medium scale multiprocessor, manufacturers would like an IN that has the simplicity of a shared bus but offers more bandwidth and reliability.

One type of IN, called the multiple-bus, has drawn considerable interest from computer scientists and engineers recently [2, 6-16]. The structure, shown in Fig. 1, consists of a few buses, with each bus connected to all the processors and memory modules. The hardware cost and communication capacity of the system depend on the number of buses, B. The cost of interconnection is O(BN) for an N×N multiprocessor. While B = 1 is clearly unacceptable, having a large number of buses will have a cost similar to the crossbar network. The multiple-bus IN is highly fault-tolerant. It is apparent from Fig. 1 that there are B alternate paths between a processor and a memory module that can be used in case of a fault. This IN retains the evolutionary features of a single-bus system, yet provides increased communication capacity and reliability. Choosing the right number of buses for an application environment is a critical design issue that needs a thorough knowledge of performance evaluation techniques. This paper describes analytical techniques that have been developed by us [6] and by others for the performance evaluation of multiple-bus multiprocessor

0743-7315/90 $3.00 Copyright © 1990 by Academic Press, Inc.

All rights of reproduction in any form reserved.




FIG. 1. A multiple-bus multiprocessor system.

systems. Analytical techniques provide a quick evaluation of a system as opposed to simulations.

In evaluating a multiple-bus system, several operational characteristics of buses must be considered. These depend on the timing philosophy, arbitration scheme, and switching methodology. The timing of the bus operations can be synchronous or asynchronous. Synchronous schemes are characterized by the existence of a global clock that broadcasts clock signals to all devices so that they work in lockstep fashion. Asynchronous operations, on the other hand, involve no global clock. The switching methodology can be either circuit or packet switched. In circuit switching, once a device is granted use of a bus it occupies the bus for the entire duration of the data transfer. In packet switching, the data are broken into small packets and a bus is held only during the transfer of a packet. Access to the various buses in the multiple-bus system can be controlled by either a centralized or a decentralized arbiter circuit. Design of a centralized arbiter is simple, but it may create a system bottleneck. A decentralized arbiter, though more difficult to design, is better from the performance and reliability viewpoints. A classification of multiprocessor INs based on these operational characteristics is given in [3]. The analytical technique applicable to a class of multiple-bus multiprocessors depends on which of the eight classes the system belongs to.
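The three binary design choices, timing, switching, and control, can be enumerated directly to list the eight classes; a trivial sketch:

```python
from itertools import product

# Timing, switching, and control each offer two options; their cross
# product yields the eight classes of multiple-bus systems.
timing = ("synchronous", "asynchronous")
switching = ("circuit-switched", "packet-switched")
control = ("centralized", "decentralized")

classes = list(product(timing, switching, control))
for t, s, c in classes:
    print(f"{t}, {s}, {c} multiple-bus system")
```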

This research note presents a summary of analytical techniques for various classes of multiple-bus multiprocessors and is organized as follows. Sections 2 and 3 deal with circuit- and packet-switched systems, respectively. Each section presents results for four classifications. Section 4 presents some results, discussions, and comparisons. Finally, Section 5 concludes the paper. The performance models presented in this paper are based on the following assumptions. Additional assumptions required for each analysis are mentioned later at appropriate places.

1. The system consists of N processors, M memory modules, and B system buses. All processors are considered independent and identical, and so are the memory modules and buses.

2. A uniform reference model is assumed, which means that a request from a processor addresses any one of the M memory modules with equal probability. This is a reasonable assumption for an interleaved memory system.

3. Wherever a queue is constructed in a model, it is assumed that the buffer has infinite capacity. This assumption ensures that any "customer" arriving at the queue will not be lost.

2. CIRCUIT-SWITCHED MULTIPLE-BUS SYSTEM

In a circuit-switched system, a processor holds the bus until the memory operation is finished. Extensive studies have been reported in the literature for circuit-switched multiple-bus multiprocessors. Performance models of these systems follow.

2.1. Synchronous Circuit-Switched System

2.1.1. Centralized Control. In this situation the entire system operates synchronously based on a memory cycle; i.e., memory requests, issued by processors, begin and end simultaneously. There is a central controller that scans these requests and selects memories for service. Then the controller allocates buses for the processor-memory connections. Each processor generates a memory request at the beginning of a cycle with some probability p. The performance of this system is defined in terms of bandwidth, BW, defined as the mean number of memories remaining busy per cycle. A number of researchers [7-10] have developed analytical models for this circuit-switched synchronous centralized multiple-bus system. In all these papers, simple and fairly accurate analytical models have been developed with an independence assumption, which states that the request generated by a processor in the current cycle is independent of its request in the previous cycle. Applying some simple combinatorial analysis, the following expression for the memory bandwidth of an N×M×B system can be obtained [7]:

$$BW = \sum_{y=1}^{N} \binom{N}{y} p^{y} (1-p)^{N-y} \frac{1}{M^{y}} \left[ \sum_{x=1}^{B} x \binom{M}{x} x!\, S(y,x) + B \sum_{x=B+1}^{t_y} \binom{M}{x} x!\, S(y,x) \right], \tag{1}$$

where $t_y = \min(y, M)$ and $S(y, x)$ is the Stirling number of the second kind, defined by

$$x!\, S(y,x) = \sum_{i=0}^{x} (-1)^{i} \binom{x}{i} (x-i)^{y}. \tag{2}$$
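As a numerical check on Eq. (1), the bandwidth can be evaluated directly; the sketch below (function names are ours) assumes the standard combinatorial reading: with y simultaneous requests, exactly x distinct memories are addressed with probability C(M, x) x! S(y, x)/M^y, of which min(x, B) can be served, which merges the two sums of Eq. (1) into one.

```python
from math import comb, factorial

def stirling2(y, x):
    # Eq. (2): x! S(y, x) = sum_{i=0}^{x} (-1)^i C(x, i) (x - i)^y
    return sum((-1) ** i * comb(x, i) * (x - i) ** y
               for i in range(x + 1)) // factorial(x)

def bandwidth(N, M, B, p):
    # Eq. (1): expected number of memories kept busy per cycle
    bw = 0.0
    for y in range(1, N + 1):                       # y requests issued
        pr_y = comb(N, y) * p ** y * (1 - p) ** (N - y)
        inner = sum(min(x, B) * comb(M, x) * factorial(x) * stirling2(y, x)
                    for x in range(1, min(y, M) + 1))
        bw += pr_y * inner / M ** y
    return bw

print(round(bandwidth(4, 4, 2, 1.0), 2))   # 1.98, the 4x4, B = 2 case
print(round(bandwidth(4, 4, 4, 1.0), 2))   # 2.73, the 4x4 crossbar value
```

With B = N the bus limit never binds and the result collapses to the familiar crossbar expression M[1 - (1 - p/M)^N].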

Page 3: Performance of multiple-bus interconnections for multiprocessors

MULTIPLE-BUS INTERCONNECTIONS 269

It is stated [11, 12], and we have verified, that this approximate analysis gives results similar to [10] and is better than the analysis developed in [9, 12].

The independence assumption, however, is unrealistic because a rejected request will definitely be resubmitted in the next cycle. In order to obtain more accurate results, the "adjusted request rate" technique [8, 6] can be used. Let p' be the adjusted request rate, which takes into account the resubmission of rejected requests. Then

$$p' = p\left[1 + p\left(\frac{1}{P_a} - 1\right)\right]^{-1}, \tag{3}$$

where $P_a = BW/(Np')$ is the probability that a submitted request is accepted in a cycle.

Now we can substitute the request rate p in Eq. (1) by the p' of Eq. (3). The resulting equation has one unknown variable, BW, and can be solved by using any standard numerical method. The results are very close to those of simulation for a wide range of parameters [6].
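One simple numerical method is damped fixed-point iteration. The sketch below is ours: it assumes the reconstructed form p' = p[1 + p(1/P_a - 1)]^(-1) with P_a = BW/(Np'), and uses the B >= N special case of Eq. (1), BW = M[1 - (1 - p/M)^N], to keep the code short.

```python
def bw_crossbar(N, M, p):
    # B >= N special case of Eq. (1): memory conflicts only, no bus limit
    return M * (1.0 - (1.0 - p / M) ** N)

def rate_adjusted_bw(N, M, p, iters=200):
    # Damped fixed-point iteration on the adjusted rate p' of Eq. (3),
    # with acceptance probability P_a = BW / (N p').
    p_adj = p
    for _ in range(iters):
        bw = bw_crossbar(N, M, p_adj)
        p_a = bw / (N * p_adj)
        target = p / (1.0 + p * (1.0 / p_a - 1.0))
        p_adj = 0.5 * p_adj + 0.5 * target      # damping for stability
    return bw_crossbar(N, M, p_adj), p_adj

bw, p_adj = rate_adjusted_bw(16, 16, 0.5)
```

Because blocked processors stall instead of generating new work, the converged p' is below p and the adjusted bandwidth is below the independence-assumption estimate.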

2.1.2. Decentralized Control. In the case of a circuit-switched, synchronous, and decentralized system, a bus may be allocated to a request that addresses a busy memory module. As a result the bus bandwidth is wasted and the overall system performance is degraded. In order to study this effect quantitatively, we develop the following simple model.

Let y be the number of memory requests generated per cycle. The number of requests that can reach memory modules through the B buses, y_BM, is given by

$$y_{BM} = \min(y, B). \tag{4}$$

Each of the y_BM requests addresses a particular memory module with probability 1/M. If the same assumptions described in the last section hold, then the memory bandwidth under the condition that y_BM requests pass through the buses is given by

$$BW(y_{BM}) = M\left[1 - \left(1 - \frac{1}{M}\right)^{y_{BM}}\right]. \tag{5}$$

Thus, the memory bandwidth is given by

$$BW = \sum_{y=1}^{N} \binom{N}{y} p^{y} (1-p)^{N-y} M\left[1 - \left(1 - \frac{1}{M}\right)^{\min(y,B)}\right]. \tag{6}$$

Clearly, the memory bandwidth calculated from Eq. (6) is always less than or equal to that obtained from Eq. (1). The results have been further improved by applying the rate-adjustment technique [6].
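Eq. (6) is straightforward to evaluate; a sketch (our naming) that makes the bandwidth lost to misallocated buses concrete. For a 4 x 4 system with B = 2 and p = 1 it gives 1.75, against 1.98 for the centralized scheme of Eq. (1).

```python
from math import comb

def bw_decentralized(N, M, B, p):
    # Eq. (6): buses may be granted to requests for busy memories,
    # so at most min(y, B) requests reach the memories each cycle.
    return sum(comb(N, y) * p ** y * (1 - p) ** (N - y)
               * M * (1.0 - (1.0 - 1.0 / M) ** min(y, B))
               for y in range(1, N + 1))

print(round(bw_decentralized(4, 4, 2, 1.0), 2))  # 1.75
```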

2.2. Asynchronous Circuit-Switched System

Asynchronous operation differs from the synchronous operation in the sense that it does not have the global clock.

A memory request from a processor can be generated at any instant of time, and a request may be granted a free bus at any instant without waiting for the beginning of a cycle. Continuous-parameter Markov models, rather than discrete probabilistic analysis, are more appropriate for such systems.

2.2.1. Centralized Case. A great deal of research has gone into the analysis of an asynchronous, circuit-switched multiple-bus system that has a centralized controller [13-16]. Marsan and Gerla [13] developed a set of Markov models for the system. In [15], local balance of the multiple-bus network has been proved. Towsley [16] modeled the system by using flow equivalence and surrogate delay techniques, which turned out to be fairly accurate.

The flow equivalence technique [16] works as follows. The bus and memory subsystem, called the aggregate, is replaced by a single flow equivalent center. Solving the model requires two steps. First, the bus-memory service center (aggregate) is solved in isolation. Then the entire queuing network model is solved on the basis of the results of the first step. To solve the aggregate, all the processors are "shorted"; i.e., the processors are removed from the original system. In the resulting shorted network, requests that complete their memory transfers immediately cycle back to the memories. The throughput is calculated for each number of requests, n, cycling through this network. After the throughput is determined for all possible values of n, the original queuing network can be solved by treating the bus-memory subsystem as a single service center. Techniques such as the mean value analysis (MVA) algorithm [17] exist for solving such queuing networks.
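The two steps can be sketched as follows (a toy illustration, not the model of [16]: the B buses are lumped into one exponential server of rate B/t_bus, and all parameter values are invented). Step 1 computes the aggregate throughput X(n) by exact MVA on the shorted bus-memory network; step 2 solves the high-level model as a birth-death chain in which n outstanding requests see service rate X(n) while each of the N - n thinking processors issues a request at rate 1/Z.

```python
def aggregate_throughput(M, B, t_bus, t_mem, N):
    # Step 1: exact MVA on the shorted network (bus center + M memories).
    # Returns X[n] = throughput with n requests circulating, n = 0..N.
    X = [0.0] * (N + 1)
    q_bus = q_mem = 0.0                        # mean queue lengths
    for n in range(1, N + 1):
        r_bus = (t_bus / B) * (1.0 + q_bus)
        r_mem = t_mem * (1.0 + q_mem)          # each memory visited w.p. 1/M
        X[n] = n / (r_bus + r_mem)
        q_bus = X[n] * r_bus
        q_mem = X[n] * r_mem / M
    return X

def flow_equivalent_solution(X, N, Z):
    # Step 2: birth-death chain over n, the population at the FESC.
    w = [1.0]
    for n in range(1, N + 1):
        w.append(w[-1] * (N - n + 1) / Z / X[n])
    G = sum(w)
    return sum(w[n] / G * X[n] for n in range(1, N + 1))  # throughput

X = aggregate_throughput(M=16, B=2, t_bus=1.0, t_mem=4.0, N=16)
tput = flow_equivalent_solution(X, N=16, Z=20.0)
```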

2.2.2. Decentralized Case. In this operation, a station that has a request holds the bus until it gets the addressed memory. This wastes a great deal of bus bandwidth, and therefore a circuit-switched, asynchronous, and decentralized multiple-bus system is considered impractical for implementation. However, for completeness, we have analyzed the system using the surrogate delay technique [6].

3. PACKET-SWITCHED MULTIPLE-BUS SYSTEMS

In packet switching, a memory request packet generated by a processor is first put in the bus access queue to wait for an available bus. When a bus is granted, the processor transmits the packet to the addressed memory module and releases the bus immediately after the packet is sent. Since there may be more than one processor addressing the same memory module, the packet has to join the memory input buffer. After the request finishes the memory operation, a ready response packet is placed in the memory output buffer, where the packet tries to get an available bus. Finally, the packet is transmitted back to the requesting processor. With this organization, every time a memory module completes one service it puts the response packet in the output buffer and becomes available to serve the next request in the queue. The efficiency of the system is increased because a memory module can now be busy serving different requests continuously. Also, the buses become free as soon as they deliver their packets, so they can be well utilized.

3.1. Synchronous Packet-Switched Systems

In synchronous operation, the system cycle, or bus transfer time, is a constant value and constitutes the basic time unit in our analysis. All processors are synchronized at this time unit, called the bus cycle. The cycle time of each memory is equal to T bus cycles for some integer value of T. Arbitration delay is considered to be included in this value. The arbitration policy for resolving bus conflicts can be based on either random selection or cyclic selection; statistically these two policies result in the same mean. The bus arbitration process is done within a cycle irrespective of whether the buses are centrally or decentrally controlled. Therefore, we do not distinguish between centralized and decentralized control strategies in the synchronous analysis. The Encore Multimax is an example of a single-bus packet-switched, synchronous, and centralized system [18]. A processor is said to be active when it is busy in computation. The processor issues a bus request for memory access at the beginning of a cycle with some probability p. After it issues a memory request, the processor remains idle until the memory service finishes. The proportion of time that a processor remains active is defined as the processor utilization (P_u).

Let W_1 be the time from when a processor generates a request until the request arrives at the addressed memory module. Let W_2 represent the delay that a packet experiences in a memory input buffer, which includes both the waiting time in the buffer and the time to perform the memory operation. W_3 denotes the packet delay in a memory output buffer, which is the sum of the queuing delay and the packet transmission time on a bus. The processor utilization, P_u, can be expressed as

$$P_u = \frac{k}{k + kp(W_1 + W_2 + W_3)} = \frac{1}{1 + p(W_1 + W_2 + W_3)}. \tag{7}$$

The above equation can be easily derived by observing a processor for k cycles. During this time, kp global memory requests are generated, and each requires W_1 + W_2 + W_3 cycles to finish. Assuming independence between the various queues, simple queuing analyses give closed-form expressions for W_1, W_2, and W_3 in terms of p, P_a, N, M, B, and T; substituting them into Eq. (7) yields Eq. (8), an implicit equation in the single unknown P_a [19],

where P_a is the probability that a bus request is accepted by the bus system; its derivation can be found in [19]. Equation (8) can be solved using a standard numerical method, with an initial guess of 1/(3 + T), the maximum possible processor utilization. The results obtained from Eq. (8) have been verified through simulations and shown to produce acceptable errors [19, 6].

3.2. Asynchronous Packet-Switched Multiple-Bus Systems

Since in asynchronous operation a memory request can be generated or granted a bus at any instant in time, the control strategy, centralized or decentralized, directly affects the system behavior, and we will consider the two cases separately. The detailed analysis of these systems can be found in [19]. For brevity, we only summarize the techniques here.

3.2.1. Centralized Control. A packet-switched, asynchronous, and centrally controlled system can be modeled by using a simple closed queuing network [19]. Processors in the system are modeled as delay servers with a mean service time of Z, and memory modules are modeled as FCFS servers. Since a central controller allocates the buses, the bus system is modeled as an FCFS center with an equivalent service rate of B buses. A request packet generated by a processor is first put in the bus system queue, waiting for an available bus. After it gets access to a bus, the packet joins one of the M memory queues with probability 1/M. The memory module that finishes the service of a packet puts the response packet back in the bus queue. From there, the response packet gets back to the requesting processor through a bus. At this point the packet has finished one rotation through the network, and the processor resumes its background activity. The queuing network is then solved by applying the approximate MVA algorithm [20] to take into account the fixed service times at the bus and memory centers. Simulation results indicate a good match with those obtained from the analysis [19, 6].
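A minimal version of this model can be solved with exact MVA under exponential-service assumptions (a sketch; the fixed-service-time refinement of [20] is omitted, and the parameter values Z, t_bus, and t_mem are invented). The bus center is visited twice per memory transaction, once for the request packet and once for the response packet.

```python
def solve_central_packet(N, M, B, Z, t_bus, t_mem):
    # Exact MVA: delay center (processors, think time Z), one FCFS bus
    # center with equivalent rate B / t_bus, and M FCFS memory centers,
    # each visited with probability 1/M.
    q_bus = q_mem = 0.0
    X = 0.0
    for n in range(1, N + 1):
        r_bus = (t_bus / B) * (1.0 + q_bus)    # residence per bus visit
        r_mem = t_mem * (1.0 + q_mem)
        X = n / (Z + 2.0 * r_bus + r_mem)      # bus visited twice per cycle
        q_bus = X * 2.0 * r_bus
        q_mem = X * r_mem / M
    utilization = X * Z / N                    # fraction of time computing
    return X, utilization

X, P_u = solve_central_packet(N=16, M=16, B=4, Z=10.0, t_bus=1.0, t_mem=4.0)
```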

3.2.2. Decentralized Control. The actual implementation of a decentralized multiple bus can be either a token bus or a daisy chain bus, as described in [21]. The bus granting policy is round-robin, and the bus grant signal on each bus can be seen as a token. The B tokens on the B buses are independent and operate asynchronously. If a device (processor or memory module) has no packet to transmit, the interface of the device simply passes the arriving token



FIG. 2. A comparison between packet-switched and circuit-switched 16 × 16 multiple-bus multiprocessors with two buses.

to the next device. The delay incurred is assumed to be a constant r between two successive devices. When a device has a packet to transmit, the interface simultaneously monitors all B buses and captures the earliest arriving token on any one of them. Once it gets the token, it can transmit the packet in its buffer on the bus. The device then passes the token on as soon as it finishes the transmission [21].

We use hierarchical modeling techniques [17], defining the bus system as the aggregate queue of the system and the rest of the system, processors and memory modules, as a complement network. The solution of the resulting network involves two steps [19]: (1) determining the throughput of the bus system queue for each possible population n = 1, 2, ..., N to obtain a flow equivalent service center (FESC); (2) solving the high-level queuing network with the bus system queue replaced by the FESC. Detailed analysis and results of this system are given in [19].

4. DISCUSSIONS

In the previous sections, we have described eight different categories of multiple-bus interconnections for multiprocessors. They differ from each other in design cost, reliability, and performance. For example, synchronous systems are simpler in circuit design than asynchronous systems but lack flexibility and expandability. A centrally controlled system always suffers from the problems of controller bottleneck and poor reliability. All these aspects are qualitatively true and need no further explanation. Given the input parameters and a configuration, the performance of the system can be readily predicted using our analytical models. A direct quantitative performance comparison of different systems is difficult because of the incompatibility of the system operations. Moreover, choosing wrong input parameters may give rise to wrong conclusions. However, the performance differences between some packet-switching and circuit-switching systems can be shown based on the analytical models presented in the previous sections.

Figure 2 shows the processing power (NP_u) as a function of the mean interval time (Z) between two successive memory requests for a 16 × 16 asynchronous centrally controlled two-bus multiprocessor system. Curves for a circuit-switched system and for a packet-switched system are

TABLE I
Bandwidth Comparison of Three Types of INs

                          N = 4          N = 8          N = 12         N = 16
IN type / No. of buses  p=0.5  p=1    p=0.5  p=1    p=0.5  p=1    p=0.5  p=1

MIN                     1.56   2.44   2.81   4.13   4.22   6.20   5.13   7.20
Crossbar                1.66   2.73   3.23   5.25   4.80   7.78   6.37   10.30
B = 1                   0.94   1.00   1.00   1.00   1.00   1.00   1.00   1.00
B = 2                   1.51   1.98   1.94   2.00   1.99   2.00   2.00   2.00
B = 4                   1.66   2.73   3.09   3.98   3.79   4.00   3.91   4.00
B = 6                                 3.23   5.18   4.68   5.98   5.51   6.00
B = 8                                 3.23   5.25   4.80   7.41   6.21   7.99
B = 10                                              4.80   7.71   6.37   9.66
B = 12                                              4.80   7.78   6.37   10.26
B = 14                                                            6.37   10.30
B = 16                                                            6.37   10.30



FIG. 3. Processor utilization as a function of probability of request for 16 × 16 synchronous packet-switched systems.

shown in the figure. In circuit-switching mode, the only queuing delay results from the time during which the communication path between the requesting processor and the addressed memory module is being established. Once the path is established, it is dedicated to the entire memory operation. It is assumed that a memory operation takes four processor cycles [18]. Hence, in the packet-switched multiple-bus system six cycles are needed for a memory operation excluding any queuing delay: two cycles for bus transactions and four cycles for memory access. The same assumption is also applied to the circuit-switched case to make a fair comparison. The figure shows that the packet-switched system performs better than the circuit-switched system under this condition. In fact, the same result has been observed for a variety of system parameters [6].

Compared to the multistage interconnection network and the crossbar, the multiple-bus interconnection has a number of advantageous features. Let us first consider Table I, which lists the BW of N×N INs based on a synchronous circuit-switched operation. Results for request probabilities of 0.5 and 1.0 per cycle are presented in the table. The BW increases with the number of buses, but so does the cost. Hence there is a lot of flexibility on the part of the designer to choose the number of buses depending on the requirement. When the number of buses is equal to half the number of processors, the bandwidths of multiple-bus systems are comparable to those of crossbar networks.
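The crossbar and MIN entries of Table I can be reproduced with textbook formulas (a sketch; the MIN is modeled as a log2 N-stage delta network of 2 x 2 switches, one common reading of the omega network, and the function names are ours).

```python
from math import log2

def bw_crossbar(N, M, p):
    # Classic crossbar bandwidth: expected number of distinct memories
    # addressed when each of N processors requests with probability p.
    return M * (1.0 - (1.0 - p / M) ** N)

def bw_min(N, p):
    # Delta/omega network of 2x2 switches: per-stage rate recurrence,
    # applied once per stage to the request rate on each input link.
    rate = p
    for _ in range(int(log2(N))):
        rate = 1.0 - (1.0 - rate / 2.0) ** 2
    return N * rate

print(round(bw_crossbar(16, 16, 1.0), 2))  # 10.3
print(round(bw_min(16, 1.0), 2))           # 7.2
```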

Even fewer buses are required to obtain a similar performance when the buses are packet-switched. This fact is illustrated in Fig. 3, which shows the processor utilization vs request rate for the three INs based on a synchronous packet-switched operation. In this figure, the memory access time is assumed to take four system cycles [18]. It is shown that a multiple-bus-based multiprocessor system with only four buses can achieve almost the same performance as the crossbar- or MIN-based systems while reducing the hardware cost significantly.

In addition to the advantages related to performance and cost discussed above, the multiple-bus structure is also very attractive from a reliability point of view. Figure 4 shows reliability curves vs time for the three types of INs in a 16 × 16 multiprocessor system [4]. The failure rates used to obtain the curves are shown in the figure. Graceful degradation is allowed, and three sets of curves are plotted for tasks requiring 8, 12, and 16 processor-memory pairs, respectively. The reliability of the multiple-bus system is consistently better than that of the other two networks in all three cases. Similar results for BW availability were also obtained [4].

5. CONCLUSIONS

Two very important and central properties of any system are structure and behavior. As a structure for interconnecting shared memory multiprocessor systems, the multiple-bus system has been considered a very attractive candidate that offers a number of advantages over other structures.

FIG. 4. Reliability R_I(t) vs. time (hours) for a 16 × 16 × 8 multiple-bus-based system and 16 × 16 crossbar- and MIN-based systems, for a task requiring I processors and Z memories. (■) Multiple-bus; (—) crossbar; (+) MIN. λ_p = λ_m = 0.0001; λ_s (crossbar, multiple-bus) = 0.000005; λ_s (omega) = 0.00002.

Page 7: Performance of multiple-bus interconnections for multiprocessors

MULTIPLE-BUS INTERCONNECTIONS 273

In particular, it is very suitable for medium scale systems. We have discussed the performance issues for various interconnection schemes of the multiple-bus system depending on its timing, switching, and control. All the performance models consider realistic situations such as fixed memory access time and the bus arbitration procedures. Approximation techniques such as Newton's iteration, the flow equivalent service center model, and some heuristic algorithms were used in solving these performance models. The performance models presented here have been carefully compared with simulation results, indicating acceptable error margins [6]. We believe that the multiple-bus system offers a simple, flexible, and reliable IN for future medium scale multiprocessor systems.

REFERENCES

1. Wulf, W. A., and Bell, C. G. C.mmp: A multi-mini-processor. AFIPS Proc. Fall Joint Comput. Conf., 1972, pp. 765-777.
2. Das, C. R., and Bhuyan, L. N. Bandwidth availability of multiple-bus multiprocessors. IEEE Trans. Comput. C-34 (Oct. 1985), 918-926.
3. Bhuyan, L. N., Yang, Q., and Agrawal, D. P. Performance of multiprocessor interconnection networks. IEEE Comput. (Feb. 1989), 25-37.
4. Das, C. R., and Bhuyan, L. N. Reliability simulation of multiprocessor systems. Proc. Int. Conf. on Parallel Processing, Aug. 1985, pp. 591-598.
5. Adams, G. B., III, Agrawal, D. P., and Siegel, H. J. A survey and comparison of fault-tolerant multistage interconnection networks. IEEE Comput. (June 1987), 14-27.
6. Yang, Q. Analysis of cache based multiple-bus multiprocessors. Ph.D. dissertation, The Center for Advanced Computer Studies, University of Southwestern Louisiana, 1988.
7. Bhuyan, L. N. A combinatorial analysis of multibus multiprocessors. Proc. Int. Conf. on Parallel Processing, Aug. 1984, pp. 225-227.
8. Mudge, T. N., Hayes, J. P., Buzzard, G. D., and Winsor, D. C. Analysis of multiple-bus interconnection networks. J. Parallel Distrib. Comput. 3 (Sept. 1986).
9. Goyal, A., and Agerwala, T. Performance analysis of future shared storage system. IBM J. Res. Develop. 28 (Jan. 1984), 95-108.
10. Valero, M., et al. Analysis of multiple-bus interconnection networks. Proc. ACM SIGMETRICS Conf., 1983, pp. 200-206.
11. Holliday, M. A., and Vernon, M. K. Exact performance estimates for multiprocessor memory and bus interference. IEEE Trans. Comput. C-36 (Jan. 1987), 76-85.
12. Mudge, T. N., Hayes, J. P., and Winsor, D. C. Multiple-bus architectures. IEEE Computer, Special Issue on Interconnection Networks, June 1987, pp. 42-48.
13. Marsan, M. A., and Gerla, M. G. Markov models for multiple bus multiprocessor systems. IEEE Trans. Comput. C-31 (Mar. 1982), 239-248.
14. Yang, Q., and Zaky, S. G. Communication performance in multiple-bus systems. IEEE Trans. Comput. 37 (July 1988), 848-853.
15. Irani, K. B., and Onyuksel, I. H. A closed-form solution for the performance analysis of multiple-bus multiprocessor systems. IEEE Trans. Comput. C-33 (Nov. 1984), 1004-1012.
16. Towsley, D. Approximate models of multiple bus multiprocessor systems. IEEE Trans. Comput. C-35 (Mar. 1986), 220-228.
17. Lazowska, E. D., et al. Quantitative System Performance: Computer System Analysis Using Queueing Network Models. Prentice-Hall, Englewood Cliffs, NJ, 1984.
18. Multimax Technical Summary. Encore Computer Corporation, Marlboro, MA, May 1985.
19. Yang, Q., Bhuyan, L. N., and Pavaskar, R. Performance analysis of packet-switched multiple-bus multiprocessor systems. Proc. Eighth Real-Time Systems Symposium, Dec. 1987, pp. 170-178.
20. Reiser, M. A queueing network analysis of computer communication networks with window flow control. IEEE Trans. Comm. COM-27 (Aug. 1979), 1199-1209.
21. Yang, Q., and Bhuyan, L. N. Design and analysis of decentralized multiple-bus multiprocessors. Proc. Int. Conf. on Parallel Processing, Aug. 1987, pp. 889-892.

QING YANG is an assistant professor in the Department of Electrical Engineering at the University of Rhode Island. His research interests include parallel and distributed computer systems, design of digital systems, performance evaluation, and local area networks. Yang received a B.Sc. degree in computer science from Huazhong University of Science and Technology in China in 1982 and an M.A.Sc. in electrical engineering from the University of Toronto in 1985. He received a Ph.D. in computer engineering from the Center for Advanced Computer Studies, University of Southwestern Louisiana. Yang is a member of the IEEE Computer Society.

LAXMI N. BHUYAN is an associate professor in the Department of Computer Science at Texas A&M University. His research interests include parallel and distributed computer architecture, performance and reliability evaluation, and local area networks. Bhuyan received B.S. and M.S. degrees in electrical engineering from the Regional Engineering College, Rourkela, under Sambalpur University in India. He received a Ph.D. in computer engineering from Wayne State University in Detroit in 1982. Bhuyan is a senior member of the IEEE, a distinguished visitor of the IEEE Computer Society, and served as guest editor of the IEEE Computer special issue on interconnection networks in June 1987.

Received March 3, 1988; revised January 26, 1989