Kautz Mesh Topology for on-Chip Networks


JOURNAL OF COMPUTING, VOLUME 3, ISSUE 2, FEBRUARY 2011, ISSN 2151-9617

HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/

WWW.JOURNALOFCOMPUTING.ORG 33

Kautz Mesh Topology for on-Chip Networks

R. Sabbaghi-Nadooshan

Abstract: In recent years, the 2D mesh Network-on-Chip (NoC) has attracted much attention due to its suitability for VLSI implementation. This paper introduces the two-dimensional Kautz topology for NoCs as an attractive alternative to the popular simple 2D mesh. The cost of the 2D Kautz is equal to that of the simple 2D mesh, but it has a logarithmic diameter. We compare the proposed network and the mesh network in terms of power consumption and network performance. Compared to the equal-sized simple mesh NoC, the proposed Kautz-based network has better performance while consuming less energy. In addition, by vertically stacking two or more silicon wafers connected with a high-density, high-speed interconnect, it is now possible to combine multiple active device layers within a single IC. We therefore propose an efficient three-dimensional layout for a novel 2D mesh structure based on the Kautz topology. Simulation results show that using the third dimension improves performance and latency compared to the 2D VLSI implementation.

Index Terms: 3D layout, 3D NoCs, Kautz, NoCs, Performance evaluation, Power consumption, SoC.

1 INTRODUCTION

Nowadays there are many challenges in designing a complicated integrated circuit containing several processing and memory elements integrated on a single chip as a System-on-Chip (SoC). SoC design has some limitations in deep sub-micron (DSM) technologies. One of the major problems associated with future SoC designs arises from non-scalable global wire delays [1]. These limitations have been mentioned in many studies [1][2], including the limitation on the number of IP cores that can be connected to the shared bus, arbitration for accessing the shared bus, reliability, and so on. To overcome these limitations, the network-on-chip (NoC) has been introduced in recent years and much research has been conducted in this area. A proper classification of these studies is discussed in [3]. Another new technology that has been proposed is three-dimensional VLSI, which exploits the vertical dimension to alleviate interconnect-related problems and to facilitate heterogeneous integration of technologies to realize a SoC design [4]. By combining the ideas in these two types of technology, a new kind of architecture for NoCs is imaginable. In [5], new insights on network topology design for 3D NoCs are provided, and issues related to processor placement across 3D layers and data management in L2 caches are addressed. In three-dimensional designs, with the aid of short links between adjacent layers, there is a noticeable improvement in network performance.

The performance and the power consumption of the circuit depend mainly on the traffic pattern, routing algorithm, switching method, and topology. Some common structures have been used for NoC implementations, including mesh and torus networks.

The mesh topology is the dominant topology for today's regular tile-based NoCs. It is well known that the mesh topology is very simple; it has low cost and consumes little power. During the past years, much effort has been made toward understanding the relationship between power consumption and performance for mesh-based topologies [6]. Despite the advantages of meshes for on-chip communication, some packets may suffer from long latencies due to the lack of short paths between remotely located nodes. A number of previous works try to tackle this shortcoming by adding application-specific links between distant nodes in the mesh [7], by bypassing some intermediate nodes through express channels [8], or by using other topologies with lower diameter [9].

The fact that the Kautz network has a logarithmic diameter and a cost equal to that of the linear array topology motivated us to evaluate it as an underlying topology for on-chip networks. The Kautz topology is a well-known network structure which was initially proposed by Kautz [10] as an efficient topology for parallel processing. Several researchers have studied topological properties, routing algorithms, efficient VLSI layout, and other important aspects of Kautz networks [11].

In this paper, we propose a two-dimensional Kautz-based mesh topology for NoCs. We compare equivalent mesh and 2D Kautz architectures using the two most important performance factors, network latency and power consumption. A routing scheme for the 2D Kautz network has been developed, and the performance and power consumption of the two networks under similar working conditions have been evaluated using simulation experiments. Simulation results show that the proposed network can outperform its equivalent popular mesh topology in terms of network performance and energy dissipation.

Furthermore, we propose an efficient 3D VLSI layout for the 2D Kautz topology, and we show that it is possible to improve performance further by using 3D VLSI technologies.

The rest of the paper is organized as follows. The next section presents the 2D Kautz structure. Section 3 discusses the background in NoC and 3D VLSI, explains the characteristics of 3D designs that may affect the performance of 3D NoCs, and presents the 3D VLSI layout of the 2D Kautz structure. Section 4 describes the simulation environment in which the results were obtained, followed by the simulation results and the experimental evaluation of our approach. Finally, we conclude the paper in Section 5.

• R. Sabbaghi-Nadooshan is with the Department of Electronics, Islamic Azad University Central Tehran Branch, Tehran, Iran.

2 KAUTZ TOPOLOGY

2.1 Kautz architecture

A class of digraphs that generalize Kautz digraphs [10] is defined in [11]. Suppose that the vertices are numbered with integers modulo $N$ ($N$ being the number of nodes). Assuming an out-degree of $d$, a vertex $v$ is joined to the vertices $u \equiv -dv - i \pmod{N}$, for $i = 1, 2, \ldots, d$. The diameter of the resulting digraph is at most $\log_d N$, and if $N = d^n + d^{n-k}$ (for a positive odd integer $k$), the diameter becomes $n$.

Based on the above definition, an $N$-node binary Kautz digraph is defined as follows. Node $v$, with a linear address in $0, \ldots, N-1$, is joined to the vertices with integer addresses $-2v-1 \pmod{N}$ and $-2v-2 \pmod{N}$.

An 8-node binary Kautz digraph is depicted in Fig. 1. Each node generates two connections to other nodes and accepts two connections from other nodes. Because these connections are unidirectional, the degree of the network is the same as that of one-dimensional mesh networks. The diameter of a binary Kautz network of size $N$ is $\log_2 N$, that is, any node can be reached from any other node in at most $\log_2 N$ hops.
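As an illustration of the definition above, the following minimal Python sketch builds the successor list of each node from the rule u = -dv - i (mod N) and measures the diameter by breadth-first search. The function names are hypothetical and chosen only for this example; they do not come from the simulator used in this paper.

```python
from collections import deque


def kautz_successors(v, n_nodes, d=2):
    """Out-neighbours of node v: u = (-d*v - i) mod N for i = 1..d."""
    return [(-d * v - i) % n_nodes for i in range(1, d + 1)]


def distances_from(src, n_nodes, d=2):
    """Breadth-first search; returns hop counts from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        for u in kautz_successors(v, n_nodes, d):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return dist


def diameter(n_nodes, d=2):
    """Largest shortest-path distance over all ordered node pairs."""
    worst = 0
    for src in range(n_nodes):
        dist = distances_from(src, n_nodes, d)
        if len(dist) < n_nodes:
            raise ValueError("digraph is not strongly connected")
        worst = max(worst, max(dist.values()))
    return worst


if __name__ == "__main__":
    for v in range(8):
        print(v, "->", kautz_successors(v, 8))
    print("diameter:", diameter(8))
```

For N = 8 and d = 2 this gives a diameter of 3, in agreement with log2(8). Note that the rule maps nodes 2 and 5 onto themselves for N = 8, so one of their two arcs is a self-loop.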

Fig. 1. 1D Kautz network with 8 nodes.

2.2 Two-dimensional Kautz

The 2D Kautz networks have some interesting topological properties which motivate us to consider them as suitable candidates for on-chip network architectures [12]. The architecture of this network is depicted in Fig. 2. In this network, the nodes in each row and in each column form a Kautz network. The most important property is that, while the number of links in a 2D Kautz and in an equal-size mesh is exactly the same, the network diameter of the 2D Kautz is less than the diameter of the mesh.

The Kautz links are unidirectional, and the maximum of 8 unidirectional links per node in the Kautz equals the maximum number of links for mesh nodes (4 bidirectional links). Since the node degree of a topology contributes significantly to (and usually acts as the dominant factor of) the network cost, the proposed topology can achieve a lower average distance than a 2D mesh while imposing almost the same cost.
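The link-count and diameter claims can be checked with a small Python sketch. It applies the 1D binary Kautz rule along every row and column of an n×n grid and compares the result with a mesh whose bidirectional links are counted as two unidirectional arcs. Dropping the self-loops that the rule produces is an assumption of this sketch, not something stated in the text, and all function names are hypothetical.

```python
from collections import deque
from itertools import product


def kautz_1d(n):
    """Arcs of an n-node binary Kautz digraph, self-loops dropped (assumption)."""
    return {(v, (-2 * v - i) % n) for v in range(n) for i in (1, 2)
            if (-2 * v - i) % n != v}


def mesh_1d(n):
    """Arcs of an n-node linear array; each bidirectional link is two arcs."""
    arcs = set()
    for v in range(n - 1):
        arcs.add((v, v + 1))
        arcs.add((v + 1, v))
    return arcs


def grid_2d(n, arcs_1d):
    """Apply a 1D arc set along both dimensions of an n x n grid."""
    adj = {node: [] for node in product(range(n), repeat=2)}
    for (x, y) in adj:
        for (a, b) in arcs_1d:
            if a == x:
                adj[(x, y)].append((b, y))  # arc in the X dimension
            if a == y:
                adj[(x, y)].append((x, b))  # arc in the Y dimension
    return adj


def hop_stats(adj):
    """Return (arc count, diameter, average distance) via BFS from every node."""
    total, worst = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            v = queue.popleft()
            for u in adj[v]:
                if u not in dist:
                    dist[u] = dist[v] + 1
                    queue.append(u)
        total += sum(dist.values())
        worst = max(worst, max(dist.values()))
    n_arcs = sum(len(nbrs) for nbrs in adj.values())
    pairs = len(adj) * (len(adj) - 1)
    return n_arcs, worst, total / pairs


if __name__ == "__main__":
    n = 8
    print("mesh :", hop_stats(grid_2d(n, mesh_1d(n))))
    print("kautz:", hop_stats(grid_2d(n, kautz_1d(n))))
```

Under these assumptions, both 8×8 networks have 224 unidirectional arcs, while the diameter drops from 14 hops in the mesh to 6 hops in the 2D Kautz.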

Fig. 2. 8×8 2D Kautz

2.3 Layout improvement

Based on the necklace properties in the de Bruijn layout [13], we have considered a more efficient layout for each row and column of the Kautz, as shown in Fig. 3. With this new layout, the total wire length used in the network is decreased. For example, for an 8×8 2D Kautz, a reduction of about 57% in total wire length is obtained.

Fig. 3. A better node placement of Kautz network

2.4 Routing algorithm

The Kautz graph has similarities to the de Bruijn graph [11], and we adopt the routing method introduced by Park in [14] as a common routing scheme when developing routing algorithms for the 2D Kautz networks.

Park's algorithm uses virtual channels unevenly, in that very few packets use all the virtual channels. Numbering virtual channels in increasing order starting from 0, it uses virtual channel 0 more than the other virtual channels. For example, for N = 16, some of the source nodes use only virtual channel 0, some nodes use virtual channels 0 and 1, and a few source nodes use virtual channels 0, 1, and 2.

This routing algorithm was revised to make a more balanced use of virtual channels. Here, at each node, the packet has a degree of flexibility in selecting virtual channels [9]. For the same example of N = 16 given above, for the last group of source nodes (those that use 3 virtual channels), virtual channel 0 is selected to inject the message. For the second group, packets start their journey with virtual channel 1, and for the first group (using 1 virtual channel) they start with virtual channel 2. If a message wants to start with virtual channel 2 and it is occupied, it can try virtual channel 1, and if that is also occupied it can try starting with virtual channel 0. Through this method, the virtual channels are used more uniformly, which results in more balanced traffic over the network channels. Fig. 4 shows the pseudo code of the routing algorithm.
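The injection rule described above can be sketched in a few lines of Python. This is only an illustration of the starting-channel choice for the N = 16 example with three virtual channels; the function names and the representation of occupied channels as a set are assumptions made for this sketch.

```python
def preferred_vc(vcs_needed, total_vcs=3):
    """First-choice injection channel: sources that need fewer VCs start higher."""
    if not 1 <= vcs_needed <= total_vcs:
        raise ValueError("vcs_needed must be between 1 and total_vcs")
    return total_vcs - vcs_needed


def pick_injection_vc(vcs_needed, occupied, total_vcs=3):
    """Try the preferred channel first, then fall back to lower-numbered ones.

    `occupied` is the set of virtual channels currently in use at the
    injection port. Returns None when no admissible channel is free.
    """
    for vc in range(preferred_vc(vcs_needed, total_vcs), -1, -1):
        if vc not in occupied:
            return vc
    return None
```

A source that needs only one virtual channel prefers channel 2 and falls back to 1 and then 0, whereas a source that needs all three channels can only start on channel 0.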

In this paper, we apply the routing algorithm given in Fig. 4. A message is first routed in the X dimension using the above-mentioned routing algorithm in such a way that the route length is minimal. When the x-values of the current node and the destination node addresses become equal, the packet is then routed to the destination by applying the same routing algorithm in the destination column. Note that routing in the two dimensions cannot generate cyclic dependencies, as the base routing algorithm in each dimension is deadlock-free and the packets travel the network in dimension order. Therefore, the resulting routing algorithm is deadlock-free.
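The dimension-order part of this scheme can be illustrated with the following Python sketch. It is not Park's algorithm: the next hop inside a row or column is found here by a plain breadth-first search, which is an assumption standing in for the Next-Node-X and Next-Node-Y functions, and virtual-channel handling is omitted. The function names are hypothetical.

```python
from collections import deque


def successors(v, n):
    """Out-neighbours of node v in an n-node binary Kautz digraph."""
    return [(-2 * v - i) % n for i in (1, 2)]


def shortest_path_1d(src, dst, n):
    """Minimal-hop path from src to dst inside one Kautz row or column."""
    parent = {src: None}
    queue = deque([src])
    while queue:
        v = queue.popleft()
        if v == dst:
            break
        for u in successors(v, n):
            if u not in parent:
                parent[u] = v
                queue.append(u)
    # Walk back from the destination to the source.
    path = [dst]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path[::-1]


def route_xy(src, dst, n=8):
    """Route in the X dimension first, then in the destination column."""
    xs = shortest_path_1d(src[0], dst[0], n)
    ys = shortest_path_1d(src[1], dst[1], n)
    return [(x, src[1]) for x in xs] + [(dst[0], y) for y in ys[1:]]


if __name__ == "__main__":
    print(route_xy((0, 0), (5, 5)))
```

Because each dimension needs at most 3 hops for n = 8, no route produced this way is longer than 6 hops.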

Fig. 4. The pseudo code of the routing algorithm.

3 3D VLSI TECHNOLOGIES

There are four basic components in every NoC that should be considered: 1) processing elements (PEs), which are the IP cores connected by the network; 2) routers and switches, which route the packets until they are received at the destination; 3) network adapters, which are the interface between PEs and switches; and 4) links, which connect two adjacent switches. When analyzing the network, the behavior of these components should be considered carefully. A different set of constraints exists when adapting these architectures to the SoC design paradigm. High throughput and low latency are the desirable characteristics of a multiprocessor system. Instead of aiming strictly for speed, designers increasingly need to consider energy consumption constraints, especially in the SoC domain [15]. So, in order to compare different NoC architectures, important metrics such as network latency, energy consumption, and throughput should be considered [15]. In this paper, these metrics are investigated in mesh and 2D Kautz architectures in the context of 3D VLSI.

Presently, there are several possible fabrication technologies that can be used to realize multiple layers of active area (single-crystal Si or re-crystallized poly-Si) separated by interlayer dielectrics (ILDs) for 3D circuit processing. A brief description of these alternatives is given in [4].

Generally, there are some main advantages to using the third dimension in VLSI design, and these advantages can be very useful in NoC architectures. The benefits of 3D ICs include: 1) higher packing density due to the addition of a third dimension to the conventional two-dimensional layout, 2) higher performance due to reduced average interconnect length, and 3) lower interconnect power consumption due to the reduction in total wiring length [5]. Furthermore, 3D chip design technology can be exploited to build SoCs by placing circuits with different voltage and performance requirements in different layers [4]. The first benefit holds for conventional circuits and also for NoC architectures. For example, if 64 IP cores in a NoC architecture are organized in a 3D network instead of a 2D organization, the chip area is reduced almost four times and a higher level of integration is achieved. In conventional integrated circuits, the length of global wires strongly affects latency and power consumption, especially in emerging deep sub-micron technologies. In NoC architectures, even though wires are invariable in size, links between vertical layers can be very short in comparison with the links within each layer. The shorter the links are, the less power they consume. The last benefit, well known in SoC design, is also applicable to NoC architectures: the digital and analog components in mixed-signal systems can be placed on different Si layers, thereby achieving better noise performance due to lower electromagnetic interference between such circuit blocks [4].

Many studies have investigated the performance of three-dimensional designs [4, 16, 17, 18]. Most such evaluations are based on wire-length distributions; that is, a stochastic 3D interconnect model is presented and the impact of 3D integration on circuit performance and power consumption is investigated. In this study, our attention is instead focused on a network-architecture point of view of three-dimensional technologies.

Algorithm Routing (C, D, Vadd, VC)
Inputs: Current node C = (Xc, Yc); Destination node D = (Xd, Yd);
        Current sub-graph indicator Vadd;
        Current virtual channel VC;
Begin
    If (C = D) then return EjectionChannel;
    If (Xc ≠ Xd) then
        (PC, g) = Next-Node-X (Xc, Xd);
        If Vadd = decreasing and g = increasing then VC++;
        Vadd = g;
        return (PC, VC);
    Endif;
    If (Yc ≠ Yd) then
        (PC, g) = Next-Node-Y (Yc, Yd);
        If Vadd = decreasing and g = increasing then VC++;
        Vadd = g;
        return (PC, VC);
    Endif;
End

There are some characteristics and constraints that should be considered when designing 3D architectures. The characteristics of conventional VLSI are still important in the second dimension even in a 3D design context.

A) Latency in vertical communications: Vertical links, due to their small length, provide fast communication between vertical layers. For instance, in a 70nm technology, the distance between adjacent vertical layers is about 3-70 microns. If a 4x inverter drives the link, the delay is about 7 ps, which is negligible in comparison with that of the links used for horizontal communication [15]. The capacitance and resistance of vertical and horizontal links therefore differ. In [17, 18], the link between two components in different active layers is split into two parts, horizontal and vertical, which differ in resistance and capacitance.

B) Vertical link density in 3D technologies: The density of links in vertical communications is limited by the fabrication process; that is, the spacing between vertical channels is constrained. The via pitch varies between 1 and 7 microns across different processes [15, 19]. These constraints on links may limit the bus bandwidth. Compared to a wire pitch of 0.1 µm, inter-layer vias are significantly larger and cannot achieve the same wiring density as intra-layer interconnects [5].

C) Area overhead of vertical links: As mentioned, the density of vertical links is limited, so they may take significant area in each layer. This may be about 10 percent of the area occupied by routers and switches [5].

D) Complexity of NoC routers: One inter-layer interconnect option is to extend the NoC into three dimensions. This requires the addition of two more links (up and down) to each router. However, adding two extra links to a NoC router increases its complexity (from 5 links to 7 links) [5].

E) Heat dissipation: An extremely important issue in 3D ICs is heat dissipation. The problem is expected to be exacerbated by the reduction in chip size, assuming that the same power generated in a 2D chip will now be generated in a smaller 3D chip, resulting in a sharp increase in the power density [4]. So, the number of active layers in 3D technologies is usually limited to 4 or 5 layers.

F) Different layers: Different layers may be processed using different technologies, so the characteristics of each layer may differ and should be considered in simulation.

3.1 The 3D VLSI layout of 2D Kautz

In our proposed layout, we assume 4 layers for the 3D implementation of the network, and the 64 nodes are distributed over these 4 layers. Fig. 3 (in the previous section) can be changed to Fig. 5 for 4 layers spanning three dimensions.

In Fig. 5, each node represents a row of the network shown in Fig. 2. For instance, Row#5 has 8 nodes, each of which is connected with a unidirectional link to its counterpart in Row#4, and Row#7 has 8 nodes, each of which is connected with a bidirectional link to its counterpart in Row#0.

By connecting vertical nodes with fast vertical interconnects, the 3D structure outperforms the 2D implementation. The area of the chip is also reduced by nearly a factor of two. Moreover, this structure has another useful characteristic: there are fewer links between nodes in layer#0, so the power dissipated in this layer is less than in the upper layers. This feature can be beneficial in 3D architectures because of the heat dissipation problem in 3D VLSI technologies.

Fig. 5. 3D VLSI layout of 2D Kautz architecture.

4 SIMULATION ENVIRONMENT

To simulate the proposed NoC topologies, we used an interconnection network simulator developed based on POPNET [20], with the latest version of the Orion power library [21] embedded. By providing detailed power characteristics of the network elements, Orion enables designers to make rapid power evaluations at the architecture level [21].

The POPNET simulator was modified to mimic the exact operation of the 2D mesh and the 2D Kautz NoCs. The simulator was also customized to support other topologies and routing algorithms, such as the shuffle-exchange [22], de Bruijn [23], and Kautz [12] topologies. The power consumption can be obtained for each component of the network and for each layer (if implemented using 3D VLSI technology).

For each simulated network, the physical link width was assumed to be 32 bits. The power was calculated based on a NoC in 90 nm technology whose routers operate at 250 MHz. Based on the core size information presented in [24], the side length of each IP core was set to 2 mm, and the length of each wire was set based on the number of cores it passes through.

The message length was assumed to be M = 32 flits, and V = 2 virtual channels per physical channel were used. At each node, messages were generated according to a Poisson distribution with an average generation rate of λ messages per cycle. The simulations were performed under uniform, matrix-transpose, and hotspot [25] traffic patterns.

© 2011 Journal of Computing Press, NY, USA, ISSN 2151-9617

http://sites.google.com/site/journalofcomputing/

Note that for the matrix-transpose traffic load, it was assumed that 30% of the messages generated by a node were of matrix-transpose type (i.e., node (x,y) sends a message to node (y,x)), and the rest of the messages were sent to other nodes uniformly. In the hotspot traffic load, a hotspot rate of 16% was assumed, i.e., each node sent 16% of its messages to the hotspot node (assumed to be node (4,0) in the 8×8 network), and the rest of the messages were sent to other network nodes uniformly.
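The three traffic patterns can be summarized with a short Python sketch of the destination choice. It is not taken from the modified POPNET simulator. The function names are hypothetical, and two details are assumptions of this sketch: diagonal nodes (x = y) under transpose traffic and the hotspot node itself always fall back to a uniform choice. Message arrivals with rate λ are modelled through exponentially distributed gaps, which is the standard inter-arrival law of a Poisson process.

```python
import random


def pick_destination(src, pattern, rng, n=8, hotspot=(4, 0),
                     transpose_share=0.30, hotspot_share=0.16):
    """Choose a destination for a message injected at node src = (x, y)."""
    x, y = src
    if pattern == "transpose" and x != y and rng.random() < transpose_share:
        return (y, x)
    if pattern == "hotspot" and src != hotspot and rng.random() < hotspot_share:
        return hotspot
    # Remaining messages: uniform choice among all other nodes.
    while True:
        dst = (rng.randrange(n), rng.randrange(n))
        if dst != src:
            return dst


def next_arrival_gap(rate, rng):
    """Cycles until the next message of a Poisson process with the given rate."""
    return rng.expovariate(rate)


if __name__ == "__main__":
    rng = random.Random(2011)
    print([pick_destination((1, 2), "transpose", rng) for _ in range(5)])
    print(next_arrival_gap(0.01, rng))
```

Passing an explicit `random.Random` instance keeps runs reproducible, which is convenient when comparing two topologies under the same injected traffic.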

4.1 Simulation results

In Fig. 6(a), the average message latency is displayed as a function of the message generation rate for the 8×8 2D mesh and 2D Kautz NoCs, using V = 2 virtual channels per physical channel and a message size of M = 32 flits, under various traffic patterns.

As can be seen in Fig. 6(a), the 2D Kautz NoC achieves a reduction in message latency with respect to the simple 2D mesh network over the full range of network load under various traffic patterns. The non-fixed link lengths of the 2D Kautz also result in some variations in the delay and power of the network links. Since the operating frequency of a NoC is often determined by the longest router pipeline stage, the long wires need not degrade the NoC operating frequency. This can be achieved by segmenting long links into regular fixed-length links connected by 1-flit buffers. The size of each segment equals the size of a link connecting two adjacent nodes. Using 1-flit buffers (which is inspired by pipelined circuit switching methods in conventional interconnection networks [25]) provides pipelining over the link and also acts as a repeater for it. By sending the flits of a message over a long link in a pipelined fashion, latency-insensitive operation is guaranteed, as discussed in [7]. Note that we have taken this pipelined transmission into account in the simulation, as well as the effect of buffers and link lengths on the power consumption.
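The segmentation of long links can be illustrated with a small Python sketch. It assumes that a link between grid positions a and b spans |a - b| tiles of 2 mm each, that one 1-flit buffer sits between consecutive segments, and that a flit advances one segment per cycle. The one-cycle-per-segment timing and the function names are assumptions of this sketch, not parameters reported for the simulator.

```python
TILE_MM = 2.0  # side length of one IP core, as given in the simulation setup


def segment_link(src_pos, dst_pos, tile_mm=TILE_MM):
    """Split a link between two positions in a row into tile-sized segments."""
    segments = abs(dst_pos - src_pos)
    if segments == 0:
        raise ValueError("a link must connect two different positions")
    return {
        "length_mm": segments * tile_mm,
        "segments": segments,
        "buffers": segments - 1,  # one 1-flit buffer between adjacent segments
    }


def pipelined_cycles(segments, flits):
    """Cycles for a whole message to cross the link, one cycle per segment."""
    return segments + flits - 1


if __name__ == "__main__":
    link = segment_link(0, 7)
    print(link, pipelined_cycles(link["segments"], flits=32))
```

With these assumptions, a 32-flit message crossing a link that spans 7 tiles needs 38 cycles, only 6 more than on a single-segment link, because the flits overlap in the pipeline.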

Fig. 6(b) depicts the power consumption of the considered networks under various traffic patterns. As shown in the figure, the 2D Kautz can effectively reduce the power consumption of the NoC compared to the equivalent mesh topology. The main source of this noticeable reduction is the lower hop count taken by the messages (on average), which saves the power that would be consumed in intermediate routers in an equivalent mesh topology.

In Fig. 7, the average latency and power consumption of the 3D and 2D implementations of the Kautz structure are shown. In this experiment, the horizontal links in each layer are nearly 10 times slower than the vertical links. In the case of uniform traffic, the 3D structure has lower latency because of the fast vertical links in the network. Shorter links also consume less power, which affects the total power consumption.

One of the main characteristics of 3D technologies is the reduction in power dissipation of the links in the network. Fig. 8(a) compares the total power dissipated in the links for the 2D and 3D structures. The results show that, with the aid of vertical links, the power dissipation decreases by nearly 40%. The 3D layout proposed in this paper also has an important characteristic for three-dimensional designs: the lower layers in the architecture consume less power because they have fewer links and therefore less traffic. This feature can reduce the effect of heat dissipation in 3D VLSI designs. Fig. 8(b) shows the power dissipated in each of the four layers of the structure.

Fig. 6. Average delay and total power comparison between two-dimensional Kautz and mesh structure.

Fig. 7. Average delay and total power comparison between three and two dimensional implementation of Kautz structure (vertical links are 10 times faster than horizontal links).

Fig. 8. a) Total link power comparison between three and two dimensional implementation of Kautz. b) Layer power consumption.

Fig. 9. The area overhead of 8x8 and 16x16 digraph-based mesh NoCs with different buffer depths.





4.2 Area overhead

The area overhead due to the additional inter-router wires was analyzed by calculating the number of channels in a NoC. An n×n 2D mesh has 2n(n-1) channels. The 2D Kautz NoCs have the same number of channels, although some links use longer wires. To evaluate the imposed area overhead, we estimated the area of the routers and links using the Orion 2.0 tool [26]. The area of the router was calculated in 90 nm technology.

In the analysis, the lengths of the input and output network interface buffers were taken to be as large as 64 flits. This is a modest size for network interface queues.

In Table 1, the area overhead of a Kautz NoC is evaluated for 8×8 networks. The results show that in an 8×8 mesh the total areas of the links and routers are 0.0366 mm² and 0.2742 mm², respectively. Based on these area estimations, the area of the network part of the 2D Kautz network shows a 22% increase compared to the 2D mesh NoC of equal size. Considering 2 mm × 2 mm processing elements, the increase in the entire chip area is about 2%.

It is noteworthy that the router area is a function of its buffer depth. The depth of the input and output buffers of the routers in this paper was set to 2 flits (or a 4-flit buffer at each port). A performance analysis in [27], however, showed that a typical router in next-generation high-performance CMPs (using 22 nm technology) will need 64-flit buffers at each port, which results in a higher router/switch area. Fig. 9 plots the area overhead of the 2D Kautz NoCs over a mesh (with the same size and buffering capacity) for network size N = 8×8 and different buffer depths in a system with 32-bit channels. Based on Fig. 9, a system using 64-flit buffers reduces the area overhead of the 2D Kautz networks to about 9%.

Table 1. Area comparison

5 CONCLUSION

The simple 2D mesh topology has been widely used in a variety of interconnection network applications, especially for NoC design. However, the Kautz network has not yet been studied as the underlying topology for 2D tiled NoCs. In this paper, we introduced the two-dimensional Kautz network, which has the same cost as the popular mesh but has a logarithmic diameter. We then conducted a comparative simulation study to assess the network latency and power consumption of the two networks. The results showed that the 2D Kautz topology improves the network latency, especially for heavy traffic loads. The power consumption of the 2D Kautz network was also less than that of the equivalent simple 2D mesh NoC. Furthermore, a 3D VLSI layout for the 2D Kautz was proposed, and this design further decreases the power consumption, especially in the NoC links.

Combining the Kautz topology with other topologies can be a challenging direction for future work.

REFERENCES

[1] C. Grecu, P. P. Pande, A. Ivanov, and R. Saleh, "Timing Analysis of Network on Chip Architectures for MP-SoC Platforms," Microelectronics, Vol. 36, pp. 833-845, 2005.

[2] P. P. Pande, C. Grecu, M. Jones, A. Ivanov, and R. Saleh, "Performance Evaluation and Design Trade-Offs for Network-on-Chip Interconnection Architectures," IEEE Transactions on Computers, Vol. 54, No. 8, August 2005.

[3] T. Bjerregaard and S. Mahadevan, "A Survey of Research and Practices of Network-on-Chip," ACM Computing Surveys, Vol. 38, March 2006.

[4] K. Banerjee, S. J. Souri, P. Kapur, and K. C. Saraswat, "3D ICs: A Novel Chip Design for Improving Deep-Submicrometer Interconnect Performance and Systems-on-Chip Integration," Proceedings of the IEEE, Vol. 89, No. 5, May 2001.

[5] F. Li, C. Nicopoulis, T. Richardson, Y. Xie, V. Krishnan, and M. Kandemir, "Design and Management of 3D Chip Multiprocessors Using Network-in-Memory," International Symposium on Computer Architecture (ISCA'06), USA, pp. 130-141, 2006.

[6] K. Srivasan, K. S. Chata, and G. Konjevad, "Linear Programming Based Techniques for Synthesis of Networks-on-chip Architectures," IEEE International Conference on Computer Design (ICCD), pp. 422-429, 2004.

[7] U. Y. Ogras and R. Marculescu, "Application-Specific Network-on-Chip Architecture Customization via Long-Range Link Insertion," in IEEE/ACM Intl. Conf. on Computer Aided Design, San Jose, 2005.

[8] W. J. Dally, "Express Cubes: Improving the Performance of K-ary N-cube Interconnection Networks," in IEEE Trans. on Computers, Vol. 40, No. 9, 1991.

[9] R. Sabbaghi-Nadooshan, M. Modarressi, and H. Sarbazi-Azad, "The 2D digraphed-based NoCs: attractive alternatives to the 2D mesh NoCs," Journal of Supercomputing, DOI 10.1007/s11227-010-0410-6.

[10] W. H. Kautz, "The design of optimum interconnection networks for multiprocessors," in Architecture and Design of Digital Computers, NATO Advanced Summer Institute, pp. 249-77, 1969.

Network            Link area (mm²)   Router area (mm²)   Added area (%)   Added area with respect to the whole chip (%)
64-node 2D Mesh    0.0366            0.2742              0                0
64-node 2D Kautz   0.0627            0.2742              22.75            2.08






[11] J. C. Bermond, R. W. Dawas, and F. O. Ergincan, "De Bruijn and Kautz Bus Networks," Networks, Vol. 30, No. 3, pp. 205-218, 1997.

[12] R. Sabbaghi-Nadooshan and H. Sarbazi-Azad, "The Kautz mesh: A New Topology for SoCs," International SoC Design Conference, pp. 300-303, 2008.

[13] C. Chen, P. Agarwal and J. R. Burke, "dBcube: A New Class of Hierarchical Multiprocessor Interconnection Networks with Area Efficient Layout," IEEE Transactions on Parallel and Distributed Systems, Vol. 4, No. 12, pp. 1332-1344, Dec. 1993.

[14] H. Park and D. P. Agrawal, "A Novel Deadlock-free Routing Technique for a Class of de Bruijn based Networks," 7th IEEE Symposium on Parallel & Distributed Processing, pp. 92-97, 1995.

[15] K. Puttaswamy and G. H. Loh, "Implementing Caches in a 3-D Technology for High Performance Processors," IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'05), USA, pp. 525-532, 2005.

[16] S. Das, A. Chandrakasan and R. Reif, "Three Dimensional Integrated Circuits: Performance, Design, Methodology and CAD tools," IEEE Computer Society Annual Symposium on VLSI, USA, pp. 13-18, 2003.

[17] S. J. Souri, K. Banerjee, A. Mehrotra and K. C. Saraswat, "Multiple Si Layer ICs: Motivation, Performance Analysis and Design Implications," Design Automation Conference (DAC'00), USA, pp. 213-220, 2000.

[18] R. Zhang, K. Roy, C. K. Koh, and D. B. Janes, "Power Trends and Performance Characterization of 3-Dimensional Integration for Future Technology Generation," International Symposium on Quality Electronic Design, USA, pp. 217-222, 2001.

[19] J. Cong, A. Jagannathan, Y. Ma, G. Reinman, J. Wei and Y. Zhang, "An Automated Design Flow for 3D Microarchitecture Evaluation," Asia and South Pacific Conference on Design Automation (ASPDAC'06), pp. 384-389, 2006.

[20] http://www.princeton.edu/~lshang/popnet.html, August 2007.

[21] H. Wang, X. Zhu, L. Peh, and S. Malik, "Orion: A Power-Performance Simulator for Interconnection Networks," 35th International Symposium on Microarchitecture, pp. 294-305, 2002.

[22] R. Sabbaghi-Nadooshan, M. Modarressi, and H. Sarbazi-Azad, "2DSEM: A novel high-performance and low-power mesh-based topology for networks-on-chip," International Journal of Parallel, Emergent and Distributed Systems, Vol. 25, No. 4, pp. 331-344, August 2010.

[23] R. Sabbaghi-Nadooshan, M. Modarressi, and H. Sarbazi-Azad, "2D DBM: An Attractive Alternative to the Simple 2D Mesh Topology for on-chip Networks," ICCD, pp. 486-491, 2008.

[24] R. Mullins, A. West, and S. Moore, "The Design and Implementation of a Low-Latency On-Chip Network," Asia and South Pacific Design Automation Conference, pp. 164-169, 2006.

[25] J. Duato, S. Yalamanchili, and N. Li, Interconnection Networks: An Engineering Approach, Morgan Kaufmann Publishers, 2005.

[26] A. B. Kahng, B. Li, L.-S. Peh, and K. Samadi, "Orion 2.0: A Fast and Accurate NoC Power and Area Model for Early-Stage Design Space Exploration," DATE, pp. 423-428, 2009.

[27] J. Duato, et al., "A High Performance Router Architecture for Interconnection Networks," in Proc. Int. Conf. Parallel Processing, pp. 61-68, 1996.

R. Sabbaghi-Nadooshan received the B.S. and M.S. degrees in electrical engineering from the Science and Technology University, Tehran, Iran, in 1991 and 1994, and the Ph.D. degree in electrical engineering from the Science and Research Branch, Islamic Azad University, Tehran, Iran, in 2010. Since 1998 he has been a faculty member of the Department of Electronics in the Central Tehran Branch, Islamic Azad University, Tehran, Iran. His research interests include interconnection networks, Networks-on-Chips, and embedded systems.
