Mesh-connected rings topology for network-on-chip

October 2013, 20(5): 30–36 www.sciencedirect.com/science/journal/10058885 http://jcupt.xsw.bupt.cn

The Journal of China Universities of Posts and Telecommunications

LIU You-yao, GAO Meng

School of Electronic Engineering, Xi’an University of Posts & Telecommunications, Xi’an 710121, China

Abstract

As the feature size of semiconductor technology shrinks and the number of intellectual property (IP) cores grows, the on-chip interconnection network architecture has a great influence on the performance and area of a system-on-chip (SoC) design. To balance performance, cost and ease of implementation, a regular network-on-chip (NoC) architecture, the mesh-connected rings (MCR) interconnection network, is proposed. The MCR topology combines mesh with ring and is simple, planar and scalable. A detailed theoretical analysis of MCR and mesh is given, and a simulation analysis based on a virtual channel router with wormhole switching is presented. Compared with the general mesh architecture, the results show that MCR has better performance, especially for local traffic and low loads, and lower cost.

Keywords SoC, NoC, network topology, routing algorithms, performance evaluation

1 Introduction

The ability of the NoC to efficiently disseminate information depends largely on the underlying topology. Aside from its paramount effect on network latency, throughput, area, fault tolerance and power consumption, the topology plays an important role in designing the routing strategy and mapping the cores to the network nodes [1–3]. Consequently, determining the optimal topology is one of the key problems in NoC design. Several basic regular network topologies have been proposed for NoC, such as two-dimensional (2D) mesh, 2D torus, folded 2D torus, fat-tree, chordal ring, hierarchical ring, thin, DL(2m), Rgrid, XMesh, etc. [1,4–6]. 2D mesh and 2D torus are the most studied topologies (over 60% of the cases) [1].

NoC requires a scalable and reusable network architecture [7]. For a two-dimensional planar interconnection network, nodes that are adjacent in the topology must also be adjacent in space. This property greatly facilitates the layout of the chip [8]. The router is the main part of the NoC and influences the design and verification of the NoC, as well as its area and power consumption. Its complexity is mainly determined by the cache (buffer) size, the number of input and output ports, the number of virtual channels, the switching complexity and the routing algorithm [8–9]. The complexity of the router's arbitration logic, the area of the crossbar, the delay and the power consumption are all directly related to the number of input and output ports. Therefore, the number of input and output ports (the network node connectivity) is an important indicator of router complexity [8–9]. However, the node degree of the planar mesh topology (torus is non-planar) is 4, which gives the NoC router nodes a higher design cost. To balance performance and cost, this paper therefore proposes an improved mesh topology, the MCR topology. An MCR interconnection network node has a connectivity of three and a network diameter comparable to that of the mesh architecture, which makes MCR a simple topology with regularity, planarity, scalability and lower cost. In addition, a layered addressing method and shortest-path routing keep the routing algorithm simple.

Received date: 18-04-2013 Corresponding author: LIU You-yao, E-mail: [email protected] DOI: 10.1016/S1005-8885(13)60086-2

2 MCR interconnection network

2.1 Topology of the MCR

Definition 1 The 2D mesh topology architecture graph $g(V, E)$ has the following properties: 1) $g(V, E)$ is a simple, connected and undirected graph; 2) $V = \{(x, y) \mid 0 \le x \le n-1,\ 0 \le y \le m-1,\ n, m, x, y \in I\}$; 3) $E = \{((x_1, y_1), (x_2, y_2)) \mid (0 \le y_1 = y_2 \le m-1,\ |x_1 - x_2| = 1) \cup (0 \le x_1 = x_2 \le n-1,\ |y_1 - y_2| = 1)\}$.

Definition 2 The ring topology architecture graph $g(V, E)$ has the following properties: 1) $g(V, E)$ is a simple, connected and undirected graph; 2) $V = \{x \mid 0 \le x \le n-1,\ n, x \in I\}$; 3) $E = \{(x_1, x_2) \mid |x_1 - x_2| = 1 \pmod{n}\}$.

Definition 3 The MCR topology architecture graph $g(V, E)$ has the following properties: 1) $g(V, E)$ is a simple, connected and undirected graph; 2) $V = \{(x, y, z) \mid 0 \le x \le n-1,\ 0 \le y \le m-1,\ 0 \le z \le 3,\ n, m, x, y, z \in I\}$; 3) $E = \{((x_1, y_1, z_1), (x_2, y_2, z_2)) \mid (|x_1 - x_2| = 1,\ 0 \le y_1 = y_2 \le m-1,\ z_1 \oplus z_2 = 2) \cup (0 \le x_1 = x_2 \le n-1,\ |y_1 - y_2| = 1,\ z_1 \oplus z_2 = 2) \cup (0 \le x_1 = x_2 \le n-1,\ 0 \le y_1 = y_2 \le m-1,\ |z_1 - z_2| = 1 \pmod{4})\}$.

MCR is a two-tier scalable interconnection network. The first layer consists of four nodes connected into a ring. Each ring can be regarded as a super-node; it strengthens the locality of the system and thereby improves system performance. The second layer connects the super-nodes into a 2D mesh topology. Fig. 1 gives a 64-node MCR topology and its chip layout. In Fig. 1(a), a small circle indicates a NoC router node and a small grid indicates a computing resource. In Fig. 1(b), a small square indicates a router node with its computing resource, and four small squares form a large square, which is a ring. The node addresses in Fig. 1 follow the layered method: the numbers in parentheses give the ring address, and the number in a circle or square gives the relative address of the node within its ring. For example, counting from top to bottom and from left to right in Fig. 1(a), (3,2,0) denotes node 0 of the ring in row 2 and column 3.

(a) Topology architecture and channel identification of router node

(b) Chip layout

Fig. 1 MCR topology architecture and chip layout
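The two-tier structure can be made concrete with a short Python sketch that enumerates the nodes and links of an n × n MCR network. The orientation of the mesh links is an assumption inferred from the routing pseudo code of Sect. 2.2 (node 2 of a ring links to node 0 of the ring at x + 1, and node 1 links to node 3 of the ring at y + 1); the function name `build_mcr` is hypothetical and not part of the original work.

```python
from collections import defaultdict


def build_mcr(n):
    """Return the adjacency sets of an n x n mesh of 4-node rings.

    Nodes are (x, y, z) tuples. Assumed orientation: z = 2 links to z = 0 of
    the ring at x + 1, and z = 1 links to z = 3 of the ring at y + 1.
    """
    adj = defaultdict(set)

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for x in range(n):
        for y in range(n):
            for z in range(4):                  # first tier: ring links
                link((x, y, z), (x, y, (z + 1) % 4))
            if x + 1 < n:                       # second tier: horizontal mesh link
                link((x, y, 2), (x + 1, y, 0))
            if y + 1 < n:                       # second tier: vertical mesh link
                link((x, y, 1), (x, y + 1, 3))
    return adj


adj = build_mcr(4)                              # the 64-node network of Fig. 1
num_links = sum(len(v) for v in adj.values()) // 2
max_degree = max(len(v) for v in adj.values())
print(len(adj), num_links, max_degree)          # prints: 64 88 3
```

Under these assumptions every node has at most three network links, and the link count equals 6n² − 2n for n = 4.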

2.2 Routing algorithm of MCR

The routing algorithm is a key factor in the communication efficiency of a NoC. This paper provides a distributed deterministic routing algorithm (DDRA) for the MCR network that makes full use of the characteristics of MCR. In this approach, each node, upon receiving a packet, decides whether the packet should be delivered to the local node or forwarded to an adjacent node. The routing decision does not need state information for the complete network; it uses only the addresses of the current node and the destination node, which reduces both the network communication overhead and the node storage overhead.

The processing element (PE) of a source node S (xcurrent, ycurrent, zcurrent) sends a data packet to the PE of a destination node D (xdest, ydest, zdest). Here x is the abscissa of the PE core in the mesh, y is its ordinate in the mesh, and z is the address of the PE within its ring. The address offsets between the destination node and the current node are xoffset = xdest − xcurrent, yoffset = ydest − ycurrent and zoffset = zdest ⊕ zcurrent (bitwise exclusive OR).

XY routing, which is deterministic, is adopted in the upper mesh structure, and shortest-path routing is applied within the ring. To avoid deadlock, a packet is routed as follows: 1) if xoffset < 0, the current node is to the right of the destination node, and the left mesh link of the same row, or a ring link in the upper half of the ring, is selected as the output channel (the case xoffset > 0 is handled symmetrically, towards the right mesh link); 2) if xoffset = 0 and yoffset < 0, the current node is above the destination node, and the lower mesh link of the same column, or a ring link in the right half of the ring, is selected as the output channel; 3) if xoffset = 0 and yoffset > 0, the current node is below the destination node, and the upper mesh link of the same column, or a ring link in the left half of the ring, is selected as the output channel; 4) if xoffset = 0 and yoffset = 0, the current node and the destination node are in the same ring. If the two nodes are adjacent in the ring, the packet is sent directly to the destination node; otherwise, the packet is routed counter-clockwise if zcurrent is 1 or 3, and clockwise if zcurrent is 0 or 2.

The pseudo code of the routing algorithm is as follows:

Routing ()
{
    Input: the current node address (xcurrent, ycurrent, zcurrent) and the destination node address (xdest, ydest, zdest)
    Output: the selected output channel rchannel

    Routing process:
    /* Calculate the offsets between the current node address and the destination node address */
    xoffset := xdest − xcurrent;
    yoffset := ydest − ycurrent;
    zoffset := zdest ⊕ zcurrent;

    if xoffset < 0 then
        /* The current node is to the right of the destination node */
        if zcurrent = 0 then
            rchannel := M;
        else if zcurrent = 3 then
            rchannel := C1;
        else
            rchannel := C0;
        endif
    else if xoffset > 0 then
        /* The current node is to the left of the destination node */
        if zcurrent = 2 then
            rchannel := M;
        else if zcurrent = 1 then
            rchannel := C1;
        else
            rchannel := C0;
        endif
    else if yoffset < 0 then
        /* xoffset = 0; the current node is above the destination node */
        if zcurrent = 3 then
            rchannel := M;
        else if zcurrent = 0 then
            rchannel := C0;
        else
            rchannel := C1;
        endif
    else if yoffset > 0 then
        /* xoffset = 0; the current node is below the destination node */
        if zcurrent = 1 then
            rchannel := M;
        else if zcurrent = 2 then
            rchannel := C0;
        else
            rchannel := C1;
        endif
    else
        /* xoffset = 0 and yoffset = 0; the current node and the destination node are in the same ring */
        if zoffset = 0 then
            rchannel := L;
        else if zoffset = 1 then
            if zcurrent = 0 or zcurrent = 2 then
                rchannel := C1;
            else
                rchannel := C0;
            endif
        else if zoffset = 2 then
            if zcurrent = 1 or zcurrent = 3 then
                rchannel := C1;
            else
                rchannel := C0;
            endif
        else
            /* zoffset = 3; adjacent nodes use the direct ring link */
            if zcurrent = 0 or zcurrent = 2 then
                rchannel := C0;
            else
                rchannel := C1;
            endif
        endif
    endif
}
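The routing decision can be exercised with a minimal Python sketch. Several points are assumptions rather than statements of the source: M is taken to be the mesh port, L the local port, C1 the ring link to node (z + 1) mod 4 and C0 the ring link to node (z − 1) mod 4; the ring offset is read as a bitwise XOR; the y tests apply only when xoffset = 0 (XY order); and adjacent ring nodes always use their direct link. The helper names `route`, `step` and `hops` are hypothetical.

```python
from itertools import product

# Assumed direction of the mesh port M for each ring position z.
MESH_STEP = {0: (-1, 0), 2: (1, 0), 3: (0, -1), 1: (0, 1)}


def route(cur, dest):
    """Return the output channel ('L', 'M', 'C0' or 'C1') chosen at node cur."""
    (xc, yc, zc), (xd, yd, zd) = cur, dest
    xoff, yoff, zoff = xd - xc, yd - yc, zd ^ zc   # XOR ring offset (assumption)
    if xoff < 0:                                   # leave the ring through z = 0
        return "M" if zc == 0 else ("C1" if zc == 3 else "C0")
    if xoff > 0:                                   # leave the ring through z = 2
        return "M" if zc == 2 else ("C1" if zc == 1 else "C0")
    if yoff < 0:                                   # leave the ring through z = 3
        return "M" if zc == 3 else ("C0" if zc == 0 else "C1")
    if yoff > 0:                                   # leave the ring through z = 1
        return "M" if zc == 1 else ("C0" if zc == 2 else "C1")
    if zoff == 0:                                  # arrived: deliver locally
        return "L"
    if zoff == 1:                                  # adjacent pair (0,1) or (2,3)
        return "C1" if zc in (0, 2) else "C0"
    if zoff == 2:                                  # opposite node in the ring
        return "C1" if zc in (1, 3) else "C0"
    return "C0" if zc in (0, 2) else "C1"          # adjacent pair (0,3) or (1,2)


def step(cur, channel):
    """Apply one hop on the given channel and return the next node."""
    x, y, z = cur
    if channel == "C1":
        return (x, y, (z + 1) % 4)
    if channel == "C0":
        return (x, y, (z - 1) % 4)
    dx, dy = MESH_STEP[z]                          # mesh hop to the facing node
    return (x + dx, y + dy, z ^ 2)


def hops(src, dest, limit=10_000):
    """Count the hops taken from src to dest; raise if the route does not end."""
    cur, count = src, 0
    while True:
        channel = route(cur, dest)
        if channel == "L":
            return count
        cur, count = step(cur, channel), count + 1
        if count > limit:
            raise RuntimeError("routing did not terminate")


print(hops((0, 0, 0), (1, 0, 0)))                  # prints: 3
```

In the printed example the packet takes two ring hops to reach node 2 of its ring and one mesh hop to the neighbouring ring.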

2.3 The properties of the MCR interconnection network

This paper compares the MCR topology with the mesh, the typical two-dimensional planar topology. The average ideal delay of MCR is slightly better than that of mesh, and the average throughput of mesh is slightly better than that of MCR. However, MCR is a hierarchical network with good locality, which can improve system performance. In addition, the node degree of MCR is small, which makes the router hardware simpler and more efficient than that of mesh. With the number of nodes $N = 4n^2$, the MCR interconnection network has the following properties:

Property 1 The MCR network is a regular topology with a node degree of 3, which makes the design of the NoC router node less complex. Only two kinds of input and output ports need to be designed, and any number of router nodes can then be assembled into a regular NoC structure, which allows modular design and reuse of the router nodes.

Property 2 The MCR network has $6n^2 - 2n$ links. The number of mesh links that connect the rings is $2n^2 - 2n$, each ring has 4 links, and there are $n^2$ rings in total. Therefore, the total number of links is $4n^2 + 2n^2 - 2n = 6n^2 - 2n$. The low link count of MCR lowers the network cost.
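As a worked instance of Property 2, take the 64-node network of Fig. 1 ($n = 4$). The mesh figure assumes a $2n \times 2n$ mesh with the same $4n^2$ nodes, which agrees with the link count listed in Table 1:

$$L_{MCR} = 4n^2 + (2n^2 - 2n) = 6n^2 - 2n = 88, \qquad L_{mesh} = 2 \cdot 2n(2n - 1) = 8n^2 - 4n = 112 \quad (n = 4).$$

For 64 nodes MCR therefore needs 24 fewer links than the mesh, a reduction of about 21%.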

Property 3 The MCR interconnection network has good scalability. Scalability is the ability to add nodes while the performance of the network topology is maintained. In other words, it reflects how effectively added processing resources can be used, and it affects the efficiency of network routing. An $n \times n$ MCR interconnection network can be extended to an $(n+1) \times (n+1)$ MCR interconnection network by adding one row and one column of rings. Apart from the newly added nodes, the existing nodes and their connections do not change.

Property 4 The maximum distance between any two nodes in the MCR network (the network diameter) is $6n - 6$: the diameter of each 4-node ring is 2, the diameter of the $n \times n$ mesh of rings is $2n - 2$, and the MCR network diameter is the sum of the mesh diameter and the product of the mesh diameter and the ring diameter, that is, $(2n - 2) + (2n - 2) \times 2 = 6n - 6$. The network diameter affects the maximum network latency.

Property 5 The average distance of the MCR network is $4n/3$: the average distance of each 4-node ring is $4/4 = 1$, the average distance of an $n \times n$ mesh is $2n/3$, and the MCR average distance combines the average distance of the mesh and the average distance of the ring, that is, $2n/3 + (2n/3) \times 1 = 4n/3$. The network average distance is proportional to the ideal average latency [10].
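The closed forms for diameter and average distance can be compared against exhaustively computed values with a breadth-first search. The sketch below is illustrative only: the orientation of the MCR mesh links is an assumption inferred from the routing pseudo code, the average is taken over ordered pairs of distinct nodes (the source does not state its convention), and the names `mcr_graph`, `mesh_graph` and `distance_stats` are hypothetical.

```python
from collections import deque
from itertools import product


def mcr_graph(n):
    """Adjacency sets of an n x n MCR network with nodes (x, y, z)."""
    adj = {node: set() for node in product(range(n), range(n), range(4))}

    def link(a, b):
        adj[a].add(b)
        adj[b].add(a)

    for x, y in product(range(n), range(n)):
        for z in range(4):
            link((x, y, z), (x, y, (z + 1) % 4))   # ring link
        if x + 1 < n:
            link((x, y, 2), (x + 1, y, 0))         # assumed horizontal mesh link
        if y + 1 < n:
            link((x, y, 1), (x, y + 1, 3))         # assumed vertical mesh link
    return adj


def mesh_graph(k):
    """Adjacency sets of a k x k 2D mesh with nodes (x, y)."""
    adj = {node: set() for node in product(range(k), range(k))}
    for x, y in adj:
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (x + dx, y + dy) in adj:
                adj[(x, y)].add((x + dx, y + dy))
    return adj


def distance_stats(adj):
    """Return (diameter, mean distance over ordered pairs of distinct nodes)."""
    diameter, total = 0, 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:                               # breadth-first search from src
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        diameter = max(diameter, max(dist.values()))
        total += sum(dist.values())
    return diameter, total / (len(adj) * (len(adj) - 1))


for n in (2, 4):
    print(n, "MCR", distance_stats(mcr_graph(n)),
          "mesh", distance_stats(mesh_graph(2 * n)))
```

Running the loop for the same node count in both networks gives measured values that can be set beside the entries of Table 1.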

Property 6 The MCR network is planar. Planarity means that the network topology can be realized on a flat surface. This property greatly facilitates the layout of the chip and should be considered in the design. In high-dimensional (greater than 2D) interconnection networks, topological adjacency does not imply spatial adjacency, whereas nodes that are adjacent in the topology of a two-dimensional interconnection network must also be adjacent in space [8, 10].

Property 7 The bisection width of the MCR network is $n$. The bisection width is an indicator of the ideal throughput of the network, that is, the maximum throughput that a given topology can achieve under ideal flow control and routing [10].

To further illustrate the features of the MCR network, Table 1 compares the MCR interconnection network with the 2D mesh interconnection network for $N = 4n^2$ nodes. Table 1 shows that, for the same number of nodes and average distance and a comparable network diameter, the MCR topology has a lower node degree and fewer links, and therefore a lower network cost than the mesh topology. The ideal throughput is given by [10] $N_{idealtp} \le 2bB_c/N$, where $B_c$ is the number of channels that must be cut to divide the network into two equal halves, $b$ is the data width of each channel and $N$ is the total number of nodes. For a network of $N = 4n^2$ nodes, $B_c = n$ for MCR and $B_c = 2n$ for mesh. Because wiring resources on a NoC are plentiful, each MCR data channel can be made twice as wide as a mesh channel, in which case MCR and mesh have the same ideal throughput.

Table 1 Performance characteristics of MCR and mesh

Topology    Node degree    Number of links    Diameter    Average distance
MCR         3              $6n^2 - 2n$        $6n - 6$    $4n/3$
Mesh        4              $8n^2 - 4n$        $4n - 2$    $4n/3$
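The ideal-throughput argument can be checked numerically for the 64-node case. The sketch below evaluates the bound $2bB_c/N$ with the bisection values quoted in the text; the 16-bit channel width is borrowed from the router of Sect. 3.1, and the function name `ideal_throughput` is hypothetical.

```python
def ideal_throughput(b, bisection_channels, num_nodes):
    """Bound 2*b*Bc/N on the throughput per node, in bits per cycle."""
    return 2 * b * bisection_channels / num_nodes


n = 4                                   # 64-node networks, N = 4 * n**2
N = 4 * n ** 2
mesh = ideal_throughput(16, 2 * n, N)   # mesh: Bc = 2n, 16-bit channels
mcr_same_width = ideal_throughput(16, n, N)     # MCR: Bc = n, same width
mcr_double_width = ideal_throughput(32, n, N)   # MCR: Bc = n, doubled width
print(mesh, mcr_same_width, mcr_double_width)   # prints: 4.0 2.0 4.0
```

With equal channel widths the MCR bound is half that of the mesh; doubling the MCR channel width restores parity, which is the trade described above.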

3 Performance and cost evaluation

Performance and cost are important indicators for evaluating a NoC topology architecture [11]. To compare different NoC topology architectures, a standard set of performance and cost measurements can be used.

3.1 Evaluation method

To assess the performance and cost of the MCR interconnection network, we developed a virtual channel router that adopts wormhole switching, and used it to construct the MCR and mesh interconnection networks. Fig. 2 shows the block diagram of the virtual channel router [12]. The channel width of the router is one flit (flow control unit, 16 bits). Each input physical channel of the router is divided into four virtual channels, and each virtual channel has a buffer with a depth of 8 flits. Each output channel has a flit buffer. The router uses a distributed deterministic routing algorithm (XY routing for the mesh network and the DDRA of Sect. 2.2 for the MCR network). The status information of the virtual channels serves as flow control information. Virtual channels are allocated by polling, and the crossbar is fully connected. The clock frequency is 200 MHz. The number of input and output channels of the router is determined by the network topology. Simulation and physical implementation are then used to analyze the performance and cost of the MCR and mesh interconnection networks.
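The router parameters above translate directly into input-buffer storage per router. The following sketch is only an estimate under a stated assumption: each router is taken to have one local injection port in addition to its network ports, which the source does not specify. The class name `RouterConfig` and its method are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RouterConfig:
    flit_bits: int = 16          # channel width: one 16-bit flit
    virtual_channels: int = 4    # virtual channels per input physical channel
    vc_depth_flits: int = 8      # buffer depth of each virtual channel

    def input_buffer_bits(self, network_ports, local_ports=1):
        """Input-buffer storage of one router (local_ports is an assumption)."""
        ports = network_ports + local_ports
        return ports * self.virtual_channels * self.vc_depth_flits * self.flit_bits


cfg = RouterConfig()
mesh_bits = cfg.input_buffer_bits(network_ports=4)   # interior mesh router
mcr_bits = cfg.input_buffer_bits(network_ports=3)    # MCR router with mesh link
print(mesh_bits, mcr_bits)                           # prints: 2560 2048
```

Each input channel holds 4 × 8 × 16 = 512 bits, so removing one network port saves 512 bits of input buffering per router under this model.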

Fig. 2 Block diagram of virtual channel router

3.2 Simulation results

In order to evaluate the average communication delay, average throughput and link utilization of the MCR network, we simulate and analyze the MCR and mesh topologies with 64 nodes, using a uniformly distributed traffic model built in SystemC. Each packet consists of 5 flits (1 head flit, 3 data flits and 1 tail flit), and each flit is two bytes. A node can produce one flit every two clock cycles. The packet generation rate (the interval between two packets), which determines the load flowing into the network, and the destination node can both be controlled. Figs. 3–5 show the performance curves when packets are injected at each node at a uniformly distributed rate and the destination nodes follow either a uniform or a local distribution.

For the local distribution, the local probability is 0.7: 70% of the data generated by each node is sent to adjacent nodes in the network, and the remaining 30% is sent evenly throughout the network. Throughput is measured as the number of flits transferred per node per clock cycle, that is

$$N_{throughput} = \frac{\sum_{i=1}^{N} N_i}{T \times N_n}.$$

Network latency is defined as the total time from the first flit of a message flowing into the network to the last flit of the message being received by the target node. Because of network congestion, the network latency differs from message to message. Therefore, the average network latency is used to measure the performance of the on-chip network, that is

$$L_{latency} = \frac{\sum_{i=1}^{N} L_i}{N}.$$

Link utilization is the number of flits transmitted on each link per unit time, that is

$$L_{utilization} = \frac{\sum_{i=1}^{N} N_i \times H_i}{T \times C},$$

where $N_i$ is the number of flits of the $i$th packet, $L_i$ is the network latency of the $i$th packet, $N$ is the number of packets received within the simulation time, $N_n$ is the number of nodes, $T$ is the total number of simulation cycles, $H_i$ is the length of the path traversed by the $i$th packet, and $C$ is the number of physical links in the network.
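The three measurements can be computed from a list of received-packet records, as in the minimal sketch below. It assumes that the flit sums are normalized by T × Nn for throughput and by T × C for link utilization, which is how the definitions are read here; the record type `PacketRecord`, the function `metrics` and the sample values are hypothetical and are not simulation results from the paper.

```python
from dataclasses import dataclass


@dataclass
class PacketRecord:
    flits: int      # N_i: number of flits in the packet
    latency: int    # L_i: cycles from first flit injected to last flit received
    hops: int       # H_i: path length of the packet


def metrics(packets, cycles, num_nodes, num_links):
    """Return (throughput, average latency, link utilization)."""
    total_flits = sum(p.flits for p in packets)
    throughput = total_flits / (cycles * num_nodes)          # flits/node/cycle
    latency = sum(p.latency for p in packets) / len(packets)  # cycles
    utilization = sum(p.flits * p.hops for p in packets) / (cycles * num_links)
    return throughput, latency, utilization


# Two made-up 5-flit packets observed over 100 cycles on a 64-node, 88-link network.
sample = [PacketRecord(flits=5, latency=20, hops=4),
          PacketRecord(flits=5, latency=30, hops=6)]
throughput, latency, utilization = metrics(sample, cycles=100,
                                           num_nodes=64, num_links=88)
print(throughput, latency, utilization)
```

Keeping the per-packet hop count in the record is what allows link utilization to be derived without tracing individual links.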

Fig. 3 shows that under light load MCR and mesh have similar throughput, but when the load increases and the destination nodes are uniformly distributed, the MCR network is more congested than the mesh network, so its throughput is lower. Under the local distribution, the throughput of both networks increases and the gap narrows. Fig. 4 shows that, whatever the distribution of the destination nodes, the average delay of MCR is smaller than that of mesh. At light loads this is attributed to the network diameter of MCR, and at heavy loads to MCR receiving fewer packets than mesh; the average delay is reduced significantly when the destination nodes follow the local distribution. Fig. 5 shows that at light loads the link utilization of MCR is higher than that of mesh, whereas at heavy loads the link utilization of mesh is higher. The main reason is that the two networks receive a similar number of packets at light load while MCR has fewer physical links, and at heavy load MCR receives fewer packets than mesh. The experimental analysis shows that MCR performs better than mesh under light load or local communication, whereas mesh performs better than MCR under heavy load or uniform global communication.

(a) Uniform distribution

(b) Local distribution

Fig. 3 Comparison of average throughput of interconnection network



Fig. 4 Comparison of average latency of interconnection network



Fig. 5 Comparison of the link utilization of interconnection network

3.3 Physical implementation

To further validate the physical implementation cost of the MCR network, 36-node MCR and mesh networks are synthesized for an Altera STRATIX IV GX EP4SGX230DF29C2X field programmable gate array (FPGA) device using Altera Quartus II. The EP4SGX230DF29C2X FPGA device has 184 696 registers, 91 200 estimated packed adaptive logic modules (ALMs) and 91 200 memory adaptive look-up tables (ALUTs). It also has enhanced memory arrays called embedded system blocks (ESBs). Each ESB can be configured in one of two modes: it can act as a conventional memory, used for storage or logic, or as a programmable array logic block that implements functions as a sum of products. Each ESB equates to 1 024 bits of random access memory (RAM) (http://www.altera.com). The resource utilization results are shown in Table 2. The MCR network uses fewer resources than the mesh network, which indicates that the MCR interconnection network costs less than the mesh interconnection network.

Table 2 Resource utilization of MCR and mesh

Resources                Mesh (Used/Total)         MCR (Used/Total)
Registers                46 282/184 696 (25%)      39 194/184 696 (21%)
Estimated packed ALMs    71 273/91 200 (78%)       58 402/91 200 (64%)
Memory ALUTs             9 984/91 200 (10%)        8 448/91 200 (9%)
ESB                      79 872 bits               67 584 bits
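The size of the saving is easier to judge as a relative reduction. The short calculation below uses only the "used" figures of Table 2; the variable names are illustrative.

```python
# (mesh, MCR) resource usage taken from Table 2, 36-node networks.
usage = {
    "Registers": (46282, 39194),
    "Estimated packed ALMs": (71273, 58402),
    "Memory ALUTs": (9984, 8448),
    "ESB bits": (79872, 67584),
}

# Relative reduction of MCR with respect to mesh, in percent.
savings = {name: round(100 * (mesh - mcr) / mesh, 1)
           for name, (mesh, mcr) in usage.items()}

for name, saving in savings.items():
    print(f"{name}: {saving}% fewer in MCR")
```

On these figures MCR uses roughly 15% to 18% fewer resources than the mesh in every category.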

4 Conclusions

This paper presents a new on-chip topology architecture, the MCR interconnection network. The topology is planar, regular and scalable, and it has a lower network cost. The MCR interconnection network uses deterministic shortest-path routing. From the analysis and simulation of the average communication delay, average throughput and link utilization of the MCR and mesh networks, we find that MCR performs well under light load or local communication, and the FPGA implementation shows that the cost of the MCR network is low. The MCR interconnection network offers a better trade-off between network performance and cost, and it is a simple and efficient on-chip interconnection network.

Acknowledgements

This work was supported by the National Science Foundation of China (61136002, 61272120), the Key Project of Chinese Ministry of Education (211180), the Natural Science Foundation of Shaanxi Province of China (2010JQ8014), and the Scientific Research Program Funded of Shaanxi Provincial Education Department (2010JK826).

References

1. Wang W, Qiao L, Tang Z Z. Survey on the networks-on-chip interconnection topologies. Computer Science, 2011, 38(10): 1−5, 12 (in Chinese)

2. Bjerregaard T, Mahadevan S. A survey of research and practices of network-on-chip. ACM Computing Surveys, 2006, 38(1): 1−54

3. Marculescu R, Ogras U Y, Peh L S, et al. Outstanding research problems in NoC design: system, microarchitecture, and circuit perspectives. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2009, 28(1): 3−21

4. Wang W, Qiao L, Yang G W, et al. A kind of hierarchical ring interconnection networks-on-chip. Chinese Journal of Computer, 2010, 33(2): 326−334 (in Chinese)

5. Liu Y Y, Han J G, Du H M. DL(2m): a new scalable interconnection network for system-on-chip. Journal of Computers, 2009, 4(3): 201−207

6. Zhu X J. A recursive scalable topology for network on chip. Chinese Journal of Computer, 2011, 34(5): 924−930 (in Chinese)

7. Dally W, Towles B. Route packets, not wires: on-chip interconnection networks. Proceedings of the 38th Design Automation Conference (DAC’01), Jun 18−22, 2001, Las Vegas, NV, USA. Piscataway, NJ, USA: IEEE, 2001: 684−689

8. Sibai F N. A two-dimensional low-diameter scalable on-chip network for interconnecting thousands of cores. IEEE Transactions on Parallel and Distributed Systems, 2012, 23(2): 193−201

9. Salminen E, Kulmala A, Hamalainen T D. On network-on-chip comparison. Proceedings of the 10th Euromicro Conference on Digital System Design: architectures, methods and tools (DSD’07), Aug 29−31, 2007, Lubeck, Germany. Piscataway, NJ, USA: IEEE, 2007: 503−510

10. Duato J, Yalamanchili S, Ni L. Interconnection networks: an engineering approach. San Francisco, CA, USA: Morgan Kaufmann, 2003

11. Pande P P, Grecu C, Jones M, et al. Performance evaluation and design trade-offs for network-on-chip interconnect architectures. IEEE Transactions on Computers, 2005, 54(8): 1025−1040

12. Mullins R, West A, Moore S. Low-latency virtual channel routers for on-chip networks. Proceedings of the 31st Annual International Symposium on Computer Architecture (ISCA’04), Jun 19−23, 2004, Munich, Germany. Piscataway, NJ, USA: IEEE, 2004: 188−197

(Editor: ZHANG Ying)
