Mapping of irregular IP onto NoC architecture with optimal energy consumption

TSINGHUA SCIENCE AND TECHNOLOGY ISSN 1007-0214 26/49 pp146-149 Volume 12, Number S1, July 2007

Mapping of Irregular IP onto NoC Architecture with Optimal Energy Consumption*

LI Guangshun (李光顺)1,2,**, WU Junhua (吴俊华)1,2, MA Guangsheng (马光胜)1

1. College of Computer Science and Technology, Harbin Engineering University, Harbin 150001, China; 2. College of Computer Science and Technology, Qufu Normal University, Rizhao 150001, China

Abstract: Network on chip (NoC) architectures have been proposed to resolve complex on-chip communication problems. This paper presents an NoC-based mapping algorithm that maps irregular intellectual property (IP) cores onto regular-tile 2-D mesh NoC architectures. The basic idea is to decompose a large IP into several dummy IPs, or to integrate several small IPs into one dummy IP, such that each dummy IP fits into a single tile. The algorithm also allocates buffer space according to the input/output degree and avoids connection congestion by adapting the communication density. Experimental data indicate that the proposed algorithm reduces communication energy by about 7%.

Key words: network on chip (NoC); communication matrix; router weight; communication density

Introduction

Regular NoC architectures have been proposed recently as a promising solution to increasingly complex on-chip communication problems. Such an NoC architecture consists of an array of regular tiles, where each tile can be a general-purpose processor, a DSP, a memory subsystem, etc. A router is typically embedded within each tile; thus, instead of routing design-specific global on-chip wires, inter-tile communication is achieved by routing packets[1,2]. With the increase of IP integration and component specialization, the heterogeneity of these designs inevitably increases. This heterogeneity helps not only in optimizing the overall performance for low power consumption, but also in ensuring competitive design costs[3,4].

Daniel et al.[5] proposed an IP-centric embedded system design methodology. The major challenges in IP-centric methodologies are interface synthesis among the various IP blocks and system verification. Recently, a platform-based design methodology[6] has been proposed that allows reuse not only of components but also of system architectures and topologies. Around the same time, Ahmed et al.[7] presented a honeycomb structure in which each processing core is located on a regular hexagonal node connected to three switches. Many architectural templates have been proposed as hardware platforms for future systems on a chip (SoCs). There is a general emphasis on providing an efficient and standardized communication infrastructure for connecting multiple resources on the chip[1]. It has also been realized that the key to reuse and integration of IP components is communication, from the physical level to the system and conceptual levels; consequently, communication-centric architectures, platforms, and methodologies have been developed[8]. In Ref. [9], Kumar et al. described an NoC architecture implemented as a two-dimensional (2-D) mesh of switches and resources. These papers discuss the overall advantages and challenges of the regular NoC architecture.

However, a regular NoC architecture may suffer from serious die-area waste when the IPs that make up the application under design vary significantly in size[10]. In addition, the tile size has to fit the largest IP, which increases the link length between neighboring on-chip routers. This may also lead to significant area overhead. To achieve the best performance/cost tradeoff, the designer needs to select the right NoC platform (e.g., the platform with the right tile size, routing strategy, buffer sizes, etc.) and customize it according to the characteristics of the application under design[11]. Furthermore, the same application, or different variants of it, has to be mapped onto different hardware versions of the product. If this can be done quickly and cost-effectively, many product versions for various market niches can be supported[12].

Received: 2007-02-01
* Supported by the National Natural Science Foundation of China (No. 60273081)
** To whom correspondence should be addressed. E-mail: [email protected]; Tel: 13091448766

To solve these issues, this paper proposes a mapping algorithm for irregular IPs. In contrast to the assumptions of other mapping algorithms, an irregular IP (block) may occupy an area many times larger than a regular tile and may have a different internal topology and communication mechanism. The algorithm can handle applications with irregular-sized IPs. The basic idea for dealing with an irregular IP that occupies two or more regular tiles is to decompose it into several smaller dummy IPs, so that each dummy IP fits into a single tile. Irregular IPs that are very small are handled by integrating a few of them into one dummy IP.

1 Irregular IP Core Mapping Algorithm

If an SoC is composed of n IP cores, a communication matrix $T = (t_{ij})$ ($0 \le i, j \le n-1$) can be derived from the communication relations among the IP cores, as shown in Eq. (1). Here $t_{ij} = 1$ if $i \ne j$ and the i-th and j-th IP cores communicate; otherwise, $t_{ij} = 0$.

The diagonal elements are all 0, which means that an IP core's communication with itself generates no traffic.

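$$
T = \begin{pmatrix}
0 & t_{01} & t_{02} & \cdots & t_{0,n-1} \\
t_{10} & 0 & t_{12} & \cdots & t_{1,n-1} \\
t_{20} & t_{21} & 0 & \cdots & t_{2,n-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
t_{n-1,0} & t_{n-1,1} & t_{n-1,2} & \cdots & 0
\end{pmatrix}
\tag{1}
$$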
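As an illustration, the communication matrix of Eq. (1) can be built from a list of communicating core pairs. The following is a minimal sketch, not the authors' implementation: the function name `communication_matrix` and the pair-list input format are assumptions, and "communication between" two cores is read as symmetric.

```python
def communication_matrix(n, pairs):
    """Return the n x n 0/1 matrix T of Eq. (1).

    pairs: iterable of (i, j) core indices that communicate
           (hypothetical input format, not from the paper).
    """
    T = [[0] * n for _ in range(n)]
    for i, j in pairs:
        if i != j:           # diagonal stays 0: no self-traffic
            T[i][j] = 1
            T[j][i] = 1      # symmetric reading of "between" (assumption)
    return T


# Four cores; the self-pair (3, 3) is ignored.
T = communication_matrix(4, [(0, 1), (1, 2), (3, 3)])
```

With this input, cores 0 and 2 each communicate only with core 1, and core 3 has an all-zero row and column.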

In previous research, most authors assume that all tiles are equal in size and that every tile is large enough to hold the largest IP core. The tile must therefore be larger than the largest IP core, which wastes chip area and considerably increases communication energy. Here we propose a method to map irregular IP cores (extra-large or extra-small IPs) onto an NoC architecture with as low a communication energy as possible. The method proceeds as follows.

Step 1 Compute the average size (area) of all IP cores. Sum the areas of all IP cores used in the system and divide by n to obtain the average area $S_{avg}$. Extensive practical design experience shows that most IP cores are close to the average size, while extra-large and tiny IP cores (called irregular IPs here) are comparatively few. We therefore take $\alpha S_{avg}$ as the regular tile size. The coefficient $\alpha$ can vary with the actual design, provided that most IP sizes are close to but smaller than $\alpha S_{avg}$; here we assume $\alpha = 1.2$.
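Step 1 amounts to a short computation. The sketch below uses made-up areas in arbitrary units; the function name `regular_tile_size` is an assumption introduced only for illustration.

```python
def regular_tile_size(areas, alpha=1.2):
    """Return (S_avg, alpha * S_avg) for a list of IP core areas."""
    s_avg = sum(areas) / len(areas)
    return s_avg, alpha * s_avg


# Hypothetical areas: four near-average cores, one large, two tiny.
areas = [1.0, 1.1, 0.9, 1.0, 4.0, 0.2, 0.2]
s_avg, tile = regular_tile_size(areas)
```

Here the average area is about 1.2 and the tile size about 1.44, so the four near-average cores fit in one tile each, while the core of area 4.0 and the two cores of area 0.2 are the irregular ones.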

Step 2 Generate the initial mapping solution. If the size of an IP core is larger than the regular tile size, then "decompose" the IP into as few dummy IPs as possible, so that every dummy IP is smaller than the regular tile size (as shown in Fig. 1, one big IP can be decomposed into four black dummy IPs). If the size of an IP core is much smaller than the regular tile size, then integrate several such IPs into one dummy IP whose size is close to but smaller than the regular tile size.
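One way to sketch Step 2 is shown below. The paper does not say how small IPs are grouped, so the first-fit-decreasing packing in `integrate` is an assumed heuristic, and both function names are hypothetical. Areas are integers here to keep the arithmetic exact, and pieces are allowed to be exactly as large as a tile, which is slightly looser than the strict "smaller than" in the text.

```python
import math


def decompose(area, tile):
    """Fewest dummy IPs such that each piece is no larger than one tile."""
    return max(1, math.ceil(area / tile))


def integrate(small_areas, tile):
    """Group small IPs into dummy IPs by first-fit decreasing (assumed heuristic).

    Returns a list of groups; the areas in each group sum to at most `tile`.
    """
    groups = []
    for a in sorted(small_areas, reverse=True):
        for g in groups:
            if sum(g) + a <= tile:
                g.append(a)
                break
        else:                      # no existing group has room
            groups.append([a])
    return groups


pieces = decompose(4, 1)               # a big IP spanning four tiles, as in Fig. 1
groups = integrate([2, 2, 3, 5], 10)   # small IPs packed into tiles of size 10
```

In this example the four small IPs end up in two dummy IPs, one of which fills its tile completely.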

Fig. 1 NoC architecture of 4×4 tile mesh systems

A 4×4 NoC architecture is shown in Fig. 1. A module labeled "R" represents a router, which routes and buffers messages between IPs. A module labeled "IP" represents a physical IP or a dummy IP, which is a computing unit or a storage unit. Each tile can map to a physical IP or a dummy IP, and has at most one channel for communicating with the adjacent router. Each router is connected to four neighboring switches through input and output channels. A channel C consists of two unidirectional point-to-point buses between two routers or between an IP and a router. Every router contains one internal queue to handle congestion. The internal architecture of a router is shown in Fig. 2.

Fig. 2 Router internal architecture

Step 3 Compute the router weight. If one large IP core is decomposed into m ($m \ge 2$) dummy IPs (that is, the irregular IP core is mapped to m tiles), then the router weight is set to $\lceil m/2 \rceil$; otherwise, the router weight is set to 1. For example, the black IP core in Fig. 1 is decomposed into 4 dummy IPs, so its corresponding router weight is set to 2. If two IP cores communicate, the weight between them is defined as the sum of the weights of all routers that the communication path passes through. The weights between all pairs of IP cores form a weight matrix $W = (w_{ij})$ ($0 \le i, j \le n-1$).
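The router weight and a single entry of the weight matrix can be sketched as follows. Two points are assumptions rather than statements from the paper: the weight is taken as ⌈m/2⌉ for m ≥ 2, which matches the worked example (m = 4 gives 2), and the path between two tiles is taken to be dimension-ordered XY routing with both endpoint routers counted. All function names are hypothetical.

```python
import math


def router_weight(m):
    """Weight of a router on a tile that holds part of an IP split into m dummy IPs."""
    return math.ceil(m / 2) if m >= 2 else 1


def xy_path(src, dst):
    """Tiles visited by XY routing from src to dst, endpoints included (assumed routing)."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:                 # move along X first
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:                 # then along Y
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


def path_weight(src, dst, weights):
    """Sum of router weights along the XY path; tiles not listed have weight 1."""
    return sum(weights.get(tile, 1) for tile in xy_path(src, dst))


# A big IP split into four dummy IPs on the centre tiles of a 4x4 mesh.
big = router_weight(4)
weights = {(1, 1): big, (2, 1): big, (1, 2): big, (2, 2): big}
w = path_weight((0, 1), (3, 1), weights)
```

A path that crosses the two weighted centre tiles in row 1 accumulates 1 + 2 + 2 + 1, so routes through the decomposed IP are penalized relative to routes around it.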

Step 4 Compute the input/output degree. For the k-th ($0 \le k \le n-1$) IP core, if it needs to transmit packets to $O_k$ other IP cores, then its output degree is $O_k$. Similarly, if $I_k$ other IP cores need to transmit packets to it, then its input degree is $I_k$. The input/output degree of the k-th IP core is

$$
D_k^{IO} = O_k + I_k \tag{2}
$$

When the total buffer space is given, we can allocate it to the routers according to their degrees. If the degree of an IP core is large, the core communicates with others frequently, so it is allocated more buffer space; otherwise, the core communicates with others infrequently and is allocated less buffer space.
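Eq. (2) and the buffer allocation can be illustrated with a small sketch. The paper only says that higher-degree cores get more buffer space; the proportional split with leftover slots given to the highest-degree cores is an assumed policy, and the function names and the directed edge-list input are hypothetical.

```python
def io_degrees(n, edges):
    """Return D[k] = O_k + I_k for directed edges (src, dst), as in Eq. (2)."""
    out_deg = [0] * n
    in_deg = [0] * n
    for s, d in set(edges):        # count each sender/receiver pair once
        if s != d:
            out_deg[s] += 1
            in_deg[d] += 1
    return [o + i for o, i in zip(out_deg, in_deg)]


def allocate_buffers(total, degrees):
    """Split `total` buffer slots in proportion to degree (assumed policy)."""
    s = sum(degrees)
    if s == 0:
        return [0] * len(degrees)
    shares = [total * d // s for d in degrees]
    leftover = total - sum(shares)
    # Hand the remaining slots to the highest-degree cores first.
    order = sorted(range(len(degrees)), key=lambda k: -degrees[k])
    for k in order[:leftover]:
        shares[k] += 1
    return shares


D = io_degrees(4, [(0, 1), (0, 2), (1, 2), (3, 0)])
buffers = allocate_buffers(10, D)
```

Core 0 sends to two cores and receives from one, so it has the largest degree and receives the largest share of the ten slots.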

Step 5 Compute the communication density. The communication density of every IP core can be computed from the degrees obtained in Step 4 and the weight matrix W obtained in Step 3. For the k-th ($0 \le k \le n-1$) IP core, the communication density is the sum of its output and input weights, namely,

$$
Den_k = \sum_{i=0}^{O_k} w_{ki} + \sum_{i=0}^{I_k} w_{ik} \tag{3}
$$

If $Den_k$ is large, the k-th ($0 \le k \le n-1$) IP core communicates with many other IP cores and the interconnect wiring around it is dense. Such a core is prone to congestion, which should be avoided. We can exchange it with an IP core whose density is small. Jump to Step 4 and continue until the densities of all IP cores are even and the sum of the weight matrix is minimal (or near minimal).
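The density of Eq. (3) can be computed directly from the weight matrix and the list of sender/receiver pairs. This is a minimal sketch under two assumptions: the sums are read as running over the cores that core k sends to and receives from, and the pair of cores to exchange is simply the densest and the sparsest one. The function names are hypothetical, and the re-mapping loop of Steps 4 and 5 is not shown.

```python
def communication_density(n, edges, W):
    """Den[k]: sum of w[k][i] over cores k sends to, plus w[i][k] over cores sending to k."""
    den = [0] * n
    for s, d in set(edges):
        if s != d:
            den[s] += W[s][d]      # output weight of the sender
            den[d] += W[s][d]      # input weight of the receiver
    return den


def swap_candidates(den):
    """Indices of the densest and sparsest cores (assumed choice of exchange pair)."""
    densest = max(range(len(den)), key=den.__getitem__)
    sparsest = min(range(len(den)), key=den.__getitem__)
    return densest, sparsest


# Hypothetical weight matrix for three cores communicating in a ring.
W = [[0, 2, 3],
     [2, 0, 4],
     [3, 4, 0]]
den = communication_density(3, [(0, 1), (1, 2), (2, 0)], W)
pair = swap_candidates(den)
```

After each exchange the placement changes, so the weight matrix and the densities would have to be recomputed before the next iteration.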

2 Experimental Results

We simulated the core graphs of six video processing applications: MPEG4 decoder (mapped onto 14 cores), video object plane decoder (OPD, mapped to 16 cores), picture-in-picture application (PIP, mapped to cores), multi-window application (MWA, mapped to 14 cores), MWA with graphics (MWAG, mapped to 16 cores), and dual screen display (DSD, mapped to 16 cores). The last four benchmarks are high-end video applications. We also implemented the partial branch-and-bound algorithm (PBB) presented in Ref. [10] for comparison. Figure 3 shows the minimum communication cost for the applications under the same bandwidth constraints. It can be seen from Fig. 3 that the total communication energy of our method is about 7% lower than that of the PBB algorithm.

Fig. 3 Communication energy comparison of PBB and our method

3 Conclusions

This paper proposes a new mapping algorithm that maps irregular IP cores onto a regular-tile 2-D mesh NoC architecture. The basic idea is to decompose every large IP into several smaller dummy IPs, so that each dummy IP fits into a single tile. Using the communication density computed in Section 1, the algorithm avoids connection congestion and reduces communication energy.

References

[1] William J D, Brian T. Route packets, not wires: On-chip interconnection networks. In: Proc. of the 38th DAC. Las Vegas, USA, 2001: 684-689.

[2] Kumar S, Jantsch A, Soininen J P, et al. A network on chip architecture and design methodology. In: Proc. VLSI. Las Vegas, USA, 2002: 105-112.

[3] Benini L, De M G. Networks on chips: A new SoC paradigm. IEEE Computer, 2002, 35(1): 70-78.

[4] Liang J, Swaminathan S, Tessier R. ASoC: A scalable, single-chip communications architecture. In: Proc. PACT. Philadelphia, PA, 2000.

[5] Daniel D G, Rainer D, Jian W Z. IP-centric methodology and design with the SpecC language. In: Proceedings of the NATO ASI on System Level Synthesis for Electronic Design. Il Ciocco, Lucca, Italy, 1998.

[6] Vahid F, Givargis T. Platform tuning for embedded systems design. IEEE Computer, 2001, 34(3): 112-114.

[7] Ahmed H, Axel J, Shashi K, et al. Network on a chip: An architecture for billion transistor era. In: Proceedings of NorChip. Turku, Finland, 2000: 166-173.

[8] Drew W. Micro network-based integration of SoCs. In: Proc. of the 38th DAC. Las Vegas, USA, 2001: 673-677.

[9] Kumar S, Jantsch A, Soininen J P, et al. A network on chip architecture and design methodology. In: Proc. Symp. on VLSI. Las Vegas, USA, 2002: 105-112.

[10] Jing C H, Radu M. Energy- and performance-aware mapping for regular NoC architectures. IEEE Trans. on CAD of IC and Systems, 2005, 24(4): 551-562.

[11] Varatkar G, Marculescu R. On-chip traffic modeling and synthesis for MPEG-2 video applications. IEEE Trans. VLSI Sys, 2004, 12(1): 108-119.

[12] Jing C H, Radu M. Energy-aware communication and task scheduling for network-on-chip architectures under real-time constraints. In: Proc. DATE. Paris, France, 2004: 234-239.
