
SPATIAL PARALLELISM IN THE ROUTERS OF ASYNCHRONOUS

ON-CHIP NETWORKS

A THESIS SUBMITTED TO THE UNIVERSITY OF MANCHESTER

FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

IN THE FACULTY OF ENGINEERING AND PHYSICAL SCIENCES

2010

By Wei Song

School of Computer Science


Contents

Abstract
Declaration
Copyright
Acknowledgements
The Author

I Introduction and Background

1 Introduction
1.1 Motivation
1.2 Research objectives
1.3 Research contributions
1.4 Thesis organization
1.5 Publications

2 Asynchronous Circuits
2.1 Synchronous and asynchronous circuits
2.2 Delay assumptions
2.3 Handshake protocols
2.4 Data encoding
2.5 Basic elements
2.6 Summary

3 Network-on-Chip
3.1 Architecture of on-chip networks
3.2 Globally asynchronous locally synchronous networks
3.3 Previous GALS NoCs
3.3.1 Chain
3.3.2 ASPIN
3.3.3 QoS NoC
3.3.4 MANGO
3.3.5 ANoC
3.3.6 QNoC
3.4 Summary

II Levels of Parallelism

4 Parallelism in the Physical Layer
4.1 Synchronization overhead
4.2 Channel slicing
4.3 Lookahead pipeline style
4.4 A channel sliced wormhole router
4.4.1 Router structure
4.4.2 Performance
4.5 Summary

5 Parallelism in the Switching Layer
5.1 Problems of timing division and virtual channel
5.2 Spatial division multiplexing
5.3 An SDM router
5.3.1 Router structure
5.3.2 Performance
5.4 Summary

6 Area Reduction using Clos
6.1 Motivation
6.2 Clos switching networks
6.3 Dynamic reconfiguration
6.3.1 Dispatching algorithms
6.3.2 Concurrent round-robin dispatching algorithm
6.3.3 Asynchronous dispatching algorithm
6.4 Asynchronous Clos scheduler
6.4.1 Implementation
6.4.2 Performance
6.5 2-stage Clos switch
6.6 Summary

III Performance Evaluation and Conclusion

7 Asynchronous SDM Router
7.1 Router structure
7.2 Utilizing the 2-stage Clos switch
7.3 Implementation
7.4 Summary

8 Performance Evaluation
8.1 Reproduction of QoS NoC
8.2 Single router performance
8.2.1 Area
8.2.2 Speed
8.2.3 Power
8.3 Network performance
8.4 Summary

9 Conclusions and Future Work
9.1 Summary of the thesis
9.1.1 Channel slicing
9.1.2 SDM
9.1.3 Clos
9.2 Future work

Bibliography


List of Tables


List of Figures


List of Abbreviations

CMP chip multi-processor
MPSoC multi-processor system-on-chip
NoC network-on-chip
SDM spatial division multiplexing
SoC system-on-chip
TDM time division multiplexing
VC virtual channel
VLSI very large scale integration


Abstract

Wei Song, Doctor of Philosophy, The University of Manchester
Spatial Parallelism in the Routers of Asynchronous On-Chip Networks
15th November 2010

State-of-the-art multi-processor system-on-chips use on-chip networks as their communication fabric. Although most current on-chip networks are implemented synchronously, asynchronous quasi-delay-insensitive (QDI) on-chip networks have several advantages over their synchronous counterparts. Time division multiplexing (TDM) flow control methods have been used extensively in asynchronous on-chip networks. The data synchronization required by TDM leads to a significant speed penalty. Compared with TDM methods, exploring spatial parallelism and applying the spatial division multiplexing (SDM) flow control method achieves better network throughput with less area overhead.

This thesis proposes several techniques to increase spatial parallelism in the routers of asynchronous on-chip networks.

Channel slicing is a pipeline structure that alleviates the speed penalty by removing the synchronization among bit-level data pipelines. It is also possible to further improve speed using the lookahead pipeline style if the QDI timing assumption is relaxed.

SDM is a flow control method that improves network throughput without introducing synchronization among buffers of different frames, which TDM methods, by contrast, require. It is also found that the area overhead of SDM is smaller than that of the virtual channel (VC) flow control method, the most widely used TDM method. The major design problem of SDM is its area-consuming crossbars. A novel 2-stage Clos switch structure is proposed to replace the crossbar in SDM routers, which reduces the area overhead significantly. This Clos switch is dynamically reconfigured by a new asynchronous dispatching algorithm.

An asynchronous SDM router is implemented using these new techniques. An asynchronous router using VC is also reproduced for comparison. Performance analyses show that the SDM router outperforms the VC router in throughput, area to throughput efficiency and power to throughput efficiency.


Declaration

No portion of the work referred to in this thesis has been submitted in support of an application for another degree or qualification of this or any other university or other institute of learning.


Copyright

i. The author of this thesis (including any appendices and/or schedules to this thesis) owns certain copyright or related rights in it (the “Copyright”) and s/he has given The University of Manchester certain rights to use such Copyright, including for administrative purposes.

ii. Copies of this thesis, either in full or in extracts and whether in hard or electronic copy, may be made only in accordance with the Copyright, Designs and Patents Act 1988 (as amended) and regulations issued under it or, where appropriate, in accordance with licensing agreements which the University has from time to time. This page must form part of any such copies made.

iii. The ownership of certain Copyright, patents, designs, trade marks and other intellectual property (the “Intellectual Property”) and any reproductions of copyright works in the thesis, for example graphs and tables (“Reproductions”), which may be described in this thesis, may not be owned by the author and may be owned by third parties. Such Intellectual Property and Reproductions cannot and must not be made available for use without the prior written permission of the owner(s) of the relevant Intellectual Property and/or Reproductions.

iv. Further information on the conditions under which disclosure, publication and commercialisation of this thesis, the Copyright and any Intellectual Property and/or Reproductions described in it may take place is available in the University IP Policy (see http://www.campus.manchester.ac.uk/medialibrary/policies/intellectual-property.pdf), in any relevant Thesis restriction declarations deposited in the University Library, The University Library’s regulations (see http://www.manchester.ac.uk/library/aboutus/regulations) and in The University’s policy on presentation of Theses.


Acknowledgements

I would like to thank....


The Author

Wei Song received his B.S.EE. from the College of Electronic Information and Control Engineering at the Beijing University of Technology, Beijing, P.R. China in 2005. In the same year, he was admitted through recommendation by the same college to pursue his M.S.EE. and obtained it in 2008.

From 2004 to 2006, Wei Song was also a research assistant in the Beijing Embedded System Key Lab (BESKL). He participated in the design of demodulators for several wireless communication systems including WLAN 802.11a/g, DVB-T and ATSC. He also implemented the FPGA verification platforms for most digital designs in BESKL. After leaving BESKL, he returned to his college and designed a real-time non-preemptive thread scheduler for a central communication controller in the hybrid electric vehicle control system. The communication controller was later patented in 2008.

Wei Song was offered a full scholarship by the EPSRC doctorate training program and began his Ph.D. study in the School of Computer Science at the University of Manchester in 2007, while he was also writing up his master's dissertation. His work in Manchester is the design of asynchronous routers for an energy efficient network-on-chip for dynamically reconfigurable computing platforms, supported by EPSRC.


Part I

Introduction and Background


Chapter 1

Introduction

1.1 Motivation

The continuously shrinking transistor geometry makes network-on-chip (NoC) [4] the practical communication fabric for state-of-the-art multi-processor system-on-chip (MPSoC) designs. Following Moore’s Law, the capacity and complexity of a chip have been boosted significantly in recent decades. The function of a board-level system of the last decade can be integrated into one chip in modern system-on-chip (SoC) designs. On the other hand, SoCs are no longer built from scratch, simply because the complexity is beyond control. Fast and reliable integration of numerous reusable intellectual property (IP) blocks becomes crucial to meet the time-to-market requirement. As a replacement for traditional hierarchical bus systems and point-to-point connections, the on-chip network infrastructure provides a unified interface for new IP blocks to be easily plugged into a system. A modern MPSoC is a communication-centric system [11] built on an on-chip network communication fabric.

Most current NoCs are synchronous networks where network components are driven by the same or several global clocks. Thanks to the timing assumptions allowed by the global clock and mature electronic design automation (EDA) tools, these synchronous NoCs are fast and area efficient. However, there are several design challenges that synchronous NoCs find difficult to resolve:

• Support for heterogeneous networks. Unlike chip multi-processor (CMP) systems where every network node is a homogeneous processor element, an MPSoC is a heterogeneous system where network nodes are IP blocks with different functions and hardware structures. These IP blocks are provided and tested with different clock frequencies, area sizes and even working voltages. The differences complicate the network topology, compromise the latency performance of synchronous networks and make it difficult for chips to reach timing closure.

• Low power consumption. It is crucial to reduce the power consumption of an SoC as it determines the maximal standby time of a handset device. The clock tree of synchronous on-chip networks consumes a significant amount of energy [14] and this is getting worse as transistor geometry shrinks.

• Tolerance to variation. Process, temperature and voltage variations affect future sub-micron VLSI designs significantly [12, 13]. According to the international technology roadmap for semiconductors, the delay uncertainty caused by variations in the sign-off timing closure will reach 32% in 2024 [1]. Traditional static timing analysis is going to be replaced by statistical timing analysis methods [6] to cope with dropping yield rates and over-conservative timing estimation. Synchronous on-chip networks alleviate this effect by considering variations in their task mapping procedure [13]. However, this works only on homogeneous networks and the routers still operate at the worst-case speed estimate.

Using asynchronous on-chip networks instead of synchronous ones is a promising solution to the above challenges. The communication components in an asynchronous on-chip network are built from clockless asynchronous circuits. Data are transmitted according to certain handshake protocols which can be insensitive to delay [20]. Because of this delay insensitivity, the interface between all IP blocks and the global asynchronous on-chip network is unified by the same synchronous to/from asynchronous interface. The fact that all synchronous blocks are isolated by the asynchronous network simplifies chip-level timing closure. Also thanks to the delay insensitivity, an asynchronous on-chip network is naturally tolerant to all variations, as the delay uncertainty caused by these variations cannot affect the function of those handshake protocols. Finally, since no clock is needed in asynchronous circuits, the asynchronous on-chip network consumes zero dynamic power when no data is in transmission.

However, most asynchronous networks [2, 15, 10, 5, 3, 8] are slower than synchronous on-chip networks with similar structures and resources [14]. Although the global clock in synchronous circuits is power consuming, it is a speed and area efficient approach to synchronizing combinational operations. Asynchronous circuits rely on handshake protocols to control data transmission. Combinational operations are explicitly detected and guarded to ensure the insensitivity to delay. The circuits used in detecting combinational operations introduce area and speed overhead. Delay insensitive asynchronous circuits are intrinsically slow.

On the other hand, due to the lack of EDA support, the state-of-the-art way of designing asynchronous on-chip networks is to mimic the structure of synchronous on-chip networks. As synchronous on-chip networks can synchronize data with no speed penalty, time division multiplexing (TDM) techniques [7] are extensively utilized. Simply reproducing such TDM structures in asynchronous on-chip networks introduces more completion detection circuits and increases the speed penalty.

Although the intrinsic speed penalty of completion detection is unavoidable, as the promising advantages of asynchronous circuits are derived from those delay insensitive handshake protocols, the scale of synchronization in asynchronous circuits can be restricted to small transmission units, such as a low-level data pipeline, so that the speed penalty is alleviated. The question that follows is how to build an asynchronous network with such limited synchronization.

The solution presented in this thesis is spatial parallelism. TDM is not a good fit for asynchronous circuits because it brings more synchronization and compromises speed. If synchronization is restricted to small scales such as a single low-level data pipeline, these pipelines are controlled in a distributed manner. In other words, communication resources are divided spatially into unsynchronized low-level pipelines and the speed penalty of synchronization is reduced to the minimum.
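The intuition behind restricting synchronization can be illustrated with a toy timing model. The sketch below is an assumption-laden illustration rather than the analysis carried out in this thesis: the per-token delays are hypothetical random numbers, completion detection cost is ignored, and unsynchronized pipelines are assumed to have enough buffering never to stall. Under those assumptions, a synchronized channel pays the slowest sub-channel delay on every token, while unsynchronized sub-channels only wait for the slowest sub-channel overall.

```python
import random


def synchronized_time(delays):
    """Total time when all sub-channels are synchronized on every token.

    delays[i][t] is the delay of sub-channel i for token t. Each token
    step waits for the slowest sub-channel, so step times are maxima.
    """
    tokens = len(delays[0])
    return sum(max(ch[t] for ch in delays) for t in range(tokens))


def sliced_time(delays):
    """Total time when sub-channels run independently.

    Each sub-channel streams its own tokens back to back; the transfer
    ends when the slowest sub-channel has finished.
    """
    return max(sum(ch) for ch in delays)


def random_delays(channels, tokens, seed=0):
    """Hypothetical per-token delays drawn uniformly from [0.8, 1.2]."""
    rng = random.Random(seed)
    return [[rng.uniform(0.8, 1.2) for _ in range(tokens)]
            for _ in range(channels)]


if __name__ == "__main__":
    d = random_delays(channels=16, tokens=1000)
    print("synchronized:", round(synchronized_time(d), 1))
    print("sliced      :", round(sliced_time(d), 1))
```

Because the maximum of sums never exceeds the sum of maxima, the unsynchronized total in this model is never larger than the synchronized one, whatever delays are supplied.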

1.2 Research objectives

The overall goal of this research is to explore the spatial parallelism in asynchronous on-chip routers. It is expected that 49% of the global signals will be driven by handshake protocols by 2024 and the latency of asynchronous signalling will be improved through 2014 [1]. Routers are the key components of an on-chip network. Improving the speed of asynchronous routers using spatial division techniques provides a feasible way of meeting the speed requirement for future chip designs and hopefully the techniques can be utilized in general asynchronous circuits besides asynchronous on-chip networks.

Spatial parallelism will be explored in different service levels. Although there is no consensus on the definition of layers in on-chip networks, the lower communication structure can generally be divided into three layers: the routing layer, the switching layer and the physical layer [9]. The data transmitted in a network are divided accordingly into frames, flits and phits. The physical layer refers to basic communication resources such as buffers and channels that deliver phits from buffer to buffer. A flit comprises one or several phits. The switching layer dynamically allocates basic communication resources of the physical layer to different flits. The hardware structure and algorithm used in this allocation process are normally called the flow control method. A frame is the smallest data unit that is self-explanatory to a network node and it contains one or several flits. The routing layer determines the route through which a frame is delivered in the network. As a flit is the data unit operated on in routers, this research concentrates on exploring the spatial parallelism in the lowest two layers: the physical layer and the switching layer.
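The frame, flit and phit hierarchy described above can be made concrete with a short sketch. The widths used here (a 64-bit frame, 32-bit flits and 8-bit phits) and the function names are hypothetical choices for illustration only; the actual widths depend on the router design.

```python
def split(data, size):
    """Split a sequence into consecutive chunks of at most `size` items."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def frame_to_flits(frame_bits, flit_bits, phit_bits):
    """Decompose a frame into flits, and each flit into phits.

    Returns a list of flits, where each flit is a list of phits and
    each phit is a list of bits.
    """
    return [split(flit, phit_bits) for flit in split(frame_bits, flit_bits)]


if __name__ == "__main__":
    frame = [i % 2 for i in range(64)]  # a hypothetical 64-bit frame
    flits = frame_to_flits(frame, flit_bits=32, phit_bits=8)
    print(len(flits), "flits,", len(flits[0]), "phits per flit")
```

In this picture the routing layer decides where the whole frame goes, the switching layer allocates resources per flit, and the physical layer moves one phit at a time between buffers.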

In the physical layer, state-of-the-art routers use synchronized multi-bit pipelines as buffer stages, which are similar to the latches on buses in synchronous circuits. This pipeline style simplifies the control logic but introduces significant speed overhead. The effect of the speed degradation caused by synchronization will be analysed. Some techniques will be proposed to alleviate this degradation and will be compared with the synchronized pipeline style for speed, area and power performance.

In the switching layer, most current asynchronous on-chip networks use timing division flow control methods such as virtual channel (VC). The new flow control methods proposed in this research will be compared with those asynchronous routers using timing division flow control methods. Their speed performance, area consumption and power dissipation will be analysed within different working environments.

1.3 Research contributions

This research makes the following contributions:

• In the physical layer

– Analysis of the speed, area and power overhead of synchronizing low-level data pipelines

– Channel slicing, a technique that removes the synchronization among low-level data pipelines

– A method of utilizing the lookahead pipeline style in normal asynchronous pipelines

• In the switching layer

– Overhead analysis of the virtual channel (VC) flow control method

– Overhead analysis of the spatial division multiplexing (SDM) flow control method

– Utilization of SDM in asynchronous routers

– Method to reduce the area overhead of SDM using Clos switches

– A new 2-stage Clos switch structure for on-chip routers

– An asynchronous dispatching algorithm and hardware structure to dynamically reconfigure Clos switches

• Overall

– A novel asynchronous SDM router

– Performance comparison between SDM and VC

1.4 Thesis organization

The thesis is divided into three parts: Part I provides a brief background introduction to this thesis. Part II proposes several new techniques to increase spatial parallelism in asynchronous routers. Finally, a router is implemented in Part III utilizing all the techniques introduced in Part II.

In the rest of Part I, Chapter 2 presents an overview of asynchronous circuits including their different delay assumptions, handshake protocols, data encoding methods and basic building elements. Chapter 3 introduces the concepts related to on-chip networks and reviews previously published asynchronous router designs.

Part II proposes several new techniques in different layers. Chapter 4 concentrates on the physical layer. Channel slicing is utilized to remove the synchronization among low-level data pipelines and the lookahead pipeline style is used to further reduce the cycle period. Instead of using timing division flow control methods, Chapter 5 proposes the spatial division multiplexing (SDM) flow control method and examines its advantage over the virtual channel (VC) flow control method by behavioural level simulations. As the major implementation overhead of SDM is the enlarged crossbar, Chapter 6 provides a solution for reducing the area overhead by replacing the crossbar with a Clos switch. However, dynamically reconfiguring a multi-stage Clos switch is complicated and has not yet been implemented asynchronously. The first asynchronous Clos scheduler is also designed and implemented in Chapter 6.
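To give a rough feel for why a Clos switch can be smaller than a crossbar, the sketch below counts crosspoints for a full crossbar and for the classic 3-stage Clos network. Note the assumptions: this uses the textbook 3-stage structure rather than the 2-stage switch proposed in Chapter 6, crosspoint count is only a crude proxy for area, and the example size (5 ports with 4 sub-channels each, 20 inputs in total) is hypothetical.

```python
def crossbar_crosspoints(n_ports):
    """Crosspoints in a full n_ports x n_ports crossbar."""
    return n_ports * n_ports


def clos3_crosspoints(n, r, m):
    """Crosspoints in a classic 3-stage Clos network C(n, r, m).

    r ingress switches of size n x m, m middle switches of size r x r,
    and r egress switches of size m x n, giving n * r ports in total.
    """
    return r * n * m + m * r * r + r * m * n


if __name__ == "__main__":
    # Hypothetical example: 5 ports, each divided into 4 sub-channels.
    n, r = 4, 5
    print("crossbar:", crossbar_crosspoints(n * r))
    print("3-stage Clos (m = n):", clos3_crosspoints(n, r, m=n))
```

With m = n the 3-stage network is only rearrangeably non-blocking, which is why a scheduler that reconfigures the switch dynamically is needed.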

Part III combines all the techniques in Part II into one router design. Chapter 7 briefly describes the final asynchronous SDM router. It is compared with a reproduced VC router in Chapter 8. The thesis is finally concluded in Chapter 9.

1.5 Publications

The following papers have been produced during the research of this work. The chapters that are closely related to these papers are identified respectively.

1. Wei Song, Doug Edwards, Zhenyu Liu and Sohini Dasgupta. Routing of asynchronous Clos networks. In submission to IET Computers & Digital Techniques, 2010.
The hardware implementation and the performance evaluation of an asynchronous Clos scheduler in Chapter 6 come from this paper.

2. Wei Song and Doug Edwards. Asynchronous spatial division multiplexing router. To be published in Microprocessors and Microsystems, 2010, DOI: 10.1016/j.micpro.2010.08.007 [18].
The SDM router implementation in Chapter 5 originated from this paper.

3. Wei Song and Doug Edwards. Improving the throughput of asynchronous on-chip networks with SDM. In Proc. of UK Electronics Forum, pages 47-56, June 2010.

4. Wei Song and Doug Edwards. An asynchronous routing algorithm for Clos networks. In Proc. of International Conference on Application of Concurrency to System Design, pages 67-76, June 2010 [17].
The asynchronous dispatching algorithm in Chapter 6 was first published in this paper.

5. Wei Song and Doug Edwards. A low latency wormhole router for asynchronous on-chip networks. In Proc. of Asia and South Pacific Design Automation Conference, pages 437-443, January 2010 [19].
The area and speed performance of using channel slicing and lookahead pipelines in Chapter 4 was published in this paper.

6. Wei Song and Doug Edwards. Channel Slicing: a way to build fast routers for asynchronous NoCs. In Proc. of UK Asynchronous Forum, September 2009.

7. Wei Song and Doug Edwards. Building asynchronous routers with independent sub-channels. In Proc. of International Symposium on System-on-Chip, pages 48-51, October 2009 [16].
The channel slicing technique introduced in Chapter 4 was first proposed in this paper.

8. Wei Song, Doug Edwards, Jose Nunez-Yanez, and Sohini Dasgupta. Adaptive stochastic routing in fault-tolerant on-chip networks. In Proc. of ACM/IEEE International Symposium on Networks-on-Chip, pages 32-37, May 2009.

9. Wei Song and Doug Edwards. A dynamic link allocation router. In Proc. of UK Asynchronous Forum, September 2008.


Chapter 2

Asynchronous Circuits

2.1 Synchronous and asynchronous circuits

Explain the difference between synchronous and asynchronous circuits. Describe the advantages of using asynchronous circuits: low dynamic power and tolerance to variation.

2.2 Delay assumptions

DI, SI, QDI and self-timed.

2.3 Handshake protocols

4-phase and 2-phase protocols.
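As a minimal sketch of the difference between the two protocols, the following Python model lists the signal transitions each one produces on a request/acknowledge pair. The signal names req and ack and both function names are illustrative assumptions, not taken from any design in this thesis.

```python
def four_phase(tokens):
    """Events of a 4-phase (return-to-zero) handshake.

    Every token costs four transitions: req rises, ack rises,
    req falls, ack falls.
    """
    events = []
    for _ in range(tokens):
        events += [("req", 1), ("ack", 1), ("req", 0), ("ack", 0)]
    return events


def two_phase(tokens):
    """Events of a 2-phase (non-return-to-zero) handshake.

    Every token costs two transitions: req toggles, then ack toggles.
    """
    events = []
    req = ack = 0
    for _ in range(tokens):
        req ^= 1
        events.append(("req", req))
        ack ^= 1
        events.append(("ack", ack))
    return events


if __name__ == "__main__":
    print(len(four_phase(3)), "transitions for 3 tokens (4-phase)")
    print(len(two_phase(3)), "transitions for 3 tokens (2-phase)")
```

The 2-phase protocol needs half as many transitions per token, while the 4-phase protocol returns both wires to zero between tokens.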

2.4 Data encoding

Single-rail (bundled-data), dual-rail, 1-of-4 (CHAIN) and n-of-m.

2.5 Basic elements

C-element, C2n, C2p, MUTEX
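The behaviour of the basic two-input C-element can be sketched in a few lines of Python. This is a purely behavioural model under the usual textbook definition (the output changes only when both inputs agree); the class name is an assumption, and the asymmetric C2n and C2p variants and the MUTEX are not modelled here.

```python
class CElement:
    """Behavioural model of a two-input Muller C-element."""

    def __init__(self, initial=0):
        self.out = initial

    def update(self, a, b):
        # The output follows the inputs only when they agree;
        # otherwise it holds its previous value.
        if a == b:
            self.out = a
        return self.out


if __name__ == "__main__":
    c = CElement()
    for a, b in [(1, 0), (1, 1), (0, 1), (0, 0)]:
        print(a, b, "->", c.update(a, b))
```

This state-holding behaviour is what lets a C-element act as a join: its output rises only after both inputs have risen and falls only after both have fallen.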


2.6 Summary

Unless explicitly stated otherwise, all the circuits in this work use 4-phase 1-of-4 QDI data-paths and QDI control logic.
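For readers unfamiliar with the 1-of-4 code, the sketch below shows how it might be modelled in Python: each group of four wires carries two bits, exactly one wire per group is high for valid data, and all wires are low for the spacer that separates tokens in a 4-phase protocol. The function names and the least-significant-group-first ordering are assumptions made for this illustration.

```python
def encode_1of4(value, bits):
    """Encode an unsigned `bits`-bit value (bits even) as 1-of-4 groups.

    Each group of four wires carries two bits; exactly one wire is high.
    Groups are listed least significant first.
    """
    if bits % 2 or not 0 <= value < (1 << bits):
        raise ValueError("bits must be even and value must fit")
    groups = []
    for i in range(0, bits, 2):
        symbol = (value >> i) & 0b11
        groups.append([1 if w == symbol else 0 for w in range(4)])
    return groups


def is_complete(groups):
    """Completion detection: every group carries a valid one-hot code word."""
    return all(sum(g) == 1 for g in groups)


def is_spacer(groups):
    """The return-to-zero spacer: all wires of all groups are low."""
    return all(sum(g) == 0 for g in groups)


def decode_1of4(groups):
    """Inverse of encode_1of4 for complete code words."""
    if not is_complete(groups):
        raise ValueError("incomplete data")
    return sum(g.index(1) << (2 * i) for i, g in enumerate(groups))


if __name__ == "__main__":
    word = encode_1of4(0b1101, bits=4)
    print(word, is_complete(word), decode_1of4(word))
```

The `is_complete` check mirrors the role of completion detection in a QDI data-path: the receiver acts only once every group holds a valid code word, and the more groups that must be checked together, the wider the detection circuit becomes.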


Chapter 3

Network-on-Chip

3.1 Architecture of on-chip networks

Explain the basic concepts such as: on-chip network, NoC, topology, frame, flit, phit, processor element, network interface/adapter, router, link/channel, routing algorithm, and flow control methods.

3.2 Globally asynchronous locally synchronous networks

Introduce the idea of GALS. Classify different GALS systems. Describe the advantages and disadvantages of different GALS systems. Clarify the GALS network I am studying in this work.

3.3 Previous GALS NoCs

List and review the previous GALS NoC designs.

3.3.1 Chain

The Chain network used in SpiNNaker CMP chips.

3.3.2 ASPIN

The asynchronous wormhole router designed by LIP6, Fr.


3.3.3 QoS NoC

The VC router design in APT, UoM.

3.3.4 MANGO

The VC router designed by Sparsø, DTU, Denmark.

3.3.5 ANoC

The QDI VC router designed by CEA-Leti, Fr.

3.3.6 QNoC

The VC router designed by Technion, Israel.

3.4 Summary

Part II

Levels of Parallelism

Chapter 4

Parallelism in the Physical Layer

Before the first section, I need to explain that this chapter tries to introduce more parallelism into the basic data paths.

4.1 Synchronization overhead

Explain the speed overhead of synchronized pipelines.

4.2 Channel slicing

Introduce the channel slicing technique inside wormhole routers.

4.3 Lookahead pipeline style

Introduce more parallelism by relaxing the QDI delay assumption.

4.4 A channel sliced wormhole router

Utilizing channel slicing and lookahead in a wormhole router.

4.4.1 Router structure

4.4.2 Performance

4.5 Summary

Chapter 5

Parallelism in the Switching Layer

Before the first section, I need to point out that I am trying to explore the parallelism in flow control methods.

5.1 Problems of time division and virtual channels

The speed overhead and area overhead introduced by TDM and VC.

5.2 Spatial division multiplexing

Introduce the basic idea of SDM.

5.3 An SDM router

Demonstrate the SDM router design.

5.3.1 Router structure

5.3.2 Performance

5.4 Summary

Chapter 6

Area Reduction using Clos

6.1 Motivation

Explain the reasons why I need to replace the internal crossbar with a Clos network.

6.2 Clos switching networks

An introduction to Clos switching networks and an area comparison with crossbars.
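
As a first-order illustration of such a comparison, the sketch below counts crosspoints, assuming a symmetric 3-stage Clos network C(n, m, r) built from r input switches of size n x m, m central switches of size r x r and r output switches of size m x n. Crosspoint count is used here only as a simple proxy for area; the function names are hypothetical and the area model used in the thesis may differ.

```python
def crossbar_crosspoints(ports):
    """Crosspoints in a square crossbar with the given number of ports."""
    return ports * ports


def clos_crosspoints(n, m, r):
    """Crosspoints in a symmetric 3-stage Clos network C(n, m, r).

    Input stage:   r switches of size n x m
    Central stage: m switches of size r x r
    Output stage:  r switches of size m x n
    """
    return 2 * r * n * m + m * r * r
```

For example, with n = r = 4 and m = 4 the network has 16 ports: a 16 x 16 crossbar needs 256 crosspoints, while the Clos network needs 192.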

6.3 Dynamic reconfiguration

6.3.1 Dispatching algorithms

6.3.2 Concurrent round-robin dispatching algorithm

6.3.3 Asynchronous dispatching algorithm

6.4 Asynchronous Clos scheduler

6.4.1 Implementation

6.4.2 Performance

6.5 2-stage Clos switch

6.6 Summary

Part III

Performance Evaluation and Conclusion

Chapter 7

Asynchronous SDM Router

7.1 Router structure

7.2 Utilizing the 2-stage Clos switch

7.3 Implementation

7.4 Summary

Chapter 8

Performance Evaluation

8.1 Reproduction of QoS NoC

8.2 Single router performance

8.2.1 Area

8.2.2 Speed

8.2.3 Power

8.3 Network performance

8.4 Summary

Chapter 9

Conclusions and Future Work

9.1 Summary of the thesis

9.1.1 Channel slicing

9.1.2 SDM

9.1.3 Clos

9.2 Future work

Bibliography

[1] International Technology Roadmap for Semiconductors, chapter Design, pages 12–13. 2009. URL: http://www.itrs.net/Links/2009ITRS/2009Chapters_2009Tables/2009_Design.pdf [Online; accessed 11/11/2010].

[2] J. Bainbridge and S. Furber. Chain: a delay-insensitive chip area interconnect. IEEE Micro, 22:16–23, 2002.

[3] E. Beigne, F. Clermidy, P. Vivet, A. Clouard, and M. Renaudin. An asynchronous NOC architecture providing low latency service and its multi-level design framework. In Proc. of International Symposium on Asynchronous Circuits and Systems, pages 54–63, March 2005.

[4] L. Benini and G. D. Micheli. Networks on chips: a new SoC paradigm. IEEE Computer, 35(1):70–78, 2002.

[5] T. Bjerregaard and J. Sparsø. A router architecture for connection-oriented service guarantees in the MANGO clockless network-on-chip. In Proc. of Design, Automation and Test in Europe, pages 1226–1231, 2005.

[6] D. Blaauw, K. Chopra, A. Srivastava, and L. Scheffer. Statistical timing analysis: from basic principles to state of the art. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 27(4):589–607, April 2008.

[7] W. J. Dally. Virtual-channel flow control. IEEE Transactions on Parallel and Distributed Systems, 3(2):194–205, March 1992.

[8] R. R. Dobkin, R. Ginosar, and A. Kolodny. QNoC asynchronous router. Integration, the VLSI Journal, 42(2):103–115, March 2009.

[9] J. Duato, S. Yalamanchili, and L. Ni. Interconnection networks: an engineering approach. Morgan Kaufmann Publishers, 2003.

[10] T. Felicijan. Quality-of-service (QoS) for asynchronous on-chip networks. PhD thesis, the Faculty of Science and Engineering, the University of Manchester, 2004. URL: http://intranet.cs.man.ac.uk/apt/publications/thesis/felicijan04_phd.php [Online; accessed 11/11/2010].

[11] J. Henkel, W. Wolf, and S. Chakradhar. On-chip networks: a scalable, communication-centric embedded system design paradigm. In Proc. of International Conference on VLSI Design, pages 845–851, 2004.

[12] B. Li, L.-S. Peh, and P. Patra. Impact of process and temperature variations on network-on-chip design exploration. In Proc. of ACM/IEEE International Symposium on Networks-on-Chip, pages 117–126, April 2008.

[13] S. Majzoub, R. Saleh, and R. Ward. PVT variation impact on voltage island formation in MPSoC design. In Proc. International Symposium on Quality of Electronic Design, pages 814–819, March 2009.

[14] I. Miro-Panades, F. Clermidy, P. Vivet, and A. Greiner. Physical implementation of the DSPIN network-on-chip in the FAUST architecture. In Proc. of ACM/IEEE International Symposium on Networks-on-Chip, pages 139–148, April 2008.

[15] A. Sheibanyrad. Asynchronous Implementation of a Distributed Network-on-Chip. PhD thesis, University of Pierre et Marie Curie, 2008. URL: ftp://asim.lip6.fr/pub/reports/2008/th.lip6.2008.sheibanyrad.1.pdf [Online; accessed 11/11/2010].

[16] W. Song and D. Edwards. Building asynchronous routers with independent sub-channels. In Proc. of International Symposium on System-on-Chip, pages 48–51, October 2009.

[17] W. Song and D. Edwards. An asynchronous routing algorithm for Clos networks. In Proc. of International Conference on Application of Concurrency to System Design, pages 67–76, 2010.

[18] W. Song and D. Edwards. Asynchronous spatial division multiplexing router. Microprocessors and Microsystems, In Press, 2010. DOI: 10.1016/j.micpro.2010.08.007.

[19] W. Song and D. Edwards. A low latency wormhole router for asynchronous on-chip networks. In Proc. of Asia and South Pacific Design Automation Conference, pages 437–443, 2010.

[20] J. Sparsø and S. Furber. Principles of Asynchronous Circuit Design: A Systems Perspective. Kluwer Academic Publishers, Boston, U.S.A, 2001.