[Tutorial] NoC the Next Generation of Multi-Processor SoC

Page 1: [Tutorial] NoC the Next Generation of Multi-Processor SoC

EAIT, 2011

Network-on-Chip
The Next Generation of Multi-Processor System-on-Chip

Presenters

Dr. Santanu Chattopadhyay
Associate Professor

Santanu Kundu
Research Scholar

Dept. of Electronics and Electrical Communication Engineering
Indian Institute of Technology, Kharagpur.

email: {santanu, skundu}@ece.iitkgp.ernet.in

18th Feb, 2011

Page 2: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Lecture – 1

Introduction

Page 3: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Dept. of Electronics & Electrical Communication Engg., IIT Kharagpur

Introduction

After mass-market production of dual-core and quad-core processor chips, the trend towards multi-core processing is now well established.

In multi-core processing, multiple processors (e.g. CPU, DSP) along with multiple computer components (e.g. microcontroller, memory blocks, timers, etc.) are integrated onto a single silicon chip. This architecture is often called a Multi-Processor System-on-Chip (MPSoC).

[Figure: Architecture overview of a Multi-Processor System-on-Chip. End nodes, each a device with SW and HW interfaces, are connected by links to a communication medium.]

Page 4: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Introduction

Each on-chip component is referred to as an Intellectual Property (IP) block.

The communication medium used in modern multi-processor chips is bus based.

Up to tens of cores on a single chip, the performance of these bus-based chips is satisfactory. Beyond that, performance degrades with the number of cores attached.

[Figure: System-on-Chip (SoC). The communication backbone used in modern SoC is a shared bus.]

Page 5: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Limitation of Shared Global Bus

• Communication Bottleneck: A shared bus allows only one communication at a time, and even in a hierarchical bus, a single communication can block all buses of the hierarchy.

• Scalability: Bus-based SoC does not scale with the system size, and its bandwidth is shared by all the systems attached to it.

[Figure: nodes on a shared bus, with blocked communications marked X.]

Page 6: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Limitation of Shared Global Bus

• The intrinsic parasitic resistance and capacitance can be quite high for a long bus line.

• The global bus delay increases exponentially as the process technology feature size decreases.

• Every additional IP block adds to parasitic capacitance and causes increased propagation delay.

• In the deep sub-micron era, 80% or more of the delay of critical paths will be due to global interconnects.

[Figure: Relative evolution of wire and gate delays.]

Reference: International Technology Roadmap for Semiconductors (ITRS) Documents (2003), Available at: http://public.itrs.net/Files/2003ITRS/Home2003.htm.

Page 7: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Shared Global Bus to Segmented Bus

[Figure: Segmented Bus and Multi-Level Segmented Bus, with repeaters (R).]

• The shared global bus is segmented by inserting repeaters (R).

• In a segmented bus, delay increases linearly as the process technology feature size decreases.

• There is no improvement in bandwidth, as it is still shared by all the cores attached to it.

• At the system level, it has a profound effect in changing the focus from computation to communication.

Page 8: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Point-to-Point Dedicated Links

Advantage:

• Bandwidth is higher than the shared bus.

Drawback:

• Switch size increases with the number of cores.

• The number of links needed grows quadratically with the number of cores: connecting every pair of N cores directly takes N(N − 1)/2 links.

• More metal layers are required in placement and routing.

[Figure: eight nodes, numbered 0 to 7, connected by dedicated point-to-point links.]
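To make the link-count drawback concrete, the short sketch below counts the dedicated links needed when every pair of cores gets its own link. It assumes one bidirectional link per pair; the function name is illustrative and not taken from the tutorial.

```python
def dedicated_links(num_cores: int) -> int:
    """Links needed to connect every pair of cores directly (one link per pair)."""
    return num_cores * (num_cores - 1) // 2


# The count grows roughly with the square of the number of cores.
for n in (4, 8, 16, 64):
    print(n, "cores ->", dedicated_links(n), "links")
```

For the eight-node arrangement on this slide the count is 28 links, against a single shared medium for a bus.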

Page 9: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Centralized Crossbar Switch

Components:

• Crossbar switch and

• Point-to-point links.

Advantage:

• A crossbar switch enhances the scalability to some extent.

Drawback:

• However, connecting a large number of cores with a single switch is not very effective, as it is not ultimately scalable; it is thus an intermediate solution.

[Figure: nodes connected through a central crossbar switch.]

Page 10: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Network-on-Chip: A Paradigm Shift

Only 3 components:

1. Network Interface (NI)

2. Switch (Router)

3. Point-to-Point Links

Off-Chip vs. On-Chip Networks

o The bandwidth of off-chip networks is typically much lower than that of on-chip networks.

o Off-chip networks are often affected by clock skew, whereas the clock skew problem is less significant for on-chip networks.

o Off-chip networks have higher latency than their on-chip counterparts.

o Area is not a strong constraint for off-chip networks, but for on-chip networks it is one of the major constraints.

Reference: Benini, L. and Micheli, G.D. (2002) ‘Network on chips: a new SOC paradigm’, IEEE Computer, Vol. 35, No. 1, pp.70–78.

Page 11: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Layers of Abstraction in Network-on-Chip

Session Layer
- NoC Abstraction (Open Core Protocol Standardization)

Transport Layer
- Network Interface

Network Layer
- Router / Switch

Data Link Layer
- Flow Control Protocol
- Error Handling

Physical Layer
- Physical Wire Connection

Page 12: [Tutorial] NoC the Next Generation of Multi-Processor SoC

SoC to NoC: An Evolution

SoC

• Bandwidth is limited, shared

• Speed goes down as N grows

• Central arbitration

• No layers of abstraction

However:

• Fairly simple.

NoC

• Aggregate bandwidth grows

• Speed unaffected by N

• Distributed arbitration

• Separate abstraction layers

However:

• Complex architecture.

Page 13: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Design Goal of Network-on-Chip

• High throughput

• Low latency

• Scalable architecture

• Less energy consumption

• Smaller area requirements

• Reliability in communication

• Quality-of-Service support

Page 14: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Lecture – 2

Architecture Design and Performance Evaluation of

Network-on-Chip

Page 15: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Design Issues in Network-on-Chip

• Topology Selection

• Switching Techniques

• Routing

• Flow Control Protocol & GALS Implementation

• Buffering

• Arbitration

Page 16: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Switching Techniques

• Circuit Switching

[Figure: source end node and destination end node connected through switches with buffers for “request” tokens.]

Request for circuit establishment (routing and arbitration are performed during this step)

Page 17: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Circuit Switching

[Figure: source end node and destination end node connected through switches with buffers for “ack” tokens.]

Request for circuit establishment

Acknowledgment and circuit establishment (as the token travels back to the source, connections are established)

Page 18: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Circuit Switching

[Figure: source end node and destination end node.]

Request for circuit establishment

Acknowledgment and circuit establishment

Message transport (neither routing nor arbitration is required)

Page 19: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Circuit Switching

[Figure: source end node and destination end node; a blocked request is marked X.]

Acknowledgment and circuit establishment

Packet transport

High contention, low utilization, low throughput

Page 20: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Switching Techniques

• Store-and-forward Packet Switching

Packets are completely stored before any portion is forwarded.

Drawback:

1. Larger Buffer

2. More Latency

[Figure: source end node and destination end node; switches with buffers for data packets; a packet being stored.]

Page 21: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Switching Techniques

• Store-and-forward Packet Switching

Packets are completely stored before any portion is forwarded.

Requirement: buffers must be sized to hold the entire packet.

Latency per router depends on the size of the packet.

Drawback:

1. Larger Buffer

2. More Latency

[Figure: source end node and destination end node; a packet is stored at a switch, then forwarded.]

Page 22: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Switching Techniques

• Virtual Cut-Through Packet Switching

Requirement: buffers must be sized to hold the entire packet.

Advantage: Lesser Latency

Drawback: Larger Buffer

Latency per router is reduced by forwarding the header flit of a packet as soon as space for the entire packet is available in the next router.

[Figure: source end node and destination end node with a busy link; packets are completely stored at the switch.]

Page 23: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Switching Techniques

• Wormhole Packet Switching

Requirement: packets can be larger than buffers.

Advantage: Lower Buffer Space, Lesser Latency.

Drawback: Throughput lesser than Virtual Cut-Through.

[Figure: source end node and destination end node with a busy link; packets are stored along the switches.]
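The latency difference between the switching techniques above can be illustrated with a simplified zero-load model. This is a sketch under stated assumptions, not a result from the tutorial: each link moves one flit per cycle, router processing time and contention are ignored, and the function names are hypothetical.

```python
def store_and_forward_latency(hops: int, flits: int) -> int:
    """Cycles when each hop must receive the whole packet before forwarding it."""
    return hops * flits


def cut_through_latency(hops: int, flits: int) -> int:
    """Cycles when the header is forwarded immediately and the body follows
    in a pipeline (virtual cut-through or wormhole with no blocking)."""
    return hops + flits - 1


# A 64-flit packet crossing 6 links.
print(store_and_forward_latency(6, 64))  # 384
print(cut_through_latency(6, 64))        # 69
```

In this model store-and-forward latency scales with the product of distance and packet size, while the pipelined schemes scale with their sum, which is the "lesser latency" advantage listed for virtual cut-through and wormhole switching.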

Page 24: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Network Interface (NI) Module

Protocol Conversion

Clock Domain Shifting

Page 25: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Network Interface (NI) Module

Flitization: a packet (64 × 32 bits) is split into flits.

Deflitization: flits are reassembled into a packet (64 × 32 bits).

Flit fields (each flit is 32 bits):

Header: GT/BE, eop, bop, Src_add, Dest_add

Payload 1: GT/BE, eop, bop, DATA 1

Payload 2: GT/BE, eop, bop, DATA 2

...

Tailer: GT/BE, eop, bop, DATA n

• 1 Packet = 64 Flits

• 1 Flit = 32 bits
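Flitization and deflitization can be sketched as follows. The slide names the fields but not their widths or positions, so the layout here is purely an assumption for illustration: GT/BE in bit 31, eop in bit 30, bop in bit 29, a 29-bit data field, and 8-bit source and destination addresses in the header. It also assumes bop and eop mark the beginning and end of a packet.

```python
FLIT_BITS = 32
FLITS_PER_PACKET = 64
DATA_BITS = 29                      # assumed: 32 bits minus GT/BE, eop, bop
DATA_MASK = (1 << DATA_BITS) - 1


def make_flit(gt: int, eop: int, bop: int, data: int) -> int:
    """Pack one 32-bit flit using the assumed (hypothetical) bit layout."""
    assert 0 <= data <= DATA_MASK
    return (gt << 31) | (eop << 30) | (bop << 29) | data


def flitize(src: int, dest: int, words: list[int], gt: int = 0) -> list[int]:
    """Header flit carries the addresses; the last data flit sets eop."""
    flits = [make_flit(gt, 0, 1, (src << 8) | dest)]
    for i, word in enumerate(words):
        is_last = 1 if i == len(words) - 1 else 0
        flits.append(make_flit(gt, is_last, 0, word))
    return flits


def deflitize(flits: list[int]) -> tuple[int, int, list[int]]:
    """Recover (src, dest, data words) from a complete list of flits."""
    header = flits[0]
    assert (header >> 29) & 1 == 1, "first flit must have bop set"
    assert (flits[-1] >> 30) & 1 == 1, "last flit must have eop set"
    addresses = header & DATA_MASK
    words = [flit & DATA_MASK for flit in flits[1:]]
    return (addresses >> 8) & 0xFF, addresses & 0xFF, words


# One header flit plus 63 data flits gives the 64-flit packet from the slide.
packet = flitize(src=3, dest=12, words=list(range(63)))
print(len(packet), deflitize(packet)[:2])
```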

Page 26: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Design Issues in Network-on-Chip

• Switching Techniques

• Topology Selection

• Routing

• Flow Control Protocol & GALS Implementation

• Buffering

• Arbitration

Page 27: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Topology Selection

• Diameter
Maximum shortest-path distance between two nodes in the network. Networks with small diameters are preferable.

• Average Distance
The average of the distances between all pairs of nodes of a graph. A topology with a smaller average distance is preferable.

• Bisection Width
Minimum number of wires that must be removed in order to bisect a network. A larger bisection width enables faster information exchange, and is preferable.

• Node Degree
Number of channels connecting a node to its neighbors. The lower this number, the easier it is to build the network.

• Number of Links
A topology with a large number of links can support high bandwidth.

• Topology selection is application dependent.

[Figure: 2D Mesh with 16 cores.]

Reference: Interconnection Network Architectures (2001) pp.26–49, Available at: www.wellesley.edu/cs/courses/cs331/notes/notesnetworks.pdf
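The metrics above can be measured directly on a small topology. The sketch below builds a 2D mesh of routers and uses breadth-first search to obtain the diameter and average distance in hops; it counts only router-to-router links (the link to the local core is left out), and all names are illustrative.

```python
from collections import deque
from itertools import product


def mesh_neighbors(node, rows, cols):
    """Yield the routers directly linked to `node` in a rows x cols mesh."""
    r, c = node
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < rows and 0 <= nc < cols:
            yield (nr, nc)


def hop_counts(source, rows, cols):
    """Breadth-first search: shortest hop count from `source` to every router."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in mesh_neighbors(node, rows, cols):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def mesh_metrics(rows, cols):
    """Return (diameter, average distance, max router-to-router links per node)."""
    nodes = list(product(range(rows), range(cols)))
    distances = [d for s in nodes
                 for t, d in hop_counts(s, rows, cols).items() if t != s]
    max_links = max(sum(1 for _ in mesh_neighbors(n, rows, cols)) for n in nodes)
    return max(distances), sum(distances) / len(distances), max_links


print(mesh_metrics(4, 4))
```

For the 16-core mesh in the figure this gives a diameter of 6 hops, which agrees with M + N − 2 for M = N = 4.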

Page 28: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Existing Topologies in NoC

2D Mesh

All switches are connected to the four closest other switches and the target resource block via two opposite uni-directional links, except those switches on the edge of the layout.

For M×N Mesh,

Diameter: (M + N - 2)

Bisection Width: min (M, N)

No. of routers required: (M * N)

Node Degree: 3 (corner), 4 (edge), 5 (central).

CLICHÉ: Chip-Level Integration of Communicating Heterogeneous Elements

[Figure: 2D mesh of 16 cores.]

Reference: Kumar, S., Jantsch, A., Soininen, J. P., Forsell, M., Millberg, M., Oberg, J., Tiensyrja, K. and Hemani, A. (2002) ‘A network on chip architecture and design methodology’, Proc. of ISVLSI, pp.117–124.

Page 29: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Existing Topologies in NoC

2D Torus

Wires are wrapped around from the top component to the bottom and from the rightmost to the leftmost.

For M×N Torus,

Diameter: M/2 + N/2

Bisection Width: 2 * min (M, N)

No. of routers required: (M * N)

Node Degree: 5

Disadvantage: The long end-around connections can yield excessive delays.

[Figure: 2D Torus of 16 cores.]

Reference: Dally, W. J. and Towles, B. (2001) ‘Route packets, not wires: on-chip interconnection networks’, Proceedings of the 38th Design Automation Conference (DAC 2001), pp.684–689.
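The effect of the wrap-around links on hop count can be checked with a small sketch. It assumes routers are addressed as (row, column) pairs and that a packet may travel either way around each ring; the function names are illustrative.

```python
def torus_hops(a, b, rows, cols):
    """Shortest hop count between routers a and b, using wrap-around links."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    return min(dr, rows - dr) + min(dc, cols - dc)


def torus_diameter(rows, cols):
    """Largest shortest-path distance; the torus is symmetric, so measuring
    from one router is enough."""
    origin = (0, 0)
    return max(torus_hops(origin, (r, c), rows, cols)
               for r in range(rows) for c in range(cols))


print(torus_hops((0, 0), (0, 3), 4, 4))  # 1 hop through the wrap-around link
print(torus_diameter(4, 4))              # 4
```

For even M and N the measured diameter matches M/2 + N/2, compared with M + N − 2 for a mesh of the same size.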

Page 30: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Existing Topologies in NoC

Solving Delay Problem of Torus

Reducing the maximum physical link length

Page 31: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Existing Topologies in NoC

Folded Torus

[Figure: 2D Torus of 16 cores and 2D Folded Torus of 16 cores.]

Reference: Dally, W.J. and Seitz, C.L. (1986) ‘The torus routing chip’, Journal of Distributed Computing, Vol. 1, No. 4, pp.187–196.

Page 32: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Existing Topologies in NoC

Octagon

For a network having N number of IP blocks,

Diameter: 2 * N/8.

For a system consisting of more than eight nodes, the network is extended to multidimensional space.

Drawback:

Wiring complexity increases linearly with the number of nodes.

[Figure: 2D Octagon of 8 cores.]

Reference: Karim, F., Nguyen, A. and Dey, S. (2002) ‘An interconnect architecture for networking systems on chips’, IEEE Micro, Vol. 22, No. 5, pp.36–45.

Page 33: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Existing Topologies in NoC

Binary Tree

A binary tree-based network with N (power of 2) number of IP cores has,

Diameter: log2 N

Bisection Width: 1

No. of Routers required: (N/2 − 1)

Node Degree: 5 (leaf), 3 (stem), 2 (root)

Advantage: Lesser Diameter.

Drawback: Bisection Width is very small.

[Figure: 2D Binary Tree of 16 cores.]

Reference: Jeang, Y. L., Huang, W. H. and Fang, W. F. (2004) ‘A binary tree architecture for application specific network on chip (ASNOC) design’, IEEE Asia-Pacific Conference on Circuits and Systems, pp.877–880.

Page 34: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Existing Topologies in NoC

Fat Tree

Every level has the same number of switches. The functional IP blocks reside at the leaves and the switches reside at the vertices.

For N number of IP blocks, the network has,

Diameter: log2 N/4

Bisection Width: N/2

No. of Routers required: (N * log2 N)/8

Node Degree: 8 (non-root node), 4 (root node).

SPIN: Scalable, Programmable, Integrated Network

Advantage: Large Bisection Width, Smaller Diameter

Drawback: High Node Degree

[Figure: 2D Fat Tree of 16 cores.]

Reference: Guerrier, P. and Greiner, A. (2000) ‘A generic architecture for on-chip packet-switched interconnections’, Proceedings of Design, Automation and Test in Europe (DATE 2000), pp.250–256.

Page 35: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Existing Topologies in NoC

Butterfly Fat Tree (BFT)

In the network, the IPs are placed at the leaves and switches are placed at the vertices. For N number of IPs, the network has,

Diameter: log2 N/4

Bisection Width: √N

No. of Routers needed: (≈ N/2)

Node Degree: 6 (non-root), 4 (root)

Advantage
- Requires lesser number of switches
- Low diameter and large bisection width

Drawback
- High node degree.

[Figure: 2D BFT of 16 cores.]

Reference: Pande, P. P., Grecu, C., Ivanov, A. and Saleh, R. (2003), ‘High-throughput switch-based interconnect for future SoCs’, Proc. Int’l Workshop on System-on-Chip for Real Time Applications, pp.304–310.

Page 36: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Mesh-of-Tree Topology

- In an M × N MoT, M denotes the number of Row Trees and N denotes the number of Column Trees. Both M and N are powers of 2.

- Number of nodes = 3*M*N – (M + N).

- Small Diameter (2 log2 M + 2 log2 N).

- Large Bisection Width [min (M,N)].

Drawback

- Non-planar topology.

[Figure: 4 × 4 Mesh-of-Tree connecting 32 cores.]

Reference: Kundu, S. and Chattopadhyay, S. (2008), “Mesh-of-Tree Deterministic Routing for Network-on-Chip Architecture”, ACM Great Lakes Symposium on VLSI, pp. 343–346.

Page 37: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Design Issues in Network-on-Chip

• Switching Techniques

• Topology Selection

• Routing

• Flow Control Protocol & GALS Implementation

• Buffering

• Arbitration

Page 38: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Routing

Source Routing vs. Distributed Routing

Source routing

• The route is computed at the source, so the routing control unit in switches is simplified.

• Headers containing the route tend to be larger, which increases overhead.

Distributed routing

• The next hop is computed at each switch by a finite-state machine or by a look-up table.

Deterministic Routing vs. Adaptive Routing

Deterministic routing

• Always follows a specified path.

• Easy to implement and supports in-order delivery.

Adaptive routing

• Takes different paths based on congestion and faults; destroys in-order delivery.

• Uses historical channel load information, length of queues, and status of nodes and links.

Page 39: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Routing Challenges

• Livelock

  • Arises from an unbounded number of allowed non-minimal hops.

  • Solution: restrict the number of non-minimal hops allowed.

• Deadlock

  • Arises from a set of packets being blocked waiting only for network resources (i.e., links, buffers) held by other packets in the set.

  • Probability increases with increased traffic & decreased availability.

[Figure: Live-lock in Adaptive Routing.]

Page 40: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Routing Dependent Deadlock

ci = channel i, si = source node i, di = destination node i, pi = packet i

[Figure: Routing of packets in a 2D mesh (sources s1 to s5, destinations d1 to d5, packets p1 to p5, channels c0 to c12) and the corresponding channel dependency graph.]
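A routing function can deadlock only if its channel dependency graph contains a cycle, so checking for cycles is a natural test. The sketch below is a generic depth-first cycle check; the edges of the slide's graph are not recoverable from the text, so the example graphs are hypothetical.

```python
def has_cycle(dependencies: dict) -> bool:
    """Return True if the directed graph {channel: [channels it waits on]}
    contains a cycle (depth-first search with three node states)."""
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on the current path, finished
    color = {}

    def visit(channel) -> bool:
        color[channel] = GREY
        for nxt in dependencies.get(channel, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:              # back edge: a cycle exists
                return True
            if state == WHITE and visit(nxt):
                return True
        color[channel] = BLACK
        return False

    return any(color.get(ch, WHITE) == WHITE and visit(ch)
               for ch in list(dependencies))


# Hypothetical example: four channels waiting on each other in a ring.
ring = {"c0": ["c1"], "c1": ["c2"], "c2": ["c3"], "c3": ["c0"]}
chain = {"c0": ["c1"], "c1": ["c2"], "c2": []}
print(has_cycle(ring), has_cycle(chain))  # True False
```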

Page 41: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Routing Dependent Deadlock Avoidance

Deterministic Routing in 2D mesh using Dimension Ordered Routing

Establish ordering on all resources based on network dimension. Example:

X-Y Routing: First, route horizontally until the X co-ordinate matches; then route vertically until the Y co-ordinate matches.

X-Y Routing → No cycle in the Channel Dependency Graph
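Dimension-ordered X-Y routing can be sketched in a few lines. The example assumes routers are addressed as (x, y) pairs, with x the horizontal co-ordinate, and returns the full list of routers visited; the function name is illustrative.

```python
def xy_route(src, dst):
    """Route along X until the X co-ordinate matches, then along Y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:                 # horizontal phase
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:                 # vertical phase
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))  # [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because a packet never turns from the Y dimension back into the X dimension, the turns needed to close a cycle of channel dependencies are never taken.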

Page 42: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Routing Dependent Deadlock Avoidance

Deadlock Free Adaptive Routing in 2D Mesh: Turn Model

• West First

• North Last

• Negative First

[Figure: turns permitted under deterministic routing and under the adaptive turn-model algorithms.]

Reference: Glass, C. J. and Ni, L. M. (1992), ‘Turn Model for Adaptive Routing’, Proceedings of International Symposium on Computer Architecture, pp. 278 – 287.

Page 43: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Routing Dependent Deadlock Avoidance

Deadlock Free Adaptive Routing in 2D Mesh: Odd-Even Turn Model

Rule 1. No packet is allowed to take an EN turn or an ES turn at any node located in an even column.

Rule 2. No packet is allowed to take an NW turn or an SW turn at any node located in an odd column.

Reference: Chiu, G. M. (2000), ‘The Odd-Even Turn Model for Adaptive Routing’, IEEE Transactions on Parallel and Distributed Systems, pp. 729 – 738.
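The two rules translate into a simple admissibility check that an adaptive router could apply before choosing an output port. The sketch assumes a turn is written as two letters (current heading, then new heading, so "EN" means an east-bound packet turning north) and that columns are numbered from 0; both conventions are assumptions made here for illustration.

```python
FORBIDDEN_IN_EVEN_COLUMN = {"EN", "ES"}   # Rule 1
FORBIDDEN_IN_ODD_COLUMN = {"NW", "SW"}    # Rule 2


def turn_allowed(turn: str, column: int) -> bool:
    """Return True if the odd-even rules permit `turn` at a node in `column`."""
    if column % 2 == 0:
        return turn not in FORBIDDEN_IN_EVEN_COLUMN
    return turn not in FORBIDDEN_IN_ODD_COLUMN


print(turn_allowed("EN", 2))  # False: EN is forbidden in even columns
print(turn_allowed("EN", 3))  # True
```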

Page 44: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Routing Dependent Deadlock Avoidance

Deterministic Routing in 2D Torus and Folded Torus by using Virtual Channels

Messages at a node numbered less than their destination node are routed on the high channels, and messages at a node numbered greater than their destination node are routed on the low channels.

[Figure: ring of nodes n0, n1, n2, n3 carrying the messages n0 → n2, n1 → n3, n2 → n0, n3 → n1.]

Reference: Dally, W. J. and Seitz, C. L., (1987) ‘Deadlock Free Message Routing in Multiprocessor Interconnection Networks’, IEEE Transactions on Computers, vol. C-36, no. 5, pp. 547 – 553.
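The channel-selection rule can be traced on the four-node ring from the figure. The sketch assumes a unidirectional ring in which node i forwards to node (i + 1) mod 4; the function names are illustrative.

```python
def select_virtual_channel(current: int, destination: int) -> str:
    """Apply the rule: lower-numbered node uses 'high', higher uses 'low'."""
    if current < destination:
        return "high"
    if current > destination:
        return "low"
    raise ValueError("message has already arrived")


def ring_route(source: int, destination: int, ring_size: int):
    """List (node, virtual channel) for every hop on a unidirectional ring."""
    hops = []
    node = source
    while node != destination:
        hops.append((node, select_virtual_channel(node, destination)))
        node = (node + 1) % ring_size
    return hops


print(ring_route(3, 1, 4))  # [(3, 'low'), (0, 'high')]
```

A message that crosses the wrap-around link starts on the low channels and moves to the high channels, never the other way, so the channels are used in a fixed order and the ring's cyclic dependency is broken.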

Page 45: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Deadlock Recovery

Allow deadlock to occur, but once a potential deadlock situation is detected, break at least one of the cyclic dependencies to gracefully recover. The common techniques are:

Regressive recovery (abort-and-retry): Remove packet(s) from a dependency cycle by killing (aborting) and later re-injecting (retrying) the packet(s) into the network after some delay.

Progressive recovery (preemptive): Remove packet(s) from a dependency cycle by rerouting the packet(s) onto a deadlock-free lane.

Page 46: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Design Issues in Network-on-Chip

• Switching Techniques

• Topology Selection

• Routing

• Flow Control Protocol & GALS Implementation

• Buffering

• Arbitration

Page 47: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Flow Control Protocol

Page 48: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Flow Control Protocol

Page 49: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Flow Control Protocol

Page 50: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Globally Asynchronous Locally Synchronous (GALS) style of Communication

Reference: Kundu, S. and Chattopadhyay, S. (2007) ‘Interfacing Cores and Routers in Network-on-Chip Using GALS’, IEEE International Symposium on Integrated Circuits (ISIC 2007), pp.

Page 51: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Design Issues in Network-on-Chip

• Switching Techniques

• Topology Selection

• Routing

• Flow Control Protocol & GALS Implementation

• Buffering

• Arbitration

Page 52: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Counter based FIFO

• Binary Counter Based
- Drawback
1. There can be considerable ambiguity when a count is read during a count transition.

• Gray Code Counter Based
- Drawback
1. The FIFO depth must be a power of 2, so area is wasted when the required depth is not a power of 2.

Reference: Yi, Cheng, “Gray code sequences”, U. S. Patent 6703950, March 9, 2004.
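The reason a Gray code counter avoids the read ambiguity of a binary counter is that successive counts differ in exactly one bit, so a value sampled during a transition is either the old count or the new one. A minimal sketch of the standard binary-reflected Gray code conversion (function names are illustrative):

```python
def binary_to_gray(n: int) -> int:
    """Binary-reflected Gray code of n."""
    return n ^ (n >> 1)


def gray_to_binary(g: int) -> int:
    """Inverse conversion: XOR together all right shifts of g."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


# 4-bit pointer: print each count with its Gray code.
for count in range(16):
    print(f"{count:2d} {count:04b} {binary_to_gray(count):04b}")
```

The one-bit property also holds when a counter of 2^k states wraps around, which is why the FIFO depth is tied to a power of 2.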

Page 53: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Gray Counter Based Dual Clock FIFO

Reference: Cummings, C. E. and Alfke, P. (2002) ‘Simulation and Synthesis Techniques for Asynchronous FIFO Design with Asynchronous Pointer Comparisons’, Synopsys Users Group Conference, vol. User Papers.

Page 54: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Functionality of Asynchronous Comparator

Full = ( (waddr == raddr) && (wr_dir != rd_dir) )

Empty = ( (waddr == raddr) && (wr_dir == rd_dir) )
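The two comparator equations can be exercised with a small behavioural model. The slide does not define wr_dir and rd_dir; this sketch assumes each is a one-bit flag that toggles whenever the corresponding pointer wraps around, which is one common way to tell a full FIFO from an empty one when the addresses are equal. Clock-domain effects are not modelled.

```python
def fifo_flags(waddr: int, raddr: int, wr_dir: int, rd_dir: int):
    """Return (full, empty) exactly as in the two comparator equations."""
    same_address = waddr == raddr
    full = same_address and wr_dir != rd_dir
    empty = same_address and wr_dir == rd_dir
    return full, empty


def advance(addr: int, direction: int, depth: int):
    """Increment a pointer; toggle its direction flag on wrap-around (assumed)."""
    addr += 1
    if addr == depth:
        return 0, direction ^ 1
    return addr, direction


# Depth-4 FIFO: empty at reset, full after four writes with no reads.
waddr, wr_dir, raddr, rd_dir = 0, 0, 0, 0
print(fifo_flags(waddr, raddr, wr_dir, rd_dir))   # (False, True)
for _ in range(4):
    waddr, wr_dir = advance(waddr, wr_dir, depth=4)
print(fifo_flags(waddr, raddr, wr_dir, rd_dir))   # (True, False)
```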

Page 55: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Metastability

• The Full and Empty signals are controlled by both clocks, so metastable states can arise.

• Two-stage synchronizers are used to reduce the probability of metastability.

• The Full signal is synchronized with ‘wr-clk’ and the Empty signal is synchronized with ‘rd-clk’.

Full = ( (waddr == raddr) && (wr_dir != rd_dir) )

Empty = ( (waddr == raddr) && (wr_dir == rd_dir) )

Page 56: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Design Issues in Network-on-Chip

• Switching Techniques

• Topology Selection

• Routing

• Flow Control Protocol & GALS Implementation

• Buffering

• Arbitration

Page 57: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Arbitration

Page 58: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Router Architecture

Input Channel
• Input Buffer
• Routing Computation Unit
• Control Unit

Output Channel
• Output Buffer
• Arbiter
• Control Unit

Reference: Kundu, S. and Chattopadhyay, S. (2008) ‘Network-on-chip architecture design based on Mesh-of-Tree deterministic routing topology’, Int’l Journal of High Performance Systems Architecture, Vol. 1, No. 3, pp. 163-182.

Page 59: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Wormhole Router Architecture Data Path

[Figure: wormhole router data path. Each input physical channel passes through link control into an input buffer (IB). The header flit goes to the routing control unit (RC), which applies the routing algorithm and passes the output port number to the arbitration unit (SA). Crossbar control drives the crossbar (ST), which feeds the output buffers (OB), link control and the output physical channels. The critical path is marked.]

IB (Input Buffering), RC (Route Computation), SA (Switch Alloc), ST (Switch Trav), OB (Output Buffering)

Page 60: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Flit Traversal Through Wormhole Router

[Figure: wormhole router data path with physical channels, link control, input buffers (IB), routing control units (RC) with routing algorithm and header flit, arbitration unit (SA), crossbar control, crossbar (ST), output buffers (OB) and output port #.]

Pipeline stages: IB (Input Buffering), RC (Route Computation), SA (Switch Allocation), ST (Switch Traversal), OB (Output Buffering)

[Timing diagram: the packet header passes through IB, RC, SA, ST and OB. Packet Payload 1, Packet Payload 2 and Packet Payload 3 each wait in IB and then pass through ST and OB.]

Page 61: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Performance Evaluation

Performance Metrics

Throughput (unit: flits/cycle/IP):

$$TP = \frac{(\text{Maximum Accepted Packets}) \times (\text{Packet length})}{(\text{Number of IP blocks}) \times (\text{Total time})}$$

Latency: The time (in clock cycles) that elapses between the injection of a message header into the network at the source node and the reception of the tail flit at the destination node. The average latency is

$$L_{avg} = \frac{\sum_{i=1}^{P} L_i}{P}$$

where P = total number of messages and L_i = latency of each message i.

Bandwidth: The maximum number of bits that can be sent successfully to the destination through the network per second. It is expressed in bps (bits/sec).

Cost Metrics

Energy dissipation: Energy consumed by routers and links at different workloads. Average energy/packet and average energy/clock cycle are measured.

Area requirements: The percentage of chip area occupied by the switches and links is taken into consideration.
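The throughput and average-latency definitions translate directly into code. The sketch below only restates the two formulas; the function names and the sample numbers in the usage note are illustrative assumptions, not measurements from the tutorial.

```python
def throughput(accepted_packets, packet_length, num_ip_blocks, total_cycles):
    """TP in flits/cycle/IP: accepted flits divided by (IP blocks x elapsed cycles)."""
    return (accepted_packets * packet_length) / (num_ip_blocks * total_cycles)


def average_latency(latencies):
    """L_avg: mean of the per-message latencies L_i, in clock cycles."""
    if not latencies:
        raise ValueError("at least one message latency is required")
    return sum(latencies) / len(latencies)
```

For instance, 950 accepted packets of 64 flits on 32 IP blocks over 190,000 measured cycles (hypothetical figures) give a throughput of 0.01 flits/cycle/IP.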

Page 62: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Simulator Design for Performance Evaluation

Types of Simulator

1. Cycle Accurate:

Samples the state of the signals at every clock edge (positive or negative).

Much faster than event-driven simulation.

2. Event Driven:

Most accurate, as every active signal is calculated for every device during the clock cycle as it propagates.

Each signal is simulated for its value and its time of occurrence.

Excellent for timing analysis and for verifying race conditions.

Computation intensive (depends on the number of activities) and hence very slow.

Calculating performance metrics such as throughput and latency does not require the delay after each and every gate. In that case a Cycle Accurate Simulator is the best choice.

Page 63: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Existing NoC Simulators

Some existing NoC simulators and their drawbacks:

• NIRGAM (University of Southampton, UK): limited to Mesh topology; no power evaluation

• MPARM - Xpipes (University of Bologna, Italy): not freely available

• NS2 (Open Source): packet-level transaction

Page 64: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Cycle Accurate Simulator for NoC Modeling

The simulator should operate at the granularity of the individual architectural components of the router.

SystemC is normally preferred.

Traffic Generators are used for evaluating the performance of the NoC.

Router

Input Channel
• Input Buffer
• Routing Computation Unit
• Control Unit

Output Channel
• Output Buffer
• Arbiter
• Control Unit

Network

1. Throughput
2. Latency
3. Bandwidth

Traffic Generation

• Poisson Distribution
• Self-Similar Traffic
• Application Specific Traffic

Page 65: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Traffic Generator

Application-driven traffic is best suited for performance evaluation. Because such traffic is often unavailable, synthetic traffic source models are also used.

The nature of traffic in NoC is generally bursty.

• A Poisson process

When observed on a fine time scale, it appears bursty.

The burst length of a Poisson arrival process tends to be smoothed by averaging over a long enough time scale.

A Poisson process fails to capture the actual burstiness of NoC traffic.

Short-range dependence

Reference: Varatkar, G.V. and Marculescu, R. (2004) ‘On-chip traffic modeling and synthesis for MPEG-2 video applications’, IEEE Trans. on Very Large Scale Integration (VLSI) Systems, Vol. 12, No. 1, pp. 108-119.

Page 66: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Traffic Generator

• A Self-Similar (fractal) process

When aggregated over a wide range of time scales, it maintains its bursty characteristic. Self-similarity manifests itself in several equivalent fashions:

Slowly decaying variance

Long-range dependence

Non-degenerate autocorrelations

Heavy tailed

A Self-Similar process can be generated by superposing ON-OFF Pareto sources.
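The superposition of ON-OFF Pareto sources can be sketched as follows. This is an illustrative generator under stated assumptions, not the tutorial's traffic model: each source emits one flit per time slot while ON, ON and OFF period lengths are drawn from Pareto distributions, and the shape parameters (1.4 and 1.2) are example values chosen between 1 and 2 so that the periods are heavy-tailed. All function and parameter names are hypothetical.

```python
import random


def on_off_pareto_source(n_slots, rng, alpha_on=1.4, alpha_off=1.2):
    """Return a list of 0/1 values: 1 in every slot where the source is ON."""
    trace = []
    on = rng.random() < 0.5  # random initial state
    while len(trace) < n_slots:
        alpha = alpha_on if on else alpha_off
        remaining = n_slots - len(trace)
        # paretovariate() returns a value >= 1; cap it so a very long
        # heavy-tailed period cannot exceed the requested trace length.
        period = min(int(rng.paretovariate(alpha)), remaining)
        trace.extend([1 if on else 0] * period)
        on = not on
    return trace


def aggregate_traffic(n_sources, n_slots, seed=0):
    """Superpose several independent ON-OFF sources: flits injected per slot."""
    rng = random.Random(seed)
    sources = [on_off_pareto_source(n_slots, rng) for _ in range(n_sources)]
    return [sum(slot) for slot in zip(*sources)]
```

Calling aggregate_traffic(8, 1000) yields a 1000-slot trace whose per-slot value lies between 0 and 8.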

Reference: Park, K. and Willinger, W. (2000) ‘Self-Similar network traffic and performance evaluation’, A Wiley-Interscience Publication, John Wiley & Sons, Inc.

Page 67: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Traffic Parameter

Offered Load: Number of packets injected in a particular time interval.

Locality Factor: Ratio of the traffic destined to the local cluster from a core to the total traffic injected by each core.

Locality Factor = 0 signifies uniformly distributed traffic.

[Figure: 4x4 Mesh with the source S at one corner and the destination clusters labelled d = 0 to d = 6.]

For example, in a 4x4 Mesh the destinations are at distances d = 1, 2, 3, 4, 5, and 6 from a source at one corner. If locality factor = 0.5, then 50 percent of the traffic goes to the cluster at d = 1. The remaining 50 percent is distributed as follows:

o 15% goes to the cluster at d = 2
o 12.5% goes to the cluster at d = 3
o 10% goes to the cluster at d = 4
o 7.5% goes to the cluster at d = 5
o 5% goes to the cluster at d = 6

If there is more than one core in a cluster, the traffic is randomly distributed among them.
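The 4x4 Mesh example can be turned into a destination picker. The sketch below assumes that the distance d is the hop count (Manhattan distance) from the corner source, which matches the range d = 1 to 6 on a 4x4 Mesh. The share table is copied from the example for locality factor 0.5; the function names are hypothetical.

```python
import random
from collections import defaultdict

# Traffic share per distance cluster for locality factor = 0.5 (from the example).
SHARE_BY_DISTANCE = {1: 0.50, 2: 0.15, 3: 0.125, 4: 0.10, 5: 0.075, 6: 0.05}


def clusters_from(source, rows=4, cols=4):
    """Group every other mesh node by its hop distance from the source."""
    clusters = defaultdict(list)
    sr, sc = source
    for r in range(rows):
        for c in range(cols):
            d = abs(r - sr) + abs(c - sc)
            if d > 0:
                clusters[d].append((r, c))
    return clusters


def pick_destination(source, rng):
    """Pick a cluster according to its share, then a random core inside it."""
    clusters = clusters_from(source)
    distances = sorted(SHARE_BY_DISTANCE)
    weights = [SHARE_BY_DISTANCE[d] for d in distances]
    d = rng.choices(distances, weights=weights, k=1)[0]
    return rng.choice(clusters[d])
```

The share table applies to a corner source such as (0, 0) only; a source elsewhere in the mesh has a different set of distances and would need its own table.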

Page 68: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Performance Evaluation

Performance of any network depends on the following network parameters:

Topology

Locality factor of the traffic

Buffer Position and Buffer Depth

Switching Techniques

Number of cores attached

Theoretically,

Throughput ∝ (Number of Links) / (Average Distance)

Latency ∝ Average Distance

Here, the Wormhole router architecture is used to form the network with the following parameters:

Number of cores attached = 32

Message Length = Packet Length = 64 flits

Each flit consists of 32 bits

Total simulation cycles = 2 lacs (200,000), with a 10,000-cycle settling time

[Figure: Mesh and BFT topologies.]

Page 69: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Performance Evaluation

Throughput varies with topology and locality factor

Throughput = Maximum Accepted Traffic in flits/cycle/IP

We kept buffer depth = 6 in both input and output channels of the router in all the cases

Page 70: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Performance Evaluation

Latency decreases with increase in Locality Factor in different topologies

We kept buffer depth = 6 in both input and output channels of the router in all the cases

Page 71: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Power Evaluation Flow

Router Power Evaluation

Reference: Synopsys Prime Power, Design Vision manual (Version Y-2006.06)

Operating Condition: Process = 1, Voltage = 1 volt, Temp = 75 °C

Page 72: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Power Evaluation Flow

Link Length Estimation

Mesh

Estimated Length of Wires: 1.25 mm, 2.5 mm

Page 73: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Power Evaluation Flow

Link Length Estimation

Butterfly Fat Tree (BFT)

Estimated Length of Wires: 1.25 mm, 5.0 mm

Page 74: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Power Evaluation Flow

Interconnect Modeling

Copper wire (resistivity = 17 nΩ-m) of Metal Layer 4 (Semi-global) has been used.

To reduce the wiring area we have chosen the minimum dimensions of Metal Layer 4. The dimensions are:

Width (W) = 0.2 µm
Spacing (S) = 0.2 µm
Pitch = W + S = 0.4 µm
Thickness (T) = 0.5 µm
H = 0.75 µm
Dielectric Constant = 2.9

[Figure: cross-section of interconnects showing Layer 3, Layer 4 and Layer 5.]

Link Energy Evaluation

Parasitic components (R, C, L) of the three-wire model have been extracted with the FieldSolver tool of HSPICE. The energy consumption of the middle wire for different transitions is also obtained from HSPICE.
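As a sanity check on the wire dimensions, the DC resistance of a link can be estimated with the textbook relation R = ρL / (W·T). This is only a first-order estimate under the assumption of a uniform rectangular cross-section; it is not the field-solver extraction described above, and the wire lengths are the link-length estimates quoted for the Mesh and BFT.

```python
RESISTIVITY = 17e-9   # ohm-metre (17 nΩ-m, copper, from the slide)
WIDTH = 0.2e-6        # metre (W = 0.2 µm)
THICKNESS = 0.5e-6    # metre (T = 0.5 µm)


def wire_resistance(length_m):
    """First-order DC resistance of a rectangular wire: R = rho * L / (W * T)."""
    return RESISTIVITY * length_m / (WIDTH * THICKNESS)


if __name__ == "__main__":
    for length_mm in (1.25, 2.5, 5.0):
        r = wire_resistance(length_mm * 1e-3)
        print(f"{length_mm} mm wire: {r:.1f} ohm")
```

Under these assumptions a 1.25 mm wire comes to about 212.5 Ω, and the value scales linearly with length.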

Page 75: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Power Evaluation Flow

Three wire modeling

Data rate: 32 × 200 Mbits/sec

Driver sizes are designed based on the length of the wire.

Load capacitance at the other end of the wire is 5 fF.

A Look Up Table (LUT) is made for the middle-line energy consumption.

Page 76: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Energy Consumption in Mesh Topology

Network Energy = Router Energy + Link Energy

Simulation runs for 2 lacs clock cycles with a clock period of 5 ns

Internal Power Dominates

We kept buffer depth = 6 in both input and output channels of the router in all the cases

Page 77: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Comparison of Energy Consumption

We kept buffer depth = 6 in both input and output channels of the router in all the cases

Page 78: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Energy – Performance Trade-Off

Throughput Variation with FIFO Depth & Position in Mesh

FIFO_Depth_4-4 => Input Channel FIFO Depth = 4, Output Channel FIFO Depth = 4
FIFO_Depth_4-6 => Input Channel FIFO Depth = 4, Output Channel FIFO Depth = 6
FIFO_Depth_6-6 => Input Channel FIFO Depth = 6, Output Channel FIFO Depth = 6
FIFO_Depth_4-0 => Input Channel FIFO Depth = 4, No FIFO at Output Channel
FIFO_Depth_6-0 => Input Channel FIFO Depth = 6, No FIFO at Output Channel

Page 79: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Energy – Performance Trade-Off

Latency Variation with FIFO Depth & Position in Mesh

FIFO_Depth_4-4 => Input Channel FIFO Depth = 4, Output Channel FIFO Depth = 4
FIFO_Depth_4-6 => Input Channel FIFO Depth = 4, Output Channel FIFO Depth = 6
FIFO_Depth_6-6 => Input Channel FIFO Depth = 6, Output Channel FIFO Depth = 6
FIFO_Depth_4-0 => Input Channel FIFO Depth = 4, No FIFO at Output Channel
FIFO_Depth_6-0 => Input Channel FIFO Depth = 6, No FIFO at Output Channel

Page 80: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Energy – Performance Trade-Off

Energy Variation with FIFO Depth & Position in Mesh

Simulation runs for 2 lacs clock cycles with a clock period of 5 ns

FIFO_Depth_4-4 => Input Channel FIFO Depth = 4, Output Channel FIFO Depth = 4
FIFO_Depth_4-6 => Input Channel FIFO Depth = 4, Output Channel FIFO Depth = 6
FIFO_Depth_6-6 => Input Channel FIFO Depth = 6, Output Channel FIFO Depth = 6
FIFO_Depth_4-0 => Input Channel FIFO Depth = 4, No FIFO at Output Channel
FIFO_Depth_6-0 => Input Channel FIFO Depth = 6, No FIFO at Output Channel

Page 81: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Energy – Performance Trade-Off

Trade-Off in Mesh at saturation (load = 160)

FIFO_Depth_6-0 shows the best Energy-Performance Trade-Off

Page 82: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Network Energy Consumption in Mesh after FIFO Optimization

Simulation runs for 2 lacs clock cycles with a clock period of 5 ns

Internal Power Still Dominates

We kept FIFO depth = 6 in input channel and no FIFO at output channel

Page 83: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Comparison of Energy Consumption after FIFO Optimization

We kept FIFO depth = 6 in input channel and no FIFO at output channel

Page 84: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Internal Power

Netlist View of a D-type flip-flop with synchronous clear input in Synopsys Design Vision

• Internal power = short circuit power + internal node switching power

• Output node of the clock-buffer switches continuously with a free-running clock

To minimize Internal Power: Stop the clock when the network is idle

Page 85: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Internal Power Minimization

Netlist View of FIFO Memory

Page 86: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Network Energy Consumption in Mesh after Clock Gating in FIFO

Simulation runs for 2 lacs clock cycles with a clock period of 5 ns

We kept FIFO depth = 6 in input channel and no FIFO at output channel

Page 87: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Comparison of Energy Consumption after Clock Gating in FIFO

We kept FIFO depth = 6 in input channel and no FIFO at output channel

Page 88: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Network Area Comparison

% SoC Area Overhead

• BFT: 2.424
• Mesh: 3.701

Total Core Area = (32 * 2.5 * 2.5) sq. mm. = 200 sq. mm.

Page 89: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Scalability Measurement

Scalability is the property of exhibiting performance proportional to the number of cores employed.

As the size of a scalable system is increased, a corresponding increase in performance is obtained.

BW = [(Throughput * Number of cores attached * Number of bits in a flit) / clock period]
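The bandwidth expression can be evaluated directly. In the sketch below the function name is hypothetical and the throughput value used in the usage note is an assumed example, while the core count, flit width and clock period are the ones quoted elsewhere in the tutorial (32 cores, 32-bit flits, 5 ns).

```python
def network_bandwidth(throughput, num_cores, bits_per_flit, clock_period_s):
    """BW in bits/sec = throughput (flits/cycle/IP) * cores * bits per flit / clock period."""
    return throughput * num_cores * bits_per_flit / clock_period_s


if __name__ == "__main__":
    # Assumed throughput of 0.5 flits/cycle/IP, purely for illustration.
    bw = network_bandwidth(0.5, num_cores=32, bits_per_flit=32, clock_period_s=5e-9)
    print(f"{bw / 1e9:.1f} Gbit/s")
```

With the assumed throughput of 0.5 flits/cycle/IP this evaluates to 102.4 Gbit/s. For a scalable network, the value should grow in proportion to the number of cores.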

Page 90: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Head-of-Line Blocking in Wormhole Router

[Figure: 2D mesh, no VCs (single channel VC0), XY routing. Blocked flits are marked X.]

Page 91: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Introduction of Virtual Channels

• Multiple Virtual Channels are multiplexed on a single physical link to improve performance.

• Payload flits use the VC acquired by the header flit, while the tail flit releases the VC.

[Figure: Switch A and Switch B connected by a physical data link. On each side, VC 0 and VC 1 pass through a MUX and a DEMUX, with VC control on both switches and a VC scheduler.]

Reference: Dally, W. J. (1992) ‘Virtual Channel Flow Control’, IEEE Trans. on Parallel and Distributed Systems, Vol. 3, No. 2, pp. 194–205.

Page 92: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Virtual Channels

[Figure: 2D mesh, 2 VCs (VC0, VC1), XY routing.]

VC avoids HOL blocking.

Page 93: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Virtual Channels

[Figure: 2D mesh, 2 VCs (VC0, VC1), XY routing. Blocked flits are marked X where no VCs are available.]

VC mitigates HOL blocking but cannot eliminate it.

Page 94: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Virtual Channel Based Router Architecture

[Figure: virtual channel based router. Labelled blocks: physical channels, link control, input buffers with MUX and DEMUX, crossbar, routing control and arbitration unit.]

Page 95: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Virtual Channel Based Router Architecture

[Figure: virtual channel based router. Labelled blocks: physical channels, link control, input buffers with DEMUX and MUX, crossbar, routing control and arbitration unit.]

Reference: N. Kavaldjiev, G. J. M. Smit, and P. G. Jansen, “A Virtual Channel Router for On-Chip Networks”, in Proc. of IEEE Int’l SOC Conference. IEEE Computer Society Press, pp. 289–293, 2004.

Page 96: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Determination of Number of Virtual Channels

- Throughput increases up to 4 virtual channels, but beyond that it saturates.
- Energy dissipation increases with the number of virtual channels.
- For the Energy-Performance Trade-off, 4 virtual channels per physical channel are preferred.

Reference: Pande, P. P., Grecu, C., Jones, M., Ivanov, A. and Saleh, R. (2005) “Performance evaluation and design trade-offs for MP-SOC interconnect architectures”, IEEE Trans. on Computers, Vol. 54, No. 8, pp. 1025–1040.

Page 97: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Throughput Improvement in Mesh using Virtual Channel Architecture

No. of Virtual Channels = 4

Page 98: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Latency Improvement in Mesh using Virtual Channel Architecture

No. of Virtual Channels = 4

Page 99: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Energy Overhead in Mesh using Virtual Channel Architecture

No. of Virtual Channels = 4

Page 100: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Performance of Some Other Topologies

No. of Virtual Channels = 4

Reference: Pande, P. P., Grecu, C., Jones, M., Ivanov, A. and Saleh, R. (2005) “Performance evaluation and design trade-offs for MP-SOC interconnect architectures”, IEEE Trans. on Computers, Vol. 54, No. 8, pp.1025–1040.

Page 101: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Network Area Comparison with Virtual Channel Architecture

% SoC Area Overhead

• Mesh, without VC: 3.701
• Mesh, with VC: 6.145
• BFT, without VC: 2.424
• BFT, with VC: 3.507

Total Core Area = (32 * 2.5 * 2.5) sq. mm. = 200 sq. mm.

Page 102: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Quality of Service (QoS) Support

• Conceptually, two disjoint networks

– a network with throughput and latency guarantees (guaranteed throughput, GT)

– a network without those guarantees (best-effort, BE)

• Several types of commitment in the network

– combine guaranteed worst-case behavior

– with good average resource usage

Architectural Modification for Supporting QoS

Reference: Rijpkema, E., Goossens, K., Radulescu, A., Dielissen, J., Meerbergen, J. V., Wielage, P., and Waterlander, E. (2003) “Trade-offs in the Design of a Router with Both Guaranteed and Best-Effort Services for Networks on Chip”, IEE Proc. Computers and Digital Techniques, Vol. 150, No. 5, pp. 294-302.

Page 103: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Lecture – 3

Application Mapping

Page 104: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Task of Application Mapping

Page 105: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Mapping Problem Formulation – Core Graph

• Directed graph G = (V, E)

• Each vertex vi represents a core

• Each edge ei,j ∈ E represents communication between vi and vj

• The weight of edge ei,j is commi,j, the bandwidth requirement

Page 106: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Mapping Problem Formulation – NoC Topology Graph

• A directed graph P = (U, F)

• Each vertex ui ∈ U is a router

• Each edge fi,j ∈ F represents a direct communication between the vertices

• The weight of edge fi,j, denoted by bwi,j, represents the available bandwidth across the edge

Page 107: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Map Function

• map: V → U

• Each edge k of the core graph represents a commodity dk

• Each commodity has a value vl(dk) representing the bandwidth requirement of the communication from vi to vj

• Bandwidth constraint:

An edge in the topology graph must have enough bandwidth to accommodate all commodities passing through it

• Minimize communication cost:

Σk vl(dk) × dist(source(dk), dest(dk))
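The communication cost of a candidate mapping can be computed as a weighted sum over the commodities. The sketch below assumes a 2D mesh topology in which dist is the hop count (Manhattan distance) between the routers that the source and destination cores are mapped to. The data layout and the function names are hypothetical.

```python
def hops(a, b):
    """Hop count between two mesh routers given as (row, col) pairs."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def communication_cost(core_graph, mapping):
    """Sum over commodities of vl(dk) * dist(source(dk), dest(dk)).

    core_graph: {(src_core, dst_core): bandwidth requirement}
    mapping:    {core: (row, col)} i.e. the map function V -> U
    """
    return sum(
        bandwidth * hops(mapping[src], mapping[dst])
        for (src, dst), bandwidth in core_graph.items()
    )


if __name__ == "__main__":
    core_graph = {("a", "b"): 100, ("b", "c"): 50, ("a", "c"): 10}
    good = {"a": (0, 0), "b": (0, 1), "c": (1, 1)}
    poor = {"a": (0, 0), "b": (1, 1), "c": (0, 1)}
    print(communication_cost(core_graph, good), communication_cost(core_graph, poor))
```

Placing the heavily communicating cores a and b on adjacent routers lowers the cost, which is the effect a mapping algorithm tries to achieve. The bandwidth constraint is not checked here.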

Page 108: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Mapping Solution

Page 109: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Mapping Algorithms

• The mapping problem is intractable

• Several approaches are possible: ILP, Heuristics (PMAP, GMAP, PBB, NMAP, BMAP etc.), Meta-search heuristics (GA, PSO, Simulated Annealing)

• Other variants of the problem combine

– Task scheduling

– Power consumption

– Alternative routing paths etc.

Page 110: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Mapping with Minimum-Path Routing (NMAP)

• Three phases – Initialize, Minimum path computation, Iterative improvement

• Initialize:

1. The core with maximum communication demand is placed onto the node with the maximum number of neighbors

2. Select the core that communicates most with the mapped cores

3. Place the selected core onto the node that minimizes communication cost with the mapped ones
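The three initialization steps can be sketched as a greedy placement on a 2D mesh. This is a simplified illustration under stated assumptions, not the published NMAP code: communication cost is taken as bandwidth times hop count, traffic in both directions is summed, ties are broken by iteration order, and steps 2 and 3 are repeated until every core is placed. All names are hypothetical.

```python
def neighbours(node, rows, cols):
    """Mesh nodes directly connected to the given (row, col) node."""
    r, c = node
    candidates = [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(x, y) for x, y in candidates if 0 <= x < rows and 0 <= y < cols]


def hops(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def initial_mapping(comm, rows, cols):
    """comm: {(src_core, dst_core): bandwidth}. Returns {core: (row, col)}."""
    cores = sorted({core for edge in comm for core in edge})
    nodes = [(r, c) for r in range(rows) for c in range(cols)]

    def traffic(a, b):
        # Total traffic exchanged between two cores, both directions.
        return comm.get((a, b), 0) + comm.get((b, a), 0)

    # Step 1: busiest core goes onto the node with the most neighbours.
    first = max(cores, key=lambda c: sum(traffic(c, o) for o in cores if o != c))
    start = max(nodes, key=lambda n: len(neighbours(n, rows, cols)))
    mapping = {first: start}
    free = [n for n in nodes if n != start]
    unmapped = [c for c in cores if c != first]

    while unmapped:
        # Step 2: core that communicates most with the already mapped cores.
        nxt = max(unmapped, key=lambda c: sum(traffic(c, m) for m in mapping))
        # Step 3: free node that minimises cost with the mapped cores.
        best = min(
            free,
            key=lambda n: sum(traffic(nxt, m) * hops(n, mapping[m]) for m in mapping),
        )
        mapping[nxt] = best
        free.remove(best)
        unmapped.remove(nxt)
    return mapping
```

The sketch expects at least as many mesh nodes as cores. The result is only a starting point that the later phases refine.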

Page 111: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Mapping with Minimum-Path Routing

• Shortest Path:

– Minimum path routing

– Commodities are sorted in descending order of flows

– For each commodity, the shortest path is identified

– As soon as a commodity path is finalized, the cost of each edge on the path is increased by the value of the commodity
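The routing phase can be illustrated with a small sketch in which commodities are routed one at a time in descending order of flow, and each chosen path raises the cost of its edges. Several details are assumptions made for the example rather than taken from the slides: the topology is a 2D mesh, a path is first required to have the minimum number of hops, and the accumulated edge load is used only to choose among minimum-hop paths. All names are hypothetical.

```python
import heapq


def mesh_edges(rows, cols):
    """Directed mesh edges, each starting with zero accumulated load."""
    cost = {}
    for r in range(rows):
        for c in range(cols):
            for nr, nc in ((r + 1, c), (r, c + 1)):
                if nr < rows and nc < cols:
                    cost[((r, c), (nr, nc))] = 0
                    cost[((nr, nc), (r, c))] = 0
    return cost


def shortest_path(cost, src, dst):
    """Dijkstra ordered by (hop count, accumulated load)."""
    best = {src: (0, 0)}
    prev = {}
    heap = [(0, 0, src)]
    while heap:
        h, load, node = heapq.heappop(heap)
        if node == dst:
            break
        if (h, load) > best[node]:
            continue  # stale heap entry
        for (a, b), w in cost.items():
            if a != node:
                continue
            cand = (h + 1, load + w)
            if b not in best or cand < best[b]:
                best[b] = cand
                prev[b] = node
                heapq.heappush(heap, (cand[0], cand[1], b))
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def route_commodities(commodities, rows, cols):
    """commodities: {(src_node, dst_node): flow value}. Returns (routes, edge costs)."""
    cost = mesh_edges(rows, cols)
    routes = {}
    for (src, dst), value in sorted(commodities.items(), key=lambda kv: -kv[1]):
        path = shortest_path(cost, src, dst)
        for edge in zip(path, path[1:]):
            cost[edge] += value  # later commodities see this edge as loaded
        routes[(src, dst)] = path
    return routes, cost
```

Because the largest flows are routed first, they get the least loaded minimum-hop paths, and smaller flows are steered around them where an equally short alternative exists.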

Page 112: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Mapping with Minimum-Path Routing

• Iterative Improvement:

– Iteratively swap vertices pair-wise to obtain a better mapping

– Traffic splitting:

• Multiple shortest paths may exist

• Formulate a multi-commodity flow problem to satisfy bandwidth requirements for solutions that have lower communication costs but do not satisfy all the bandwidth requirements

Page 113: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Binomial Mapping Algorithm (BMAP)

• The NMAP algorithm is O(N^4 log N)

• BMAP is a three-stage algorithm with complexity O(N^2 log N)

– Binomial Merging Iteration

– Topology Mapping

– Hardware Cost Optimization

Page 114: [Tutorial] NoC the Next Generation of Multi-Processor SoC

BMAP: Binomial Merging Iteration

1. Calculate IP Ranking: Rank of IP core i,

ranking(i) = Σ (requirement(i, j) + requirement(j, i)), j = 1 to N

requirement(i, j) is the bandwidth requirement from i to j

2. Merge IP Set: Based on ranking, merge two IP sets at a time: log N time

3. Refreshing IP Set: Ranking is recalculated. The ranking of IP Set k generated by merging IP Set i and IP Set j is,

ranking(k) = ranking(i) + ranking(j) – requirement(i, j) – requirement(j, i)
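The two ranking formulas can be written out as small functions. The sketch below only restates the expressions on the slide. The bandwidth matrix, the zero-based indexing and the function names are assumptions for illustration, and the merge order itself is not modelled.

```python
def ranking(i, requirement):
    """ranking(i) = sum over j of requirement(i, j) + requirement(j, i)."""
    n = len(requirement)
    return sum(requirement[i][j] + requirement[j][i] for j in range(n))


def merged_ranking(rank_i, rank_j, req_ij, req_ji):
    """ranking(k) for the set k obtained by merging IP sets i and j."""
    return rank_i + rank_j - req_ij - req_ji


if __name__ == "__main__":
    # requirement[i][j]: bandwidth requirement from core i to core j (example values).
    requirement = [
        [0, 10, 0],
        [5, 0, 20],
        [0, 0, 0],
    ]
    ranks = [ranking(i, requirement) for i in range(3)]
    print(ranks)
    print(merged_ranking(ranks[0], ranks[1], requirement[0][1], requirement[1][0]))
```

Because the merged ranking is derived from the two existing rankings, refreshing an IP set after a merge does not require a new pass over the whole requirement matrix.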

Page 115: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Merging: An Example

Page 116: [Tutorial] NoC the Next Generation of Multi-Processor SoC

BMAP: Topology Mapping and Traffic Surface Creation

• After mapping, a traffic surface is generated

• It shows the traffic load of each router

• Minimal path routing is used

• Based on this surface, hardware can be optimized by selecting proper routers from the library

Page 117: [Tutorial] NoC the Next Generation of Multi-Processor SoC

BMAP: Hardware Cost Optimization

1. Dummy Router Elimination:

– Dummy routers are added at the start point to have 4n routers

– BMAP puts these routers at the boundaries, hence they can be eliminated

2. Router Selection:

– Sharing a single buffer among low-bandwidth input channels

– Choice of router is made from the library

3. Unfolding:

– Add additional routers and links for larger bandwidth requirements

Page 118: [Tutorial] NoC the Next Generation of Multi-Processor SoC

BMAP: Hardware Optimization - An Example

Page 119: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Network on Chip Synthesis: SUNMAP + xpipes

Page 120: [Tutorial] NoC the Next Generation of Multi-Processor SoC

SUNMAP: Topology Mapping

• Optimizes for area, power or delay within design constraints

• Uses heuristics to perform mapping onto topologies: mesh, torus, hypercube, clos, and butterfly

• Built-in floor-planner for area, power analysis

• Choice of different routing functions

Page 121: [Tutorial] NoC the Next Generation of Multi-Processor SoC

SUNMAP: Topology Mapping

Heuristic approach with several phases:

• Initial mapping using a greedy algorithm (from communication graph)

– Compute optimal routing (using flow formulation)

1. Floorplan solution

2. Check area and bandwidth constraints

3. Compute mapping cost

• Iterative improvement loop (Tabu search)

• Allows manual and interactive topology creation

Page 122: [Tutorial] NoC the Next Generation of Multi-Processor SoC

System configuration

• Specifies
– NIs (I/Os, clocks, buffers)
– switches (I/Os, buffers)
– links
– routes

// In this topology: 8 cores, 8 memories, 4x4 torus
// ----------------------------- IP cores
// name, switch number, clock divider, buffers, type
core(core_0, switch_0, 1, 6, initiator);
core(mem_8, switch_11, 1, 6, target:0x00);
[…]
// ----------------------------- switches
// name, input ports, output ports, buffers
switch(switch_0, 5, 5, 6);
switch(switch_1, 5, 5, 6);
[…]
// ----------------------------- links
// name, source, destination
link(link0, switch_0, switch_1);
link(link1, switch_1, switch_0);
[…]
// ----------------------------- routes
// source, destination, hops
route(core_0, pm_8, switches:0,1,5,6,7,11);
route(core_1, pm_9, switches:1,5,9,8);
route(core_2, pm_10, switches:2,6,5,9);
route(core_3, pm_11, switches:3,2,6,10);
[…]
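
A configuration of this form can be read mechanically and sanity-checked before synthesis. The sketch below is only an illustration: the grammar is inferred from the statements shown on this slide rather than from the xpipes specification, and the check assumes that switches in the 4x4 torus are numbered row by row, which the slide does not state. All function names are hypothetical.

```python
import re

CONFIG = """
// name, switch number, clock divider, buffers, type
core(core_0, switch_0, 1, 6, initiator);
core(mem_8, switch_11, 1, 6, target:0x00);
// name, input ports, output ports, buffers
switch(switch_0, 5, 5, 6);
switch(switch_1, 5, 5, 6);
// name, source, destination
link(link0, switch_0, switch_1);
link(link1, switch_1, switch_0);
// source, destination, hops
route(core_0, pm_8, switches:0,1,5,6,7,11);
route(core_1, pm_9, switches:1,5,9,8);
"""

# keyword(arguments); as it appears in the slide's snippet
STATEMENT = re.compile(r"(\w+)\((.*?)\);")

def parse(text):
    """Return {keyword: [argument lists]} for every statement in the text."""
    parsed = {}
    for line in text.splitlines():
        line = line.split("//", 1)[0]  # drop comments
        for keyword, body in STATEMENT.findall(line):
            if keyword == "route":
                # The hop list contains commas, so split only twice.
                src, dst, hop_field = [p.strip() for p in body.split(",", 2)]
                hop_list = [int(h) for h in hop_field.split(":", 1)[1].split(",")]
                args = [src, dst, hop_list]
            else:
                args = [p.strip() for p in body.split(",")]
            parsed.setdefault(keyword, []).append(args)
    return parsed

def torus_neighbours(node, rows=4, cols=4):
    """Neighbours of a switch in a torus, assuming row-major numbering."""
    r, c = divmod(node, cols)
    return {((r - 1) % rows) * cols + c, ((r + 1) % rows) * cols + c,
            r * cols + (c - 1) % cols, r * cols + (c + 1) % cols}

def route_is_connected(hop_list):
    """True if every consecutive pair of switches is directly linked."""
    return all(b in torus_neighbours(a) for a, b in zip(hop_list, hop_list[1:]))

config = parse(CONFIG)
routes_ok = all(route_is_connected(r[2]) for r in config["route"])
```

Under the row-major assumption, each listed route moves between adjacent switches at every hop, which is the property a route table must satisfy for source routing to work.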

Page 123: [Tutorial] NoC the Next Generation of Multi-Processor SoC

xpipes Compiler: Platform Generation

• Input:

– System configuration: Topology, Routing tables, Parameters (flit width, buffering, …)

– Component Library

• Creates a class template for each type of network component based upon component configuration (I/O ports, buffer sizing)

• Hierarchical instantiation of the platform in SystemC

Page 124: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Network-on-Chip Synthesis Tool: xpipes

MPARM Architecture

Reference: Bertozzi, D. and Benini, L. (2004) “xpipes: A Network-on-Chip Architecture for Gigascale Systems-on-Chip”, IEEE Circuits and Systems Magazine, pp. 18-31.

Page 125: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Lecture – 5

Conclusion and Future of Network-on-Chip

Page 126: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Network-on-Chip: At a Glance

Topics Covered

• Need of Network-on-Chip
• NoC Architecture Design
• Performance Evaluation
• Design Trade-Off
• Application Mapping on NoC
• CAD Tools for NoC

Some More Topics

• Impact of Higher Communication Layers in NoC Performance
• Test and Verification of NoC
• Thermal Modeling of NoC
• Metrics and Benchmarks for NoC
• Signal Integrity and Reliability Issues
• Floorplan-aware NoC architecture optimization
• Fault Tolerant Architecture in NoC

Page 127: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Limitation of 2D Network-on-Chip

The conventional 2D integrated circuit (IC) has limited floor-planning choices, and consequently, it limits the performance enhancements arising out of NoC architectures.

Need for more and more bandwidth, but not at the cost of increased power consumption.

Reference: Carloni, L. P., Pande, P. P., and Xie, Y. (2009) “Networks-on-Chip in emerging interconnect paradigms: Advantages and Challenges”, ACM/IEEE Int’l Symp. on Networks-on-Chip, pp. 93-102.

Page 128: [Tutorial] NoC the Next Generation of Multi-Processor SoC

NoC Research Groups in Foreign Universities

1. Prof. Luca Benini, University of Bologna, Italy.
2. Prof. Giovanni De Micheli, EPFL, Switzerland.
3. Prof. William J. Dally, Stanford University, USA.
4. Prof. Partha Pratim Pande, Washington State University, USA.
5. Prof. Radu Marculescu, Carnegie Mellon University, USA.
6. Prof. Bashir M. Al-Hashimi, University of Southampton, UK.
7. Prof. Chita R. Das, Pennsylvania State University, USA.
8. Prof. Niraj K. Jha, Princeton University, USA.
9. Prof. Sashi Kumar, Jonkoping University, Sweden.
10. Prof. Axel Jantsch, Royal Institute of Technology (KTH), Sweden.
11. Prof. Jari Nurmi, Tampere University of Technology, Finland.
12. Prof. Andre Ivanov, University of British Columbia, Canada.
13. Prof. Resve Saleh, University of British Columbia, Canada.
14. Prof. Israel Cidon, Technion-Israel Institute of Technology, Israel.
15. Dr. Davide Bertozzi, University of Bologna, Italy.
16. Dr. Srinivasan Murali, EPFL, Switzerland.

and many more …

Page 129: [Tutorial] NoC the Next Generation of Multi-Processor SoC

NoC Research in Indian Universities

1. Prof. Santanu Chattopadhyay, Indian Institute of Technology, Kharagpur.
2. Prof. S. K. Nandy, Indian Institute of Science, Bangalore.
3. Prof. Bharadwaj Amruthur, Indian Institute of Science, Bangalore.
4. Prof. M. R. Bhujade, Indian Institute of Technology, Bombay.

Journals, Conferences, and Workshops on NoC

• Microprocessors and Microsystems Journal, Elsevier (MICPRO)
• IEEE/ACM International Symposium on Networks-on-Chip
• IEEE Int’l Workshop on Network on Chip Architectures (NoCArc)

Page 130: [Tutorial] NoC the Next Generation of Multi-Processor SoC

NoC Research in Industries

• Tilera Corporation
• Arteris Inc.
• Silistix Inc.
• NXP Semiconductor (Aethereal)
• IBM Corporation (Cyclops-64/Blue Gene)

Page 131: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Network-on-Chip Books

Page 132: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Network-on-Chip Books

Page 133: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Network-on-Chip Books

Page 134: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Bibliography

For a detailed, updated list of references, the audience is directed to the following link: http://www.cl.cam.ac.uk/~rdm34/onChipNetBib/onChipNetwork.pdf

Below are some of our contributions to NoC research:

[1] S. Kundu and S. Chattopadhyay, “Interfacing Cores and Routers in Network-on-Chip Using GALS”, IEEE International Symposium on Integrated Circuits (ISIC), 2007.

[2] S. Kundu and S. Chattopadhyay, “Mesh-of-Tree Deterministic Routing for Network-on-Chip Architecture”, ACM Great Lakes Symposium on VLSI (GLSVLSI), 2008.

[3] S. Kundu, R. P. Dasari, K. Manna, and S. Chattopadhyay, “Mesh-of-Tree based scalable Network-on-Chip Architecture”, IEEE Region 10 Colloquium and International Conference on Industrial and Information Systems (ICIIS), 2008.

[4] S. Kundu and S. Chattopadhyay, “Mesh-of-Tree based Network-on-Chip Architecture Using Virtual Channel based Router”, IEEE VLSI Design and Test Conference (VDAT), 2008.

[5] S. Kundu and S. Chattopadhyay, “Network-on-chip architecture design based on mesh-of-tree deterministic routing topology”, International Journal for High Performance Systems Architecture, Vol. 1, No. 3, pp. 163–182, Inderscience Publisher, 2008.

[6] S. Kundu, R. P. Dasari, K. Manna, and S. Chattopadhyay, “Performance Evaluation of Mesh-of-Tree Based Network-on-Chip Using Wormhole Router with Poisson Distributed Traffic”, IEEE VLSI Design and Test Conference (VDAT), 2009.

[7] S. Kundu, K. Manna, S. Gupta, K. Kumar, R. Parikh, and S. Chattopadhyay, “A Comparative Performance Evaluation Of Network-on-Chip Architectures Under Self-Similar Traffic”, IEEE International Conference on Advances in Recent Technologies in Communication and Computing (ARTCom), 2009.

Page 135: [Tutorial] NoC the Next Generation of Multi-Processor SoC

Microprocessor Research Laboratory

Thank You