Transcript
Page 1: Router Microarchitecture and Scalability of Ring Topology ... · NoCArc’09 Ring Router Microarchitecture 1 Router Microarchitecture and Scalability of Ring Topology in On-Chip Networks

1 NoCArc’09 Ring Router Microarchitecture

Router Microarchitecture and Scalability of Ring Topology in On-Chip Networks

John Kim, Hanjoon Kim

Department of Computer Science, KAIST


2 NoCArc’09 Ring Router Microarchitecture

Topology

•  Topology efficiently exploits the available packaging technology to meet the requirements at a minimum cost

zero-load latency

saturation throughput


3 NoCArc’09

On-chip networks are different

[Figure: Off-Chip Networks vs. On-Chip Networks; sources: Scott et al. ISCA06, Intel Developers Forum]


4 NoCArc’09

Topologies for On-Chip Networks

•  Crossbar is often sufficient – if it can be done efficiently

•  2D mesh topology commonly assumed

•  Many different topologies recently proposed
–  CMESH [ICS’06]
–  Flattened butterfly [Micro’07]
–  Express Cubes [HPCA’09]
–  Hierarchical Network [HPCA’09]
–  …

•  Recent multicore architectures have used the ring topology
–  Cell processor, Intel processors, …


5 NoCArc’09

Why Ring Topology?

•  Routing
–  route clockwise or counterclockwise
–  route until destination reached

•  Low-radix router
–  each “router” only requires 3 ports (local port, left & right port)

•  Flow control
–  Arbitration can be simplified
–  3 ports but at most two requests

•  Can be implemented without “routers”
–  Bufferless router
–  Simple topology
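
The routing rule above can be sketched in a few lines. This is only an illustration, assuming nodes are numbered 0..n-1 in clockwise order and that the shorter direction is chosen; the slide itself only says a packet travels clockwise or counterclockwise until it reaches its destination. The function name `ring_route` and the tie-break rule are hypothetical.

```python
from typing import Tuple


def ring_route(src: int, dst: int, n: int) -> Tuple[str, int]:
    """Return (direction, hops) for the shorter way around an n-node ring."""
    cw = (dst - src) % n    # hops if the packet travels clockwise
    ccw = (src - dst) % n   # hops if the packet travels counterclockwise
    if cw == 0:
        return ("local", 0)  # source and destination are the same node
    # Assumption: a tie (exactly half-way around) goes clockwise.
    return ("cw", cw) if cw <= ccw else ("ccw", ccw)


print(ring_route(0, 3, 16))   # ('cw', 3)
print(ring_route(0, 13, 16))  # ('ccw', 3)
```

No routing table is needed: the direction is fixed at injection and every intermediate node simply forwards the packet.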


6 NoCArc’09 Ring Router Microarchitecture

Today’s Talk

•  Background in On-Chip Networks and Topology

•  Router Microarchitecture for Ring Topology

•  Scalability of Ring Topology

•  Summary


7 NoCArc’09

Bufferless router in ring topology

•  Simplified arbitration
–  Priority to packets already in flight
–  Guaranteed (deterministic) latency to destination

•  No buffers needed
–  No misrouting [Bufferless router ISCA’09]
–  No packet dropping [SCARAB Micro’09]

•  Only two-input muxes

•  No routing deadlock
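
A toy model may help show why in-flight priority removes the need for buffers. The sketch below is a simplification and not the design from the paper: it assumes a single clockwise ring with one slot per node, one hop per cycle, and a hypothetical `step` function. A node injects only when the slot passing by is empty, so a packet that is already on the ring never stalls and its latency equals its hop count.

```python
from typing import List, Optional


def step(ring: List[Optional[int]], inject: List[Optional[int]]) -> List[int]:
    """Advance a one-directional bufferless ring by one cycle.

    ring[i]   : destination of the packet currently at node i, or None
    inject[i] : destination node i is waiting to send to, or None
    Returns the nodes that ejected a packet this cycle.
    Both lists are updated in place.
    """
    n = len(ring)
    ejected = []
    # 1. Eject packets that have reached their destination.
    for i in range(n):
        if ring[i] == i:
            ring[i] = None
            ejected.append(i)
    # 2. In-flight packets have priority: inject only into an empty slot.
    for i in range(n):
        if ring[i] is None and inject[i] is not None:
            ring[i] = inject[i]
            inject[i] = None
    # 3. Every packet moves one hop clockwise; nothing is stored in a buffer.
    ring[:] = [ring[(i - 1) % n] for i in range(n)]
    return ejected


ring = [None, None, None, None]
waiting = [2, None, None, None]   # node 0 wants to send to node 2
for cycle in range(3):
    print(cycle, step(ring, waiting))
# the packet is ejected at node 2 on the third call
```

Each node only chooses between "pass the incoming packet" and "inject a local packet", which is the two-input mux mentioned above.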


8 NoCArc’09

Conventional Router Microarchitecture


9 NoCArc’09

Bufferless Ring Topology Router Microarchitecture


10 NoCArc’09

No buffers needed


11 NoCArc’09

Bufferless router in ring topology

•  Simplified arbitration
–  Priority to packets already in flight
–  Guaranteed (deterministic) latency to destination

•  No buffers needed
–  No misrouting [Bufferless router ISCA’09]
–  No packet dropping [SCARAB Micro’09]

•  Only two-input muxes

•  No routing deadlock

•  However…
–  Requires reserving the path to destination
–  Can reduce performance/throughput


12 NoCArc’09

Lightweight Router Microarchitecture

•  Add a buffer entry (2 buffer entries per input port)
•  Credit-based flow control for backpressure
•  Maintain the same prioritized arbitration for packets in flight
•  Arbitration needed when ejecting packets
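
The combination of credits and in-flight priority can be sketched as follows. This is a minimal illustration under stated assumptions, not the router from the paper: `CreditLink` and `arbitrate` are hypothetical names, the sender holds one credit per downstream buffer entry (2 by default, matching the slide), and ejection arbitration is not modeled.

```python
class CreditLink:
    """Sender-side view of one channel with credit-based flow control."""

    def __init__(self, buffer_entries: int = 2) -> None:
        # One credit per free entry in the downstream input buffer.
        self.credits = buffer_entries

    def can_send(self) -> bool:
        return self.credits > 0

    def send(self) -> None:
        if self.credits == 0:
            raise RuntimeError("no credit: downstream buffer is full")
        self.credits -= 1

    def credit_return(self) -> None:
        # Called when the downstream router frees a buffer entry.
        self.credits += 1


def arbitrate(in_flight_ready: bool, inject_ready: bool, link: CreditLink) -> str:
    """Pick what drives the output this cycle: in-flight traffic wins."""
    if not link.can_send():
        return "idle"          # backpressure: wait for a credit
    if in_flight_ready:
        link.send()
        return "in_flight"
    if inject_ready:
        link.send()
        return "inject"
    return "idle"


link = CreditLink()
print(arbitrate(True, True, link))   # in_flight
print(arbitrate(False, True, link))  # inject
print(arbitrate(True, True, link))   # idle (both credits used)
```

Because a packet can wait in the downstream buffer, the sender no longer has to reserve the whole path before injecting.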

[Figure: bufferless vs. lightweight router microarchitecture]


13 NoCArc’09

Lightweight Router Microarchitecture

•  No predetermined routing
–  Bufferless: a packet was injected into the network only in the appropriate slot
–  Lightweight: a packet can be injected at any time

•  Deadlock
–  Packets in the bufferless router were guaranteed to make progress
–  Routing deadlock still avoided without additional virtual channels (see paper for detail)


14 NoCArc’09

Evaluation

•  Cycle-accurate simulator used to compare ring router microarchitectures

•  Simulator parameters include
–  N = 16
–  single-flit packet (1 flit = 512 bits)
–  synthetic traffic patterns

•  Orion2.0 used to model area / power (results in paper)

•  Following microarchitectures compared:
–  baseline (3 cycle)
–  bufferless (1 cycle)
–  lightweight (1 cycle)
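
To see roughly what the router delay means for zero-load latency, here is a back-of-the-envelope sketch. The assumptions are mine and not from the talk: a bidirectional ring, uniform random traffic to destinations other than the source, shortest-direction routing, a 1-cycle link, and single-flit packets so that serialization adds nothing. The numbers are not the simulator's results.

```python
def avg_hops(n: int) -> float:
    """Mean shortest-path hop count on a bidirectional n-node ring,
    averaged over all destinations other than the source."""
    return sum(min(d, n - d) for d in range(1, n)) / (n - 1)


def zero_load_latency(n: int, router_cycles: int, link_cycles: int = 1) -> float:
    """Average hops times per-hop delay (router plus link), in cycles."""
    return avg_hops(n) * (router_cycles + link_cycles)


print(round(avg_hops(16), 2))               # 4.27
print(round(zero_load_latency(16, 3), 2))   # 3-cycle router: 17.07
print(round(zero_load_latency(16, 1), 2))   # 1-cycle router: 8.53
```

Under these assumptions the per-hop router delay dominates at low load, which is why the 1-cycle designs are attractive on a topology with a high hop count.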


15 NoCArc’09

Performance Comparison

[Figure: latency (cycles) vs. offered load (fraction of capacity) for bufferless, lightweight, baseline (b=2), and baseline (b=8); panels: uniform random, bit complement]


16 NoCArc’09

Impact of Prioritized Arbitration

[Figure: latency (cycles) vs. offered load (fraction of capacity) for baseline (b=1), baseline (b=2), and lightweight]


17 NoCArc’09 Ring Router Microarchitecture

Today’s Talk

•  Background in On-Chip Networks and Topology

•  Router Microarchitecture for Ring Topology

•  Scalability of Ring Topology

•  Summary


18 NoCArc’09

How Scalable is the Ring Topology?

•  Assumption: same bisection bandwidth when comparing ring and 2D mesh
→  The bandwidth PER channel for the ring is higher than for the 2D mesh
→  Trade-off of hop count vs. serialization latency
→  Per-hop latency can be higher with 2D mesh
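
The equal-bisection assumption can be made concrete with a small calculation. This is a sketch under my own assumptions: channels are counted as unidirectional, a bidirectional ring has 4 channels crossing the bisection (2 cut points, 2 directions), and a k×k mesh has 2k. The 2048-bit budget and the function names are hypothetical, chosen only so the numbers come out round.

```python
import math


def channel_width(bisection_bits: int, topology: str, n: int) -> float:
    """Width in bits of one unidirectional channel for a fixed bisection budget."""
    if topology == "ring":
        channels = 4                  # 2 cut points x 2 directions
    elif topology == "mesh":
        k = math.isqrt(n)
        if k * k != n:
            raise ValueError("mesh size must be a perfect square")
        channels = 2 * k              # k links across the cut x 2 directions
    else:
        raise ValueError("unknown topology: " + topology)
    return bisection_bits / channels


def serialization_cycles(packet_bits: int, width_bits: float) -> int:
    """Cycles needed to push one packet through a channel of the given width."""
    return math.ceil(packet_bits / width_bits)


ring_w = channel_width(2048, "ring", 64)   # 512.0
mesh_w = channel_width(2048, "mesh", 64)   # 128.0
print(serialization_cycles(512, ring_w))   # 1
print(serialization_cycles(512, mesh_w))   # 4
```

With 64 nodes the ring channel is 4 times wider than the mesh channel, so the ring trades extra hops for shorter serialization latency.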


19 NoCArc’09

Synthetic Workload

[Figure: normalized runtime of ring vs. mesh; x-axis: network size (N) = 16, 36, 64, grouped by max outstanding req (r) = 2, 4, 8, 16]


20 NoCArc’09

Bandwidth Fragmentation

•  2D mesh:
–  short packets (req) = 1 flit
–  long packets (reply) = 4 flits

•  ring:
–  short packets (req) = 1 flit
–  long packets (reply) = 1 flit

→  Wide channels result in high bandwidth for the ring
→  However, for short packets, the ring only utilizes ¼ of the channel bandwidth
→  Ring topology is inefficient for short packets
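
The ¼ figure follows from simple arithmetic. The sketch below assumes concrete sizes that the slide does not state outright: a 512-bit ring flit (the flit size given in the evaluation setup), a 128-bit mesh flit (since a reply takes 4 mesh flits), a 128-bit request, and a 512-bit reply. The function name is hypothetical.

```python
def channel_utilization(packet_bits: int, flit_bits: int) -> float:
    """Fraction of the occupied flit slots that carries useful bits."""
    flits = -(-packet_bits // flit_bits)   # ceiling division
    return packet_bits / (flits * flit_bits)


REQ_BITS, REPLY_BITS = 128, 512    # assumed packet sizes
MESH_FLIT, RING_FLIT = 128, 512    # assumed flit (channel) widths

print(channel_utilization(REQ_BITS, MESH_FLIT))    # 1.0
print(channel_utilization(REPLY_BITS, MESH_FLIT))  # 1.0
print(channel_utilization(REQ_BITS, RING_FLIT))    # 0.25
print(channel_utilization(REPLY_BITS, RING_FLIT))  # 1.0
```

A request still occupies a full wide flit on the ring, so three quarters of the channel is idle whenever a short packet is in flight.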


21 NoCArc’09

Bandwidth Fragmentation

[Figure: normalized runtime of ring vs. mesh across network sizes 16, 36, 64, grouped by 2, 4, 8, 16; panels: bimodal packets, single-flit packets]


22 NoCArc’09

Limitations of this study

•  “Packaging” of on-chip network topology = 2D layout of the topology

•  Layout of topology can impact the performance
–  2D mesh: only requires communicating with neighbors
–  Ring: long links can be needed as the network scales

•  Hierarchical rings not investigated.

•  Router complexity (for mesh) not properly modeled.


23 NoCArc’09 Ring Router Microarchitecture

Summary

•  On-chip networks present different constraints compared to off-chip networks and can exploit a different router microarchitecture.

•  Ring topology is simple, and a bufferless router microarchitecture can be implemented for it.

•  Lightweight router microarchitecture proposed to increase performance with minimal additional complexity.

•  Ring topology can scale, but bandwidth fragmentation can limit its scalability, especially under high traffic.

•  Can we scale this router microarchitecture to 2D mesh topology?


24 NoCArc’09

Low-Cost Router Microarchitecture (Micro’09)


25 NoCArc’09 Ring Router Microarchitecture

Thank you

Questions?

