1 NoCArc09 Ring Router Microarchitecture
Router Microarchitecture and Scalability of Ring Topology in On-Chip Networks
John Kim, Hanjoon Kim
Department of Computer Science
KAIST
2 NoCArc09 Ring Router Microarchitecture
Topology
Topology efficiently exploits the available packaging technology to meet the requirements at a minimum cost
zero-load latency
saturation throughput
3 NoCArc09
[Scott et al. ISCA06]
On-chip networks are different
Off-Chip Networks On-Chip Networks
[src: Intel Developers Forum]
4 NoCArc09
Topologies for On-Chip Networks
Crossbar is often sufficient if it can be done efficiently
2D mesh topology commonly assumed
Many different topologies recently proposed
  CMESH [ICS06]
  Flattened butterfly [Micro07]
  Express Cubes [HPCA09]
  Hierarchical Network [HPCA09]
Recent multicore architectures have used the ring topology
  Cell processor, Intel processors,
5 NoCArc09
Why Ring Topology?
Routing
  Travel clockwise or counterclockwise until the destination is reached
Low-radix router
  Each router requires only 3 ports (local port, left and right ports)
Flow control
  Arbitration can be simplified: 3 ports, but at most two requests
  Can be implemented without buffers (bufferless router)
Simple topology
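The routing rule on this slide can be illustrated with a minimal sketch. It assumes nodes are numbered 0 to n-1 around the ring, that "clockwise" means increasing node id, and that ties are broken toward clockwise; the function name `ring_route` and these conventions are illustrative and not taken from the talk.

```python
def ring_route(src: int, dst: int, n: int) -> tuple:
    """Return (direction, hop_count) for the shorter way around an n-node ring.

    Assumption: node ids increase in the clockwise direction.
    """
    if not (0 <= src < n and 0 <= dst < n):
        raise ValueError("node ids must be in [0, n)")
    cw = (dst - src) % n    # hops when travelling clockwise
    ccw = (src - dst) % n   # hops when travelling counterclockwise
    if cw == 0:
        return ("local", 0)  # packet is already at its destination
    # A tie (exactly half way around) is broken toward clockwise here.
    return ("cw", cw) if cw <= ccw else ("ccw", ccw)
```

Because the direction is fixed at injection and never changes, each router only has to decide between "keep going" and "eject", which is what keeps the router low-radix.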
6 NoCArc09 Ring Router Microarchitecture
Today's Talk
Background in On-Chip Networks and Topology
Router Microarchitecture for Ring Topology
Scalability of Ring Topology
Summary
7 NoCArc09
Bufferless router in ring topology
Simplified arbitration
  Priority to packets already in flight
  Guaranteed (deterministic) latency to destination
No buffers needed
  No misrouting [Bufferless router ISCA09]
  No packet dropping [SCARAB Micro09]
Only two-input muxes
No routing deadlock
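A rough sketch of the "priority to packets already in flight" rule follows. It models a unidirectional ring only, one packet slot per link, and lets a node inject in the same cycle in which it ejects; all of these are simplifying assumptions, and the names `step`, `ring`, and `inject` are hypothetical rather than taken from the paper.

```python
from typing import List, Optional, Tuple

Packet = Tuple[int, int]  # (source node, destination node)

def step(ring: List[Optional[Packet]], inject: List[Optional[Packet]]):
    """Advance a unidirectional bufferless ring by one cycle.

    ring[i]   : packet arriving at node i this cycle, or None
    inject[i] : packet node i would like to inject, or None
    Returns (next ring state, ejected packets, per-node injection success).
    """
    n = len(ring)
    nxt: List[Optional[Packet]] = [None] * n
    ejected: List[Packet] = []
    accepted = [False] * n
    for i in range(n):
        pkt = ring[i]
        if pkt is not None and pkt[1] == i:
            ejected.append(pkt)  # arrived: leaves the ring and frees the slot
            pkt = None
        if pkt is None and inject[i] is not None:
            pkt = inject[i]      # slot is free, so the local node may inject
            accepted[i] = True
        # Two-input choice: an in-flight packet always wins over injection.
        nxt[(i + 1) % n] = pkt
    return nxt, ejected, accepted
```

Since an in-flight packet is never stalled or deflected in this model, its latency is fixed by its hop count, and the only party that ever waits is a node trying to inject.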
8 NoCArc09
Conventional Router Microarchitecture
9 NoCArc09
Bufferless Ring Topology Router Microarchitecture
10 NoCArc09
No buffers needed
11 NoCArc09
Bufferless router in ring topology
Simplified arbitration
  Priority to packets already in flight
  Guaranteed (deterministic) latency to destination
No buffers needed
  No misrouting [Bufferless router ISCA09]
  No packet dropping [SCARAB Micro09]
Only two-input muxes
No routing deadlock
However
  Requires reserving the path to destination
  Can reduce performance/throughput
12 NoCArc09
Lightweight Router Microarchitecture
Add buffer entries (2 buffer entries per input port)
Credit-based flow control for backpressure
Maintain the same prioritized arbitration for packets in flight
Arbitration needed when ejecting packets
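Credit-based backpressure with a two-entry input buffer can be sketched as below. This is a simplified model, not the authors' design: the class name `CreditLink` is hypothetical, and credits are returned instantly, whereas a real link would add a credit round-trip delay.

```python
from collections import deque

class CreditLink:
    """Sender-side credit counter paired with a small downstream input buffer."""

    def __init__(self, depth: int = 2):
        self.depth = depth
        self.credits = depth   # free downstream entries known to the sender
        self.buffer = deque()  # downstream input buffer

    def can_send(self) -> bool:
        return self.credits > 0

    def send(self, flit) -> None:
        """Consume one credit and place the flit in the downstream buffer."""
        if not self.can_send():
            raise RuntimeError("no credit: sender must stall (backpressure)")
        self.credits -= 1
        self.buffer.append(flit)

    def drain(self):
        """Downstream forwards its oldest flit and returns one credit."""
        flit = self.buffer.popleft()
        self.credits += 1
        return flit
```

The sender never overruns the buffer because it stops as soon as its credit count reaches zero, so no flit has to be dropped or misrouted.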
[Figure with panels labeled: bufferless, lightweight]
13 NoCArc09
Lightweight Router Microarchitecture
No predetermined routing
  Bufferless: a packet was injected into the network only in the appropriate slot
  Lightweight: a packet can be injected at any time
Deadlock
  Packets in the bufferless router were guaranteed to make progress
  Routing deadlock is still avoided without additional virtual channels (see paper for details)
14 NoCArc09
Evaluation
Cycle-accurate simulator used to compare ring router microarchitectures
Simulator parameters include
  N = 16
  single-flit packets (1 flit = 512 bits)
  synthetic traffic patterns
Orion 2.0 used to model area / power (results in paper)
Following microarchitectures compared:
  baseline (3 cycle)
  bufferless (1 cycle)
  lightweight (1 cycle)
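The two synthetic patterns named in the results, uniform random and bit complement, are commonly defined as in the sketch below. The exact definitions used in the authors' simulator are not given in the slides, so treat these as assumptions; in particular, excluding the source from the uniform random destinations is a choice made here.

```python
import random

def bit_complement(src: int, n: int) -> int:
    """Destination whose id is the bitwise complement of src (n a power of two)."""
    if n <= 0 or n & (n - 1):
        raise ValueError("n must be a power of two")
    return ~src & (n - 1)

def uniform_random(src: int, n: int, rng: random.Random) -> int:
    """Destination drawn uniformly from all nodes other than src."""
    dst = rng.randrange(n - 1)           # n - 1 candidates once src is excluded
    return dst if dst < src else dst + 1  # skip over src itself
```

Passing an explicit `random.Random` instance keeps a simulation run reproducible from a seed.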
15 NoCArc09
Performance Comparison
[Figure: latency (cycles) vs. offered load (fraction of capacity) for bufferless, lightweight, baseline (b=2), and baseline (b=8); panels: uniform random, bit complement]
16 NoCArc09
Impact of Prioritized Arbitration
[Figure: latency (cycles) vs. offered load (fraction of capacity) for baseline (b=1), baseline (b=2), and lightweight]
17 NoCArc09 Ring Router Microarchitecture
Today's Talk
Background in On-Chip Networks and Topology
Router Microarchitecture for Ring Topology
Scalability of Ring Topology
Summary
18 NoCArc09
How Scalable is the Ring Topology?
Assumption: same bisection bandwidth when comparing ring and 2D mesh
  The bandwidth per channel for the ring is higher than for the 2D mesh
  Trade-off of hop count vs. serialization latency
  Per-hop latency can be higher with 2D mesh
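The hop count versus channel width trade-off can be made concrete with a small calculation. The sketch below assumes a bidirectional ring, a k x k mesh with minimal routing, and the textbook bisection counts (2 bidirectional links cut in a ring, k in a k x k mesh); the function names are illustrative and the numbers are not taken from the paper.

```python
from itertools import product

def ring_avg_hops(n: int) -> float:
    """Average shortest-path hops between distinct nodes of a bidirectional ring."""
    total = sum(min(d, n - d) for d in range(1, n))
    return total / (n - 1)

def mesh_avg_hops(k: int) -> float:
    """Average Manhattan distance between distinct nodes of a k x k mesh."""
    nodes = list(product(range(k), repeat=2))
    total = sum(abs(ax - bx) + abs(ay - by)
                for (ax, ay) in nodes for (bx, by) in nodes)
    return total / (len(nodes) * (len(nodes) - 1))

def ring_to_mesh_channel_width(k: int) -> float:
    """Ring channel width relative to the mesh at equal bisection bandwidth.

    Assumes the bisection cuts 2 bidirectional links in the ring and
    k bidirectional links in the k x k mesh.
    """
    return k / 2
```

For N = 16 this gives about 4.27 average hops on the ring versus about 2.67 on a 4 x 4 mesh, while each ring channel is twice as wide, so a packet needs fewer flits on the ring but crosses more routers.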
19 NoCArc09
Synthetic Workload
[Figure: normalized runtime of ring vs. mesh for network size (N) = 16, 36, 64 and max outstanding requests (r) = 2, 4, 8, 16]
20 NoCArc09
Bandwidth Fragmentation
2D mesh: short packets (req) = 1 flit, long packets (reply) = 4 flits
Ring: short packets (req) = 1 flit, long packets (reply) = 1 flit
Wide channels result in high bandwidth for the ring
However, for short packets, the ring utilizes only a fraction of the channel bandwidth
Ring topology is inefficient for short packets
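Bandwidth fragmentation can be quantified with a short calculation. The sketch below assumes a packet always occupies a whole number of flits; the 128-bit and 512-bit sizes in the usage note are illustrative values chosen so that a reply is 4 mesh flits or 1 ring flit, not figures stated on this slide.

```python
import math

def channel_utilization(packet_bits: int, flit_bits: int) -> float:
    """Fraction of the occupied channel bits that carry the packet itself."""
    if packet_bits <= 0 or flit_bits <= 0:
        raise ValueError("sizes must be positive")
    flits = math.ceil(packet_bits / flit_bits)  # packets occupy whole flits
    return packet_bits / (flits * flit_bits)
```

With a hypothetical 128-bit request, `channel_utilization(128, 128)` is 1.0 on a 128-bit mesh channel, while `channel_utilization(128, 512)` is 0.25 on a 512-bit ring channel: the request still costs a full ring flit, so three quarters of that flit is wasted.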
21 NoCArc09
Bandwidth Fragmentation
[Figure: normalized runtime, ring vs. mesh, x-axis ticks 16, 36, 64 grouped under 2, 4, 8, 16; panels: bimodal packets, single-flit packets]
22 NoCArc09
Limitations of this study
Packaging of on-chip network topology = 2D layout of the topology
Layout of the topology can impact performance
  2D mesh: only requires communicating with neighbors
  Ring: long links can be needed as the network scales
Hierarchical rings not investigated.
Router complexity (for mesh) not properly modeled.
23 NoCArc09 Ring Router Microarchitecture
Summary
On-chip networks present different constraints compared to off-chip networks and can exploit a different router microarchitecture.
The ring is a simple topology, and a bufferless router microarchitecture can be implemented for it.
A lightweight router microarchitecture is proposed to increase performance with minimal additional complexity.
The ring topology can scale, but bandwidth fragmentation can limit its scalability, especially under high traffic.
Can we scale this router microarchitecture to 2D mesh topology?
24 NoCArc09
Low-Cost Router Microarchitecture (Micro09)
25 NoCArc09 Ring Router Microarchitecture
Thank you
Questions?