
Router Microarchitecture and Scalability of Ring Topology in On-Chip Networks (NoCArc’09)


  • 1 NoCArc09 Ring Router Microarchitecture

    Router Microarchitecture and Scalability of Ring Topology in On-Chip Networks

    John Kim, Hanjoon Kim

    Department of Computer Science, KAIST

  • 2 NoCArc09 Ring Router Microarchitecture

    Topology

    A topology should efficiently exploit the available packaging technology to meet the requirements at minimum cost

    zero-load latency

    saturation throughput

  • 3 NoCArc09

    On-chip networks are different

    Off-Chip Networks vs. On-Chip Networks

    [Figures: Scott et al. ISCA06; src: Intel Developers Forum]

  • 4 NoCArc09

    Topologies for On-Chip Networks

    Crossbar is often sufficient if it can be done efficiently

    2D mesh topology commonly assumed

    Many different topologies recently proposed: CMESH [ICS06], Flattened butterfly [Micro07], Express Cubes [HPCA09], Hierarchical Network [HPCA09]

    Recent multicore architectures have used the ring topology, e.g., Cell processor, Intel processors

  • 5 NoCArc09

    Why Ring Topology?

    Routing: route clockwise or counterclockwise until the destination is reached

    Low-radix router: each router requires only 3 ports (local port, left and right ports)

    Flow control

    Arbitration can be simplified: 3 ports, but at most two requests

    Can be implemented without buffers (bufferless router)

    Simple topology
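The clockwise-or-counterclockwise routing decision can be sketched in a few lines. This is a minimal illustration, assuming nodes are numbered 0..n-1 around a bidirectional ring and that ties are broken clockwise; the function name `ring_route` and the tie-break rule are assumptions for illustration, not taken from the slides.

```python
def ring_route(src, dst, n):
    """Pick the shorter direction on a bidirectional ring of n nodes.

    Returns (direction, hops). Node numbering and the clockwise
    tie-break are illustrative assumptions.
    """
    cw = (dst - src) % n    # hops when travelling clockwise
    ccw = (src - dst) % n   # hops when travelling counterclockwise
    if cw == 0:
        return ("local", 0)  # packet is already at its destination
    if cw <= ccw:
        return ("cw", cw)
    return ("ccw", ccw)
```

Once the direction is chosen at injection, every intermediate router only has to decide between "forward" and "eject", which is why a 3-port router is sufficient.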

  • 6 NoCArc09 Ring Router Microarchitecture

    Today's Talk

    Background in On-Chip Networks and Topology

    Router Microarchitecture for Ring Topology

    Scalability of Ring Topology

    Summary

  • 7 NoCArc09

    Bufferless router in ring topology

    Simplified arbitration: priority to packets already in flight; guaranteed (deterministic) latency to destination

    No buffers needed: no misrouting [Bufferless router ISCA09]; no packet dropping [SCARAB Micro09]

    Only two-input muxes

    No routing deadlock
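The prioritized arbitration above can be illustrated with a toy model of one ring direction. This is a simplified sketch, not the authors' design: it assumes a single unidirectional ring, one packet slot per link, packets represented only by their destination node id, and that a slot freed by ejection can be reused for injection in the same cycle. The name `step` and the list-based data layout are hypothetical.

```python
def step(ring, queues):
    """Advance a unidirectional bufferless ring by one cycle.

    ring[i]   : destination id of the packet arriving at node i, or None.
    queues[i] : list of destination ids waiting to be injected at node i.
    Returns (next_ring, ejected), where ejected lists the nodes that
    received a packet this cycle.
    """
    n = len(ring)
    next_ring = [None] * n
    ejected = []
    for node in range(n):
        pkt = ring[node]
        if pkt == node:
            # Packet has reached its destination: eject it.
            ejected.append(node)
            pkt = None
        if pkt is None and queues[node]:
            # Two-input mux: the local port wins only when the slot is empty,
            # so a packet already in flight is never blocked.
            pkt = queues[node].pop(0)
        next_ring[(node + 1) % n] = pkt
    return next_ring, ejected
```

Because an in-flight packet is never stalled, its latency after injection equals its hop count, which is the deterministic-latency property listed above. The cost is that a node may have to wait for an empty slot before injecting.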

  • 8 NoCArc09

    Conventional Router Microarchitecture


  • 9 NoCArc09

    Bufferless Ring Topology Router Microarchitecture


  • 10 NoCArc09

    No buffers needed


  • 11 NoCArc09

    Bufferless router in ring topology

    Simplified arbitration: priority to packets already in flight; guaranteed (deterministic) latency to destination

    No buffers needed: no misrouting [Bufferless router ISCA09]; no packet dropping [SCARAB Micro09]

    Only two-input muxes

    No routing deadlock

    However: requires reserving the path to the destination, which can reduce performance/throughput

  • 12 NoCArc09

    Lightweight Router Microarchitecture

    Add buffer entries (2 buffer entries per input port)

    Credit-based flow control for backpressure

    Maintain the same prioritized arbitration for packets in flight

    Arbitration needed when ejecting packets

    [Figure: bufferless vs. lightweight router]
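The combination of a small input buffer, credit-based backpressure, and in-flight priority can be sketched as follows. This is a minimal model under stated assumptions: credits return instantly when a buffer entry is freed (real hardware has a credit round-trip delay), ejection arbitration is not modeled, and the names `InputPort` and `forward` are hypothetical.

```python
from collections import deque


class InputPort:
    """Input buffer with credit-based flow control (depth 2, as on the slide)."""

    def __init__(self, depth=2):
        self.buffer = deque()
        self.credits = depth  # credits held by the upstream sender

    def has_credit(self):
        return self.credits > 0

    def push(self, pkt):
        # The sender spends one credit per packet; no credit means it must stall.
        if not self.has_credit():
            raise RuntimeError("no credit: downstream buffer is full")
        self.credits -= 1
        self.buffer.append(pkt)

    def pop(self):
        # Freeing an entry returns a credit (instantly, by assumption).
        pkt = self.buffer.popleft()
        self.credits += 1
        return pkt


def forward(ring_in, local_queue, downstream):
    """Send at most one packet downstream this cycle.

    Packets already in flight (ring_in) win over local injection.
    Returns the packet sent, or None if nothing could be sent.
    """
    if not downstream.has_credit():
        return None  # backpressure: hold everything in place
    if ring_in.buffer:
        pkt = ring_in.pop()
    elif local_queue:
        pkt = local_queue.popleft()
    else:
        return None
    downstream.push(pkt)
    return pkt
```

Unlike the bufferless case, a packet that cannot advance waits in the input buffer instead of requiring a reserved path, which is what allows injection at any time.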

  • 13 NoCArc09

    Lightweight Router Microarchitecture

    No predetermined routing

    Bufferless: a packet was injected into the network only in the appropriate slot

    Lightweight: a packet can be injected at any time

    Deadlock

    Packets in the bufferless router were guaranteed to make progress

    Routing deadlock is still avoided without additional virtual channels (see paper for details)

  • 14 NoCArc09

    Evaluation

    Cycle-accurate simulator used to compare ring router microarchitectures

    Simulator parameters: N = 16; single-flit packets (1 flit = 512 bits); synthetic traffic patterns

    Orion 2.0 used to model area / power (results in paper)

    Microarchitectures compared: baseline (3 cycles), bufferless (1 cycle), lightweight (1 cycle)

  • 15 NoCArc09

    Performance Comparison

    [Figure: latency (cycles) vs. offered load (fraction of capacity) for bufferless, lightweight, baseline (b=2), and baseline (b=8); two panels: uniform random and bit complement]
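For readers unfamiliar with the two synthetic traffic patterns named in the performance comparison, here is a short sketch of how destinations are commonly generated. It assumes the usual definitions (uniform random picks any other node with equal probability; bit complement inverts the bits of the source id) and that the node count is a power of two for bit complement. Whether the authors' simulator excludes self-traffic is not stated; that choice and the function names are assumptions.

```python
import random


def uniform_random_dest(src, n, rng=random):
    """Pick a destination uniformly among the n - 1 nodes other than src."""
    dst = rng.randrange(n - 1)
    return dst if dst < src else dst + 1  # skip over src


def bit_complement_dest(src, n):
    """Destination is the bitwise complement of src within log2(n) bits."""
    if n & (n - 1) != 0:
        raise ValueError("n must be a power of two")
    return ~src & (n - 1)
```

On a 16-node ring, bit complement sends node 0 to node 15 and node 5 to node 10, so every source has exactly one fixed partner, while uniform random spreads load evenly.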

  • 16 NoCArc09

    Impact of Prioritized Arbitration

    [Figure: latency (cycles) vs. offered load (fraction of capacity) for baseline (b=1), baseline (b=2), and lightweight]

  • 17 NoCArc09 Ring Router Microarchitecture

    Today's Talk

    Background in On-Chip Networks and Topology

    Router Microarchitecture for Ring Topology

    Scalability of Ring Topology

    Summary

  • 18 NoCArc09

    How Scalable is the Ring Topology?

    Assumption: same bisection bandwidth when comparing ring and 2D mesh

    The bandwidth per channel for the ring is higher than for the 2D mesh

    Trade-off of hop count vs. serialization latency

    Per-hop latency can be higher with 2D mesh
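The hop-count versus serialization trade-off can be made concrete with the standard zero-load latency estimate: average hops times per-hop latency, plus the number of cycles needed to serialize the packet onto the channel. The sketch below is a back-of-the-envelope model, not the simulator used in the talk. It assumes a bidirectional ring and a k x k mesh, averages over all source/destination pairs (including a node sending to itself), and treats the per-hop cycle counts and channel widths as inputs supplied by the reader.

```python
import math
from itertools import product


def ring_avg_hops(n):
    """Average shortest-path hop count on a bidirectional ring of n nodes."""
    return sum(min(d, n - d) for d in range(n)) / n


def mesh_avg_hops(k):
    """Average Manhattan distance over all node pairs of a k x k mesh."""
    nodes = list(product(range(k), repeat=2))
    total = sum(abs(ax - bx) + abs(ay - by)
                for (ax, ay) in nodes for (bx, by) in nodes)
    return total / len(nodes) ** 2


def zero_load_latency(avg_hops, per_hop_cycles, packet_bits, channel_bits):
    """Header latency plus serialization latency, in cycles."""
    serialization = math.ceil(packet_bits / channel_bits)
    return avg_hops * per_hop_cycles + serialization
```

As a usage example with illustrative numbers, a 64-node ring with 1-cycle routers and 512-bit channels gives `zero_load_latency(ring_avg_hops(64), 1, 512, 512)` = 17.0 cycles for a 512-bit packet, while an 8 x 8 mesh with 3-cycle routers and 128-bit channels gives `zero_load_latency(mesh_avg_hops(8), 3, 512, 128)` = 19.75 cycles. The ring takes more hops but pays less per hop and less for serialization.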

  • 19 NoCArc09

    Synthetic Workload

    [Figure: normalized runtime of ring vs. mesh for network size N = 16, 36, 64 and max outstanding requests r = 2, 4, 8, 16]

  • 20 NoCArc09

    Bandwidth Fragmentation

    2D mesh: short packets (req) = 1 flit, long packets (reply) = 4 flits

    Ring: short packets (req) = 1 flit, long packets (reply) = 1 flit

    Wide channels result in high bandwidth for the ring

    However, for short packets, the ring only utilizes 1/4 of the channel bandwidth

    Ring topology is inefficient for short packets
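The fragmentation effect follows directly from the flit counts above: a packet always occupies a whole number of flits, so a short packet on a wide channel leaves most of the flit empty. The sketch below works through that arithmetic. The concrete sizes used in the usage note (128-bit requests, 512-bit replies, 128-bit mesh channels, 512-bit ring channels) are assumptions chosen to match the 1-flit and 4-flit ratios on the slide; only the 512-bit flit appears in the evaluation setup.

```python
import math


def flits_needed(packet_bits, channel_bits):
    """Number of flits a packet occupies on a channel of the given width."""
    return math.ceil(packet_bits / channel_bits)


def channel_utilization(packet_bits, channel_bits):
    """Fraction of the occupied flit bandwidth that carries useful bits."""
    return packet_bits / (flits_needed(packet_bits, channel_bits) * channel_bits)
```

With the assumed sizes, a 128-bit request fills its flit completely on the mesh (`channel_utilization(128, 128)` is 1.0) but uses only a quarter of a ring flit (`channel_utilization(128, 512)` is 0.25), while a 512-bit reply is fully utilized on both.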

  • 21 NoCArc09

    Bandwidth Fragmentation

    [Figure: normalized runtime, ring vs. mesh, for network sizes 16, 36, 64 in groups of 2, 4, 8, 16; two panels: bimodal packets and single-flit packets]

  • 22 NoCArc09

    Limitations of this study

    Packaging of on-chip network topology = 2D layout of the topology

    Layout of the topology can impact performance

    2D mesh: only requires communicating with neighbors

    Ring: long links can be needed as the network scales

    Hierarchical rings not investigated.

    Router complexity (for mesh) not properly modeled.

  • 23 NoCArc09 Ring Router Microarchitecture

    Summary

    On-chip networks present different constraints compared to off-chip networks and can exploit a different router microarchitecture.

    The ring is a simple topology, and a bufferless router microarchitecture can be implemented for it.

    A lightweight router microarchitecture is proposed to increase performance with minimal additional complexity.

    The ring topology can scale, but bandwidth fragmentation can limit its scalability, especially under high traffic.

    Can we scale this router microarchitecture to 2D mesh topology?

  • 24 NoCArc09

    Low-Cost Router Microarchitecture (Micro09)


  • 25 NoCArc09 Ring Router Microarchitecture

    Thank you

    Questions?
