Three-Dimensional Layout of On-Chip Tree-Based Networks


Hiroki Matsutani (Keio Univ, Japan)
Michihiro Koibuchi (NII, Japan)
D. Frank Hsu (Fordham Univ, USA)
Hideharu Amano (Keio Univ, Japan)


Outline

• Introduction
– Network-on-Chip (NoC)
– 2-D vs. 3-D
• Fat Tree
– 2-D layout
– 3-D layout
• Fat H-Tree [Matsutani, IPDPS’07]
– 2-D layout
– 3-D layout
• Evaluations
– Area, Wire length, Energy

Network-on-Chip (NoC)

• Tile architectures
– MIT RAW [Taylor, Micro’02]
– Texas U. TRIPS [Burger, Computer’04]
– Intel 80-tile NoC [Vangal, ISSCC’07]
• Various topologies
– Mesh, Torus
– Fat Trees
– Fat H-Tree (FHT)

(Figure: 16-core tile architecture; each tile holds a core & router, forming a packet switched network on a chip)

We proposed FHT as an alternative to Fat Trees [Matsutani, IPDPS’07]

2D Topologies: Mesh & Torus

• 2-D Mesh
– RAW [Taylor, IEEE Micro’02]
• 2-D Torus
– 2x bandwidth of mesh

(Figure legend: Router, Core)

2D Topologies: Fat Tree

• Fat Tree (p, q, c)
– p: # of upward links
– q: # of downward links
– c: # of core ports

(Figure: Fat Tree (2,4,1) and Fat Tree (2,4,2) with Rank-1 and Rank-2 routers; legend: Router, Core)

In this talk, we focus on the 3-D layout scheme of tree-based topologies

2D NoC vs. 3D NoC

• 2D NoCs
– Long wires (esp. trees)
– Wire delay
– Packets consume power at links according to their wire length
• 3D NoCs
– Several small wafers or dies are stacked
• Vertical link
– Micro bump [Ezaki, ISSCC’04]
– Through-wafer via [Burns, ISSCC’01]
– Very short (10-50um)

Long horizontal wires in 2D NoCs can be replaced by very short vertical links in 3D NoCs

The next slides show the 3D layout scheme of Fat Tree and FHT







Fat Tree: 2-D layout

• Fat Tree (p, q, c)
– p: # of upward links
– q: # of downward links
– c: # of core ports

(Figure: 2-D layouts of Fat Tree (2,4,1) and Fat Tree (2,4,2); legend: Router, Core)

We first show the 3D layout scheme of Fat Trees

Fat Tree: 3-D layout (4-split)

• 2-D coordinates $(X_{2D}, Y_{2D})$ are transformed into 3-D coordinates $(X_{3D}, Y_{3D}, Z_{3D})$
• The original 2-D layout is divided into 4 layers (Layer-0 to Layer-3)
• Top-rank routers are distributed to each layer
• Top-rank links are replaced with vertical interconnects (10-50um)

(Figure: original 2-D layout, its division into Layer-0 to Layer-3, and the 4-stacked 3-D layout)

This 3-D layout is evaluated in terms of area, wire length, & energy
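The slides do not spell out the coordinate transformation, so the following is only a sketch under an assumed mapping: the 2-D layout is cut into four equal quadrants and each quadrant becomes one layer, numbered row by row as in the figure (Layer-0, Layer-1 on top; Layer-2, Layer-3 below). The helper name `to_3d` and the exact numbering are hypothetical.

```python
def to_3d(x2d, y2d, side):
    """Map a 2-D tile coordinate to (x3d, y3d, z3d) under an assumed quadrant split.

    side: number of tiles along one edge of the original 2-D layout (must be even).
    """
    if side % 2:
        raise ValueError("side must be even for a 4-split")
    half = side // 2
    # Assumption: quadrants are numbered row-major, so z3d is the layer index 0..3.
    z3d = (y2d // half) * 2 + (x2d // half)
    return x2d % half, y2d % half, z3d


# 64-core example: an 8x8 layout becomes four 4x4 layers.
print(to_3d(5, 2, 8))
```

Under this assumption, tiles that sat in different quadrants of the 2-D chip end up at the same (x, y) position on different layers, which is what lets a top-rank link become a short vertical interconnect.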







Fat H-Tree: Structure

• Fat H-Tree: combining two H-Trees (red & black)
– Red Tree (H-Tree)
– Black Tree (H-Tree)
• The black tree is shifted toward the lower right of the red tree
– Because of this shift, the connection pattern of the trees differs from that of the original Fat Trees
• Fat H-Tree is formed on red & black trees
– Each core is connected to both red & black trees
– A ring is formed with cores & rank-1 routers

(Figure: red & black trees; rank-2 or upper routers are omitted; legend: Router, Core)

Torus-level performance by combining only two H-Trees

Fat H-Tree: 2-D layout on VLSI

• Fat H-Tree
– Torus structure (long feedback links across the chip)
– Folded in the same way as the folded layout of a 2-D Torus

(Figure: Fat H-Tree’s 2-D layout, topologically equivalent to the unfolded structure; legend: Router, Core)

The next slides propose the 3D layout scheme of Fat H-Tree

Fat H-Tree: 3-D layout (overview)

• Fat H-Tree, consisting of red & black trees
– (Problem) Fat H-Tree has a torus structure
– Fold it so as to keep the torus structure
(step 1) fold it horizontally
(step 2) fold it vertically
Repeat until the # of folded pieces equals the # of layers the 3-D IC has
E.g., for four layers, fold twice
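Each fold doubles the number of stacked pieces, so the fold count is log2 of the layer count. A minimal sketch of that rule, assuming the layer count is a power of two and that folds alternate horizontal and vertical starting with step 1; `fold_plan` is a hypothetical helper, not something named in the slides.

```python
def fold_plan(layers):
    """Return the sequence of folds needed to reach `layers` stacked pieces."""
    if layers < 1 or layers & (layers - 1):
        raise ValueError("layers must be a power of two")
    plan = []
    pieces = 1
    while pieces < layers:
        # Assumption: alternate directions, horizontal first (step 1, then step 2).
        plan.append("horizontal" if len(plan) % 2 == 0 else "vertical")
        pieces *= 2  # every fold doubles the number of stacked pieces
    return plan


print(fold_plan(4))
```

For four layers this yields one horizontal and one vertical fold, matching the "fold twice" example.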









Here we show the 3D layouts of red & black trees separately







Fat H-Tree: 3-D (Red tree; 4-split)

(Figure: the original 2-D layout of the red tree is transformed into a 4-stacked 3-D layout, Layer-0 to Layer-3)

Fat H-Tree: 3-D (Black tree; 4-split)

(Figure: the original 2-D layout of the black tree is transformed into a 4-stacked 3-D layout, Layer-0 to Layer-3)

• The periphery cores are connected to different layers
• They can be connected via only a vertical link

Fat H-Tree: 3-D layout (4-split)

(Figure: Red tree (3-D), Black tree (3-D), and Fat H-Tree (3-D), each shown at Layer-0)

The 3-D layout of Fat H-Tree can be formed by superimposing the 3-D layouts of red & black trees







Evaluations: 2-D vs. 3-D

• 2-D layout
– 64-core
– Chip edge: L mm
• 3-D layout
– 16-core x 4-layer
– Vertical interconnects
– Layer edge: L/2 mm

Network logic area: # of routers

FT1: Fat Tree(2,4,1) FT2: Fat Tree(2,4,2) FHT: Fat H-Tree

Topology   N=16   N=64   N=256   General ($N = 2^n \times 2^n$ cores)
FT1           6     28     120   $(4^n - 2^n)/2$
FT2          12     56     240   $4^n - 2^n$
FHT          10     42     170   $2(4^n - 1)/3$
3Dmesh       16     64     256   $N$
3Dtorus      16     64     256   $N$

• 3-D mesh/torus: node degree 7
• Fat H-Tree: node degree 5
• Fat Tree (2,4,2): node degree 6

The # of routers & their ports in trees is smaller than in mesh/torus
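As a quick check, the router counts can be recomputed from closed forms in n, where N = 4^n cores: (4^n - 2^n)/2 for FT1, 4^n - 2^n for FT2, 2(4^n - 1)/3 for FHT, and N for mesh/torus. The sketch below is illustrative only; the function name `router_counts` is hypothetical.

```python
def router_counts(n):
    """Router counts per topology for N = 4**n cores."""
    N = 4 ** n
    return {
        "FT1": (4 ** n - 2 ** n) // 2,   # Fat Tree (2,4,1)
        "FT2": 4 ** n - 2 ** n,          # Fat Tree (2,4,2)
        "FHT": 2 * (4 ** n - 1) // 3,    # Fat H-Tree (two H-Trees)
        "3Dmesh": N,                     # one router per core
        "3Dtorus": N,
    }


for n in (2, 3, 4):
    print(f"N={4 ** n}:", router_counts(n))
```

The integer divisions are exact here: 4^n - 2^n is always even and 4^n - 1 is always divisible by 3.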

Network logic area: 2-D vs. 3-D

• Network logic area
– Routers, NIs
– Inter-wafer vias
• Wormhole router
– 1-flit = 64-bit
– 3-stage pipeline
• Network interface
– FIFO buffer
– Packet forwarding (Fat H-Tree only)
• Inter-wafer via [Davis, DToC’05]
– 1-10um square
– 100um$^2$ per layer per 1-bit signal

(Figure: typical wormhole router with FIFOs, an arbiter, and a 5x5 XBAR)

Synthesized with a 90nm CMOS [Matsutani, ASPDAC’08]

Inter-wafer via area is calculated according to the # of vertical links
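The via-area estimate can be sketched as a simple product of the number of vertical links, the signals per link, and the 100 um^2 per-signal, per-layer figure quoted above. This is an illustration under assumptions: each vertical link is taken to carry 64 data signals (one flit) with control wires ignored, a link is counted once per layer it crosses, and `via_area_mm2` is a hypothetical helper.

```python
VIA_AREA_UM2 = 100.0  # area per 1-bit signal per layer, from the slide


def via_area_mm2(vertical_links, bits_per_link=64):
    """Total inter-wafer via area in mm^2 for a given number of vertical links.

    vertical_links: link count, one entry per layer crossing (assumption).
    bits_per_link: signals per link; 64 assumes one flit wide, no control wires.
    """
    area_um2 = vertical_links * bits_per_link * VIA_AREA_UM2
    return area_um2 * 1e-6  # 1 mm^2 = 1e6 um^2


# Hypothetical example: 10 vertical links.
print(f"{via_area_mm2(10):.3f} mm^2")
```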

Network logic area: Overhead of 3D

Synthesis result of 64-core (16-core x 4)

FT1: Fat Tree(2,4,1) FT2: Fat Tree(2,4,2) FHT: Fat H-Tree

(Figure labels: 2D torus, 3D torus, inter-wafer via area (+7.8%))

The area overhead of the 3D layout of trees is modest (at most 7.8%)

Total wire length of all links

• Total unit-length of links
– Core-router links
– Router-router links
• 1-unit = distance between neighboring cores

How many unit-links are required?

Total wire length of all links

FT1: Fat Tree(2,4,1) FT2: Fat Tree(2,4,2) FHT: Fat H-Tree

2-D layout ($2^n \times 2^n$ cores, $N = 4^n$; lengths in units)

Topology   N=16   N=64   N=256   General
2D FT1       32    192   1,024   $nN$
2D FT2       64    384   2,048   $2nN$
2D FHT       72    392   1,800   $8(N - 2^{n+1} + 1)$
2Dmesh       24    112     480   $2(N - 2^n)$
2Dtorus      48    224     960   $4(N - 2^n)$

3-D layout (4-stacked; $N = 4^n$ cores; lengths in units)

Topology   N=16   N=64   N=256   General
3D FT1       16    128     768   $(n-1)N$
3D FT2       32    256   1,536   $2(n-1)N$
3D FHT       40    200     904   $4(N - 2^{n+1} + 2)$
3Dmesh       16     96     448   $2(N - 2^{n+1})$
3Dtorus      32    192     896   $4(N - 2^{n+1})$

Wire length of trees is reduced by 25%-50% (close to torus)
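The tabulated wire lengths and the 25%-50% reduction can be reproduced with a short script. This is a sketch, not the authors' derivation: the closed forms for FT1, FT2, mesh, and torus follow the slide, while the two FHT expressions are assumed forms chosen because they match all tabulated values. Vertical links are treated as zero length, and `wire_length` is a hypothetical name.

```python
def wire_length(n):
    """Total link length in units for N = 4**n cores; returns (2-D, 3-D) dicts."""
    N = 4 ** n
    two_d = {
        "FT1": n * N,
        "FT2": 2 * n * N,
        "FHT": 8 * (N - 2 ** (n + 1) + 1),    # assumed form, fits 72/392/1800
        "mesh": 2 * (N - 2 ** n),
        "torus": 4 * (N - 2 ** n),
    }
    three_d = {  # 4-stacked layout, vertical links counted as zero length
        "FT1": (n - 1) * N,
        "FT2": 2 * (n - 1) * N,
        "FHT": 4 * (N - 2 ** (n + 1) + 2),    # assumed form, fits 40/200/904
        "mesh": 2 * (N - 2 ** (n + 1)),
        "torus": 4 * (N - 2 ** (n + 1)),
    }
    return two_d, three_d


two_d, three_d = wire_length(3)  # 64 cores
for name in two_d:
    saving = 100 * (1 - three_d[name] / two_d[name])
    print(f"{name}: {two_d[name]} -> {three_d[name]} units ({saving:.1f}% shorter)")
```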

Energy: NoC’s energy model

• Ave. flit energy $E_{flit}$
– Send 1-flit to dest.
– How much energy [J]?

$E_{flit} = w H_{ave} (E_{sw} + E_{link})$
(w: flit width in bits, $H_{ave}$: average hop count)

• Parameters
– 8mm square chip
– 64-core (16-core x 4)
– 90nm CMOS
• Switching energy $E_{sw}$
– 1-bit switching @ Router
– Gate-level sim
– 0.183 [pJ / hop]
• Link energy $E_{link}$
– 1-bit transfer @ Link
– 0.150 [pJ / mm]
• Via energy [Davis, DToC’05]
– Via capacitance: 4.34 [fF / via]
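A minimal sketch of the flit-energy model, E_flit = w * H_ave * (E_sw + E_link), using the per-bit constants from this slide. The average hop count and average link length are inputs that the slide does not give, so the example values below are hypothetical; via energy is left out, and `flit_energy_pj` is an illustrative name.

```python
E_SW_PJ = 0.183            # switching energy per bit per hop [pJ], from the slide
E_LINK_PJ_PER_MM = 0.150   # link energy per bit per mm [pJ/mm], from the slide


def flit_energy_pj(h_ave, avg_link_mm, w=64):
    """Average energy [pJ] to deliver one w-bit flit over h_ave hops.

    avg_link_mm: assumed average link length per hop in mm (not from the slide).
    """
    e_link = E_LINK_PJ_PER_MM * avg_link_mm   # per-bit link energy for one hop
    return w * h_ave * (E_SW_PJ + e_link)


# Hypothetical example: 4 hops on average, 1 mm per link, 64-bit flits.
print(f"{flit_energy_pj(4, 1.0):.3f} pJ")
```

Because the link term scales with wire length, shortening the links (as the 3-D layout does) lowers the flit energy even when the hop count stays the same.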

Energy: Reduction by going 3D

(Figure: flit energy in the 2-D layout and the 3-D layout)

• Frequent use of the longest links
• Short hop count means less energy
• Moving distance of packets is reduced

The 3D layout of trees reduces the energy by 30.8%-42.9%


Summary: 3-D layout of trees

• Drawbacks of on-chip tree-based topologies
– Long links around the root of tree
– Wire delay problem
– Repeater insertion causes additional energy consumption
• 3-D layout schemes of Fat Trees & Fat H-Tree
– Wire length is reduced by 25%-50%
– Area overhead is at most 7.8%
– Flit transmission energy is reduced by 30.8%-42.9%
– In addition, energy-hungry repeater buffers can be removed

Need to consider negative impacts of 3-D (cost, heat, yield…)

Thank you for your attention

Backup slides

(Figure: flit energy in the 2-D layout w/o repeaters and with repeaters (*); energy is increased with repeaters)

(*) Repeater insertion model: N. Weste et al., “CMOS VLSI Design (3rd ed)”, 2005.

