10/03/2005: 1
Physical Synthesis of Latency Aware Low Power NoC Through Topology Exploration and Wire Style Optimization
CK Cheng
CSE Department
UC San Diego
10/03/2005: 2
Related Work
• How to map processing cores to a mesh-based NoC? [J. Hu, ASP-DAC03] [S. Murali, DATE04]
– Minimize power consumption
– Minimize average communication delay
• What are the most energy-efficient NoC topologies under different network sizes and technology nodes? [H. Wang, DATE05]
• Our work differs from the work above in that it offers:
– Physical-constraint-aware NoC implementation
– Wire style optimization
– Latency-aware synthesis
10/03/2005: 3
Motivation
• Topologies are important
• Different wires have their own comparative advantages
• Topologies/Wiring Styles co-design to improve power

Wire styles                             Bit energy                              Routing area   Latency
RC repeated wire with minimum spacing   High                                    Smallest       Slowest
RC repeated wire with large spacing     Medium                                  Medium         Medium
Transmission line                       High initial power, low for long wire   Largest        Fastest
10/03/2005: 4
Our Focuses
[Figure: Physical synthesis of an NoC. Legend: router; processing core; on-chip transmission line for long-distance communication; well-spaced RC wire with buffer insertion for local connections]
• NoC power optimization through
– Topology selection
– Wire style optimization
• Communication latency aware low power synthesis
10/03/2005: 5
Design Flow
[Figure: Design flow]
Inputs: Power & Delay Lib, Topology Lib
-> Multi-commodity network flow (MCF) formulation
-> Power evaluation (MCF solver)
-> Latency-aware low-power NoC topology with wire style optimization
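
For readers unfamiliar with the term, a generic minimum-cost multi-commodity flow problem can be written as the linear program below. This is the textbook form, not necessarily the exact formulation used in this work, and every symbol is an assumption introduced here: $G=(V,E)$ is the candidate topology, $K$ the set of communication demands, each with source $s_k$, sink $t_k$ and demand $d_k$, $c_{uv}$ a per-unit-flow power cost on link $(u,v)$, $B_{uv}$ the link bandwidth, and $f^{k}_{uv}$ the flow of commodity $k$ on that link.

```latex
\begin{aligned}
\min_{f \ge 0} \quad
  & \sum_{(u,v) \in E} \sum_{k \in K} c_{uv}\, f^{k}_{uv} \\
\text{s.t.} \quad
  & \sum_{v:(u,v) \in E} f^{k}_{uv} - \sum_{v:(v,u) \in E} f^{k}_{vu} =
    \begin{cases}
      d_k  & \text{if } u = s_k \\
      -d_k & \text{if } u = t_k \\
      0    & \text{otherwise}
    \end{cases}
    \qquad \forall u \in V,\ \forall k \in K \\
  & \sum_{k \in K} f^{k}_{uv} \le B_{uv}
    \qquad \forall (u,v) \in E
\end{aligned}
```

The first constraint conserves flow per commodity at every node, and the second caps the total flow on each link at its bandwidth.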
10/03/2005: 6
Topology Library
NoC size 4x4 5x5 6x6 7x7 8x8
#topologies+placement 36 254 534 1306 2092
• We study topologies that have identical row and column connections (Degree <= 3)
• The library covers most popular topologies, such as mesh, torus, hypercube, octagon, twisted cube, etc.
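
One way to picture "identical row and column connections" is to pick a single 1D link pattern and apply it to every row and every column of the grid. The sketch below assumes that reading (the slides do not spell out the construction), and the helper names `build_topology` and `average_hops` are hypothetical. It builds a 4x4 mesh from a path pattern and a 4x4 torus from a ring pattern, then compares their average hop counts.

```python
from collections import deque


def build_topology(n, pattern):
    """Apply one 1D link pattern to every row and column of an n x n grid.

    `pattern` lists (i, j) index pairs that are linked within a row/column.
    This construction is an assumed reading of the slide, not the authors' code.
    """
    adj = {(r, c): set() for r in range(n) for c in range(n)}
    for k in range(n):
        for i, j in pattern:
            # Link positions i and j inside row k.
            adj[(k, i)].add((k, j))
            adj[(k, j)].add((k, i))
            # Link positions i and j inside column k.
            adj[(i, k)].add((j, k))
            adj[(j, k)].add((i, k))
    return adj


def average_hops(adj):
    """Mean shortest-path hop count over ordered pairs of distinct nodes (BFS)."""
    total = 0
    pairs = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


path = [(0, 1), (1, 2), (2, 3)]  # row/column pattern that yields a mesh
ring = path + [(3, 0)]           # adding the wrap-around link yields a torus
mesh = build_topology(4, path)
torus = build_topology(4, ring)
print(average_hops(mesh), average_hops(torus))
```

Under this model the torus averages fewer hops than the mesh (about 2.13 versus 2.67), which is the kind of structural difference the topology library is meant to expose.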
10/03/2005: 7
Power and Delay Lib: Routers
• Orion simulator
• 0.18um technology node, Vdd = 1.8V
• 1GHz frequency, 4-flit buffer size, 128-bit flit size
10/03/2005: 8
Power and Delay Lib: Wires
• Wires
– Unit wire length = 2mm, min global pitch = 1.44um
– Power and delay of RC wires are proportional to wire length
– Power and delay of T-line have a setup cost: P(setup) = 4.4pJ/bit, D(setup) = 50ps
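
The wire model above (RC cost proportional to length, T-line cost with a fixed setup term) implies a break-even length beyond which the transmission line is cheaper. A minimal sketch follows. Only the T-line setup energy and the 2mm unit length come from the slide. The per-unit-length coefficients are hypothetical placeholders chosen for illustration, and the function names are invented here.

```python
# From the slide: T-line setup energy and the unit wire length.
TLINE_SETUP_ENERGY_PJ = 4.4  # pJ/bit
UNIT_LENGTH_MM = 2.0

# HYPOTHETICAL per-unit-length bit energies (not given in the slides).
RC_ENERGY_PER_UNIT_PJ = 2.0
TLINE_ENERGY_PER_UNIT_PJ = 0.5


def rc_energy(units):
    """Bit energy of an RC wire: proportional to length (in unit lengths)."""
    return RC_ENERGY_PER_UNIT_PJ * units


def tline_energy(units):
    """Bit energy of a T-line: fixed setup cost plus a length-dependent term."""
    return TLINE_SETUP_ENERGY_PJ + TLINE_ENERGY_PER_UNIT_PJ * units


def cheaper_style(units):
    """Return the lower-energy wire style for a link of the given length."""
    return "rc" if rc_energy(units) <= tline_energy(units) else "tline"


def breakeven_units():
    """Length (in unit lengths) at which both styles cost the same energy."""
    return TLINE_SETUP_ENERGY_PJ / (RC_ENERGY_PER_UNIT_PJ - TLINE_ENERGY_PER_UNIT_PJ)


for units in (1, 2, 3, 4):
    print(units * UNIT_LENGTH_MM, "mm ->", cheaper_style(units))
print("break-even:", breakeven_units() * UNIT_LENGTH_MM, "mm")
```

With real library coefficients in place of the placeholders, the same comparison could be made per link, and the delay model could be treated analogously using D(setup) = 50ps.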
10/03/2005: 9
Experiment Results
10/03/2005: 10
(1) Wire Style Optimization
• 4x4 torus under various evenly distributed comm. demands
10/03/2005: 11
(1) Wire Style Optimization
Comm. Demand (Gb/s)   Power (w/o opt.) (W)   Power (w/ opt.) (W)   Improvement (%)
10                    43.0                   28.1                  34.6
15                    62.5                   41.9                  33.0
20                    82.0                   58.4                  28.8
25                    102                    79.1                  22.1
30                    121                    103                   14.6
35                    141                    128                   9.19
40 (Max)              160                    152                   5.10

• Power improvement diminishes as comm. demand increases
• Even at the maximum comm. demand, there is still room for wire style optimization
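
As a sanity check on the table, the improvement column can be recomputed as (power without optimization - power with optimization) / power without optimization. The sketch below uses the rounded values printed on the slide, so small differences from the reported percentages are expected. The variable and function names are invented for this example.

```python
# (demand in Gb/s, power w/o opt. in W, power w/ opt. in W, reported improvement %)
rows = [
    (10, 43.0, 28.1, 34.6),
    (15, 62.5, 41.9, 33.0),
    (20, 82.0, 58.4, 28.8),
    (25, 102.0, 79.1, 22.1),
    (30, 121.0, 103.0, 14.6),
    (35, 141.0, 128.0, 9.19),
    (40, 160.0, 152.0, 5.10),
]


def improvement(before, after):
    """Relative power saving in percent."""
    return 100.0 * (before - after) / before


recomputed = [improvement(before, after) for _, before, after, _ in rows]
# Largest gap between the recomputed and the reported percentage.
max_gap = max(abs(imp - row[3]) for imp, row in zip(recomputed, rows))

for (demand, _, _, reported), imp in zip(rows, recomputed):
    print(f"{demand:>2} Gb/s: recomputed {imp:5.2f}%, reported {reported}%")
print(f"largest gap: {max_gap:.2f} percentage points")
```

The recomputed values stay within half a percentage point of the reported ones and decrease monotonically with demand, consistent with the first bullet.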
10/03/2005: 12
(2) Topology Selection (Power, Bandwidth)
• Optimal 4x4 topologies under various evenly distributed comm. demands
(a) Comm. demand = 1Gb/s (b) Comm. demand = 25Gb/s (c) Comm. demand = 40Gb/s
10/03/2005: 13
(2) Topology Selection (Power, Bandwidth)
(topology, demand (Gb/s))   Total power (W)   Wire power (W)   Router power (W)
(mesh, 1)                   2.99              2.55             0.44
(torus, 1)                  2.55              2.08             0.47
(full, 1)                   2.19              1.63             0.56
(mesh, 25)                  81.2              70.1             11.1
(torus, 25)                 74.7              62.8             11.9
(full, 25)                  76.9              63.1             13.8
(mesh, 40)                  151.3             132.5            18.8
(torus, 40)                 152.0             132.5            19.5
(full, 40)                  155.1             132.5            22.6
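
The table can be read as a small selection problem: for each demand level, total power is wire power plus router power, and the preferred topology is the one with the lowest total. The sketch below restates the table values and performs that selection. The dictionary layout and the function name `best_topology` are assumptions made for this example.

```python
# demand (Gb/s) -> topology -> (wire power in W, router power in W), from the table
power = {
    1: {"mesh": (2.55, 0.44), "torus": (2.08, 0.47), "full": (1.63, 0.56)},
    25: {"mesh": (70.1, 11.1), "torus": (62.8, 11.9), "full": (63.1, 13.8)},
    40: {"mesh": (132.5, 18.8), "torus": (132.5, 19.5), "full": (132.5, 22.6)},
}


def best_topology(demand):
    """Return the topology with the lowest wire + router power at this demand."""
    totals = {topo: wire + router for topo, (wire, router) in power[demand].items()}
    return min(totals, key=totals.get)


best = {demand: best_topology(demand) for demand in power}
print(best)
```

Among these three candidates the winner shifts from "full" at 1Gb/s to torus at 25Gb/s and mesh at 40Gb/s. At 40Gb/s the wire power is identical for all three, so router power alone decides.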
10/03/2005: 14
(2) Topology Selection (Latency, Power, BW)
[Figure: Power consumption (W) versus average latency (ns) for topo=optimal, topo=mesh, topo=torus, and topo=hypercube]
• Comparison of optimal topology with mesh, torus and hypercube in terms of power and latency
10/03/2005: 15
(2) Topology Selection (Latency, Power, BW)
• Minimum (Power x Latency)
10/03/2005: 16
(2) Topology Selection (Latency, Power, BW)
• Optimal 8x8 topologies under various on-chip area resources
(a) Optimal topology when area = 3000um
(b) Optimal topology when area = 7000um
(c) Optimal topology when area = 11000um
10/03/2005: 17
(2) Topology Selection (Latency, Power, BW)
[Figure: Flow amount versus # of hops (1 to 14) for mesh, torus, and optimal topologies]
• Comparison of optimal topology with mesh and torus in terms of flow-hops
10/03/2005: 18
(3) Power and Latency Tradeoffs for Optimal 8x8 Topologies
[Figure: Power consumption (W) versus average latency (ns), one curve per area budget: area=3000um, 4000um, 5000um, 6000um, 7000um, 8000um, 9000um, 10000um, 11000um]
10/03/2005: 19
(4) Video Applications
10/03/2005: 20
(4) Video Applications
[Figure: Synthesized NoC topologies for four video applications]
(a) NoC topology for MPEG4
(b) NoC topology for VOPD
(c) NoC topology for PIP
(d) NoC topology for MWD
10/03/2005: 21
Conclusions
• Simultaneous optimization of topologies and wire styles.
• Improves the power-latency product by up to 52.1%, 29.4%, and 35.6% compared with mesh, torus, and hypercube topologies, respectively.
• Covers most classic direct network topologies and extends far beyond them.