View
3
Download
0
Category
Preview:
Citation preview
UC San Diego / VLSI CAD Laboratory
Smart Non-Default Routing for Clock Power Reduction
Andrew B. Kahng, Seokhyeong Kang, Hyein Lee
VLSI CAD LABORATORY, UC San Diego
50th Design Automation Conference
June 5, 2013
-2-
Outline
Motivation
Our Wire Sizing Algorithm
Experimental Results
Conclusions and Future Works
-3-
Non-Default Routing Rule
Non-Default Routing Rules (NDRs) are used to increase wire widths and spacings – Reduce wire parasitic and delay variability
– Reduce coupling capacitance
– Avoid Electromigration (EM) violation
Default Rule
width
Non-Default Rule (NDR)
spacing
-4-
Electromigration
EM causes unwanted opens or shorts in wires
EM reliability ↔ reduced current density – Widen wires (use NDR)
Can NDRs always cure the EM problems?
-5-
Motivation Larger Cap.
Larger/More Buffers
More EM Violations
Fixed NDR (Wider wires)
More power
Fixed NDR
Smart NDR (SNDR)
Smaller/Fewer Buffers
Smaller Cap.
Less EM Violations
Smart NDR (Tapering)
Less power
No need for segment to be wide if small # of downstream sinks
-6-
Related Works
Wire sizing in clock trees – [Tsai04] buffer insertion and wire sizing to optimize
delay and power using dynamic programming
– [Guthaus06] clock buffer/wire sizing to minimize skew using sequential linear programming
– Related to timing; EM reliability not considered
EM-constrained wire sizing – [Pullela95] low-power clock tree with EM constraints
– Inserts buffers to reduce wire width
Our work considers EM reliability without buffer insertion
-7-
Outline
Motivation
Our Wire Sizing Algorithm
Experimental Results
Conclusions and Future Works
-8-
SNDR Wire Sizing Algorithm
Objective: Minimize the total capacitance of clock network
While maintaining – Clock latency
– Maximum transition time
– Clock skew
– Without EM violations
Solution: NDR for each wire segment of a given clock network
-9-
Wire Delay, Slew, RC and EM Limit Models
Wire delay and slew model
– Wire delay: Elmore delay model [Elmore48]
– Wire slew: PERI model [HuAHK07]
Wire RC and EM limits: f(w)
𝑅 ∝ 1/𝑤 𝐶 ∝ 𝑤
R, C
= linear functions of log(w) 𝐸𝑀 ∝ 𝑤
y = 0.6681x + 0.5475
R² = 0.9723
0.00
0.50
1.00
1.50
2.00
0.0 0.5 1.0 1.5E
M L
imit
(Ir
ms,m
A)
SNDR variable xe = ln(we)
y = -0.9814x + 1.7098
R² = 0.9674
0.00
0.50
1.00
1.50
2.00
0.0 0.5 1.0 1.5
Wir
e R
es.
pe
r u
m
(oh
m/u
m)
SNDR variable xe = ln(we)
y = 0.0319x + 0.0866
R² = 0.9439
0.00
0.05
0.10
0.15
0.0 0.5 1.0 1.5
Wir
e C
ap
. p
er
um
(Ff/
um
)
SNDR variable xe = ln(we)
w : wire width
<Resistance> <Capacitance> <EM limit>
, EMlimit
-10-
SNDR for a Clock Subnet
Problem formulation
NDR solutions are obtained for wire segments
Minimize total capacitance Subject to
Delayi < MaxDelay Skewi,j < MaxClockSkew Ie < MaxEMCurrentLimite
we ≥ wdesc(e) (desc(e) ≡ edges downstream from e)
...
<subnet>
e
i j
-11-
SNDR for Entire Clock Tree
Solve SNDR subnet problems from the bottom to top of clock tree
Skew propagation : Maximum/minimum clock latencies of downstream subnets are propagated to upstream subnets
level N
level 1
level N-1
<Entire Clock Tree>
... Max/min clock latencies at level N
Skew propagation
Skew constraints at level N-1
-12-
Iterative Linear Programming
Sizing problem is a quadratic program due to RC delay
Separate sizing problem into two linear programs
5X-30X runtime reduction by avoiding quadratic formulation
170 minutes vs. 30 minutes for dma testcase
Elmore delay = R·C
fR(xe)·fC(K)
fR(K)·fC(xe) fR(xe)·fC(xe)
Quadratic program Two linear programs
fR(xe)·fC(K)
fR(K)·fC(xe)
Iterative linear program
Iteratively solve the problems until the solution (xe) converges
and solve them iteratively
-13-
Outline
Motivation
Our Wire Sizing Algorithm
Experimental Results
Conclusions and Future Works
-14-
Implementation Flow
Practical and automated flow
Clock Tree Synthesis with NDR
DEFOut
DEFIn
Meet skew/slew/delay/EM?
Tapered Clock Tree
No
Yes
Clock Tree with SNDR
Solve SNDR Problems
Recovery with greedy algorithm
Solution
Original Clock Tree
-15-
Experimental Environments
Cadence SOC Encounter, Matlab, Synopsys 32/28nm PDK
Various NDRs are tried
NDR Width Space Norm. Cap.
1W2S 1 2 1.21
1W5S 1 5 0.73
2W4S 2 4 0.89
1W8S 1 8 0.66
2W7S 2 7 0.76
3W6S 3 6 0.88
4W5S 4 5 1.00
Iso area : W+S is fixed
Experiments
Iso area NDRs
Non-iso (less) area NDRs
Results essentially satisfy all timing and EM constraints
Runtime: 10 seconds ∼100 minutes per subnet
Matlab R2012b and 2.5GHz Intel Xeon processor
-16-
Results: Proportion of NDRs
Smaller-width NDRs replace ≥ 80% of the wiring
0% 20% 40% 60% 80% 100%
aes
eth
jpeg_enc
mc
mpeg2
tv80s
usbf
conmax
dma
1W8S
2W7S
3W6S
4W5S
-17-
Results: Less Capacitance
Clock switching power and wire capacitance reduction
0.0%
5.0%
10.0%
15.0%
Re
du
cti
on
[%
]
Wire Cap.
Clock Switching Pwr
0.0%2.0%4.0%6.0%8.0%
Re
du
cti
on
[%
]
Wire Cap.
Clock Switching Pwr
Orig. NDR = 4W5S
Orig. NDR = 2W4S
9% reduction in wire capacitance 5% reduction in clock switching power
5% reduction in wire capacitance 2% reduction in clock switching power
-18-
Results: Less Area
50% track cost reduction can be achieved by non-iso area NDRs
0
10
20
30
40
50
60
Tra
ck
co
st
red
ucti
on
[%
]
NDR Track cost
1W2S 3
2W4S 5
3W6S 7
4W5S 7
Track cost : total amount of track length occupied
-19-
Conclusion and Future Works Smart NDRs for clock networks reduce the
clock tree wire capacitance under timing and EM constraints
Less capacitance: 9% reduction of wire cap, 5% reduction of clock switching power
Less area: 50% track cost reduction by using non-iso area NDRs
Future work – Control local skews for improved chip timing and
robustness
– Use better delay models for problem formulation
– Noise and variability consideration
Thank You!
-21-
References
[Elmore48] W. C. Elmore, “The Transient Response of Damped Linear Networks with Particular Regard to Wideband Amplifiers”, J. Applied Physics 19(1) (1948), pp. 55-63
[HuAHK07] S. Hu, C. J. Alpert, J. Hu, S. Karandikar, Z. Li, W. Shi and C. N. Sze, “Fast Algorithms For Slew Constrained Minimum Cost Buffering”, IEEE TCAD 26(11) (2007), pp. 2009-2022.
[PullelaMP95] S. Pullela, N. Menezes and L. T. Pillage, “Low Power IC Clock Tree Design”, Proc. CICC, 1995, pp. 263-266.
Recommended