1ISVLSI-2014 invited talk 140710
Toward Holistic Modeling, Margining and Tolerance of IC Variability
Andrew B. Kahng, UCSD CSE and ECE Departments
abk@ucsd.edu, http://vlsicad.ucsd.edu
IC Variability
• In manufacturing process
  • FEOL
  • BEOL
• During operation
  • Voltage
  • Temperature
• Across lifetime
  • Aging
  • Breakdown
Challenge: Value of Technology
[Figure: design quality (e.g., frequency) vs. technology generation, comparing nominal scaling with design with margins. Margin = lost benefits of technology.]
Solutions: Modeling, Margining, Tolerance
BEOL Corner Optimization √
Process-Aware Vdd Scaling √
BTI EM-AVS Interactions √
Overdrive Signoff √
Min Cost of Resilience √
• Holistic mitigation of variability spans models, margins, tolerance mechanisms
• Signoff criteria, monitors, adaptivity/resilience, approximate computing, …
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
BEOL Corner Optimization
• 20nm and below: increased timing variation due to interconnect R, C
  • Design closure becomes much more difficult
• Costs of BEOL variations
  • More design effort (e.g., "last month" of manual ECO iteration)
  • Compromised circuit performance at high Vdd
• Recent work: reduce signoff margin by using tightened BEOL corners without sacrificing parametric yield
  • Signoff at conventional BEOL corners is pessimistic for most timing-critical paths
  • We identify paths which can be safely signed off using tightened BEOL corners (TBC)
• Joint work with Sorin Dobre (Qualcomm) and Tuck-Boon Chan
Proposed Timing Signoff Flow
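Conventional signoff:
1. Start from the routed design
2. Timing analysis using conventional BEOL corners (CBC)
3. Violation = 0? If no, ECO using CBC and return to step 2; if yes, done

This work:
1. Start from the routed design
2. Classify timing-critical paths into G_TBC and G_CBC
3. G_TBC: timing analysis using TBC; violation = 0? If no, ECO using TBC and repeat
4. G_CBC: timing analysis using CBC; violation = 0? If no, ECO using CBC and repeat
5. Done when both analyses report violation = 0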
Conventional BEOL Corners
• Three major variation sources per layer: ΔW, ΔT, ΔH
• Conventional BEOL corners (CBC)
  • Homogeneous corners: all variation sources are skewed in the same direction
• BEOL RC variations are modeled in the interconnect technology file (itf)

[Figure: cross-section of metal layers M1, M2, M3 showing width W2, spacing S2, thicknesses T1, T2, T3, heights H1, H2, H3, inter-layer dielectric and inter-metal dielectric.]

Corner  ΔW       ΔT       ΔH
Ytyp    typical  typical  typical
Ycb     min      min      max
Ycw     max      max      min
Yrcb    max      max      max
Yrcw    min      min      min
Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
• A 9-layer metal stack has 27 variation sources: z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if
  • zu and zv are the same type (ΔW, ΔT or ΔH), and
  • zu and zv are in the same process module

Layer  ΔW   ΔT   ΔH
M1     z1   z2   z3
M2     z4   z5   z6
M3     z7   z8   z9
M4     z10  z11  z12
M5     z13  z14  z15
M6     z16  z17  z18
M7     z19  z20  z21
M8     z22  z23  z24
M9     z25  z26  z27
The layers are grouped into process modules 1, 2 and 3.

Examples:
• ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6 and M7
• But ΔW in layer M4 is not correlated with ΔT in M4
Pessimism of Conventional BEOL Corners (CBC)
• Assumption: a max (setup) path pj is "safe" when the delay evaluated at a given CBC is larger than nominal delay + 3σj:
  $d_j(Y_{CBC}) \ge 3\sigma_j + d_j(Y_{typ})$
• For a given path, we can compare the statistical delay variation with the delay variation obtained from a given CBC:
  $\alpha_j = 3\sigma_j / \Delta d_j(Y_{CBC})$
  $\Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ}), \quad Y_{CBC} \in \{Y_{cw}, Y_{cb}, Y_{rcw}, Y_{rcb}\}$
• Small αj → large pessimism of CBC
[Figure: path delay distribution comparing 3σj with dj(YCBC) - dj(Ytyp); a large gap means large pessimism.]
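The scaling factor defined above can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the function names and the sample delays are hypothetical, and delays and σ are taken to be in the same unit.

```python
def scaling_factor(d_typ, d_cbc, sigma):
    """Return alpha_j = 3*sigma_j / (d_j(Y_CBC) - d_j(Y_typ))."""
    delta = d_cbc - d_typ
    if delta <= 0:
        raise ValueError("corner delay must exceed typical delay")
    return 3.0 * sigma / delta


def is_safe(d_typ, d_cbc, sigma):
    """A setup path is 'safe' when the corner delay covers nominal + 3 sigma."""
    return d_cbc >= d_typ + 3.0 * sigma


# Hypothetical path: 1.0 ns typical, 1.2 ns at the corner, sigma = 0.02 ns.
alpha = scaling_factor(1.0, 1.2, 0.02)
print(round(alpha, 3), is_safe(1.0, 1.2, 0.02))
```

With these definitions a path is safe exactly when α ≤ 1, and the further α falls below 1, the more margin the corner carries for that path.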
Intuition on Delay Variability Across Cw, RCw
[Figure: two scatter plots of α vs. Δdelay (relative to typical). One plot uses Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp); the other uses Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp). Paths are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst).]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
  • Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner
  • There α < 1.0 at RC-worst: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
In the following, α is defined at the dominant corner.
Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
• E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α plotted against Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp).]

Find Paths for Which TBCs Can Be Used
[Figure: the same plot with thresholds Acw and Arcw marked.]
G_TBC = set of paths that can be safely signed off using TBC: (paths with Δdcw larger than Acw) OR (paths with Δdrcw larger than Arcw)
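The path classification rule above (a path joins the TBC group when Δdcw exceeds Acw or Δdrcw exceeds Arcw) can be sketched as follows. The tuple layout, the function name and the sample threshold values are assumptions for illustration, not taken from the talk.

```python
def classify_paths(paths, a_cw, a_rcw):
    """Split paths into (g_tbc, g_cbc).

    paths: iterable of (name, dd_cw, dd_rcw), where dd_cw and dd_rcw are
    the delay increases at the C-worst and RC-worst corners relative to
    the typical corner, in the same unit as the thresholds.
    """
    g_tbc, g_cbc = [], []
    for name, dd_cw, dd_rcw in paths:
        # Large corner-induced delay change: tightened corners are allowed.
        if dd_cw > a_cw or dd_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


# Hypothetical paths and thresholds (all values in percent).
paths = [("p1", 1.0, 2.0), ("p2", 4.0, 1.0), ("p3", 0.5, 6.0)]
g_tbc, g_cbc = classify_paths(paths, a_cw=3.3, a_rcw=5.0)
print(g_tbc, g_cbc)
```

Paths that land in the second list keep conventional corners, since their small Δdelay is where α tends to be large.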
Determining α, Arcw and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay; find Arcw and Acw for a given α
• Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in G_TBC
[Figure: α vs. Δd at C-worst corner (%) and α vs. Δd at RC-worst corner (%), with thresholds Acw and Arcw marked.]
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100 ps and 53 ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
• Tradeoff between reduced margin vs. paths which use TBC
Correlation factor γ = 0.5
[Figure: three bar charts for LEON, SUPERBLUE12 and NETCARD comparing CBC, TBC-0.5, TBC-0.6 and TBC-0.7: WNS (ns), TNS (ns), and timing violations.]
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
How to Minimize Cost of Resilience?
• Additional circuits → area and power penalties
• Recovery from errors → throughput degradation
• Large hold margin → short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                 Razor        Razor-Lite  TIMBER
Power penalty    30 [Das08]   ~0 [Kim13]  100 [Choudhury09]
Area penalty     182 [Kim13]  33 [Kim13]  255 [Chen13]
Recovery cycles  5 [Wan09]    11 [Kim13]  0 [Choudhury09]
Tradeoff: Resilience Cost vs. Datapath Cost
[Figure: fanin cones driving endpoint flip-flops. An endpoint is either a Razor FF (with an error output) or, when its fanin cone is optimized with a tighter constraint, a normal FF. Tradeoff: Razor FFs (resilience cost) vs. power/area of fanin circuits.]
[Figure: energy (mJ) vs. Razor FFs, showing total energy, energy of the non-resilient part, and resilience cost.]
We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP).
Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint with tighter constraints
  → Allows replacement of Razor FF with normal FF
• Pick endpoints based on heuristic sensitivity functions
  → Vary endpoints, compare area/power penalty

Candidate Sensitivity Functions
$SF_1 = |slack(p)|$
$SF_2 = |slack(p)| \times Num_{cri}(p)$
$SF_3 = |slack(p)| \times Num_{cri}(p) / Num_{total}(p)$
$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$

p: negative-slack endpoint
c: cells within fanin cone
Num_cri: number of negative-slack cells
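A minimal sketch of the five candidate sensitivity functions, reading them as SF1 = |slack(p)|, SF2 = |slack(p)| × Num_cri(p), SF3 = |slack(p)| × Num_cri(p) / Num_total(p), SF4 = |slack(p)| × Σ Pwr(c), and SF5 = Σ |slack(c)| × Pwr(c) over the cells c in the fanin cone of p. The data layout is an assumption, and Num_total is assumed here to be the total number of cells in the fanin cone, which the slide does not define.

```python
def sensitivity_functions(endpoint_slack, fanin_cells):
    """Evaluate SF1..SF5 for one endpoint.

    endpoint_slack: slack of the endpoint p.
    fanin_cells: list of (slack, power) tuples for cells in p's fanin cone.
    """
    abs_slack = abs(endpoint_slack)
    num_cri = sum(1 for s, _ in fanin_cells if s < 0)  # negative-slack cells
    num_total = len(fanin_cells)                        # assumed definition
    total_pwr = sum(p for _, p in fanin_cells)
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": abs_slack * num_cri / num_total if num_total else 0.0,
        "SF4": abs_slack * total_pwr,
        "SF5": sum(abs(s) * p for s, p in fanin_cells),
    }


# Hypothetical endpoint with three fanin cells: (slack in ns, power in mW).
sf = sensitivity_functions(-0.2, [(-0.1, 2.0), (0.05, 1.0), (-0.2, 4.0)])
print(sf)
```

Endpoints can then be ranked by whichever function is being tried, which matches the slide's idea of comparing the area/power penalty across candidate functions.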
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized

$W_{pq} = \dfrac{Slack_{pq}}{1 + \beta \times TG(p, q)}$
where Slack_pq is the setup slack of path p-q, β is a weighting factor, and TG(p, q) is the toggle rate of path p-q.

[Figure: sequential graph of FF1, FF2, FF3 with edge weights W12, W23, W31 over the data paths and clock tree. After adjustment each edge on the cycle has weight W', where W' = average weight on cycle.]
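The edge weight used by SkewOpt, read as W_pq = Slack_pq / (1 + β × TG(p, q)), and the average weight W' on a cycle can be sketched as below. The function names and sample numbers are hypothetical, and the search for the minimum-weight cycle itself is not shown.

```python
def path_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)


def cycle_average_weight(cycle_edges, beta):
    """Average weight W' over the edges of one cycle.

    cycle_edges: list of (setup_slack, toggle_rate) per path on the cycle.
    """
    weights = [path_weight(s, tg, beta) for s, tg in cycle_edges]
    return sum(weights) / len(weights)


# Hypothetical three-flop cycle FF1 -> FF2 -> FF3 -> FF1.
edges = [(0.3, 0.5), (0.1, 0.0), (0.2, 1.0)]
print(round(cycle_average_weight(edges, beta=2.0), 4))
```

Dividing by 1 + β × TG shrinks the weight of frequently toggling paths, so under this reading they look more critical than their raw slack suggests.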
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart steps:]
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs
• SkewOpt: activity-aware clock skew optimization
• OR-tree insertion
• Energy < min energy? Save current solution
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method with only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF and CO for the MUL and EXU designs at small, medium and large margin, broken down into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation.]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm.
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
→ Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt designs of MUL and EXU, with the minimum achievable energy marked. Technology: foundry 28nm.]
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Breaking Chicken-Egg Loops → Less Margin
• Example: interaction between reliability margin and AVS designs
  • Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
  • AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS. An on-chip aging monitor reports circuit performance to a voltage regulator, which sets the circuit Vdd. Plots of circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, against the target.]
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation
[Figure: three-step loop. Step 1: VBTI gives |Vt|, which together with Vlib gives the derated library. Step 2: circuit implementation and signoff give the circuit. Step 3: BTI degradation and AVS give Vfinal.]
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Figure: Step 1: VBTI and Vlib give |Vt| and the derated library. Step 2: circuit implementation and signoff give the circuit. Step 3: BTI degradation and AVS give Vfinal.]
• No obvious guideline to define VBTI and Vlib
• Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design, lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
Power vs Area Across Different Signoffs
"Knee" point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI.]
VBTI = Vlib ≈ Vfinal
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure: Vfinal for different gate types; annotation: 10mV.]
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation
Flow:
1. Obtain Vheur (average of standard cells)
2. Obtain derated library with VBTI = Vlib = Vheur
3. Sign off circuit with derated library
Power vs Area for All Designs
• 4 designs × {DC, AC} × derating methods
[Figure: power vs. area for all designs, distinguishing the proposed method from circuits signed off using other derated libraries.]
"Knee" point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  → Improve performance, power, or area
  → Reduce overdesign
[Figure: Vdd vs. time for schedules alternating between nominal (NOM) and overdrive (OD) modes, with durations tnom and tOD.]
[Figure: power of circuits with different overdrive modes; fnom = 800 MHz, Vnom = 0.8 V. Different overdrive modes: 26% power range. Fix fOD: still 14% power range.]
Also: Tunable Monitors → Less Margin
Default config
• Low-resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout, IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1 through 9, spanning the highest to the least invested guardband; annotated with a 14% power penalty.]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example of worst-case capacitance corner. The capacitance distributions of layer M1 and layer M2 each have a corner at 3σ. For an interconnect stack with M1 and M2, the homogeneous Cw corner combines both per-layer corners and lies beyond the 3σ contour of the joint (M1 C, M2 C) distribution; the gap is pessimism.]
When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with the number of layers.
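A small worked example can make the growth of pessimism concrete. It assumes, purely for illustration, that N layers contribute equally (each with standard deviation σ) and share a common pairwise correlation γ. The homogeneous corner then adds 3σ per layer (3Nσ in total), while the statistical 3σ of the sum is 3σ·sqrt(N + N(N-1)γ). The function name is hypothetical.

```python
import math


def homogeneous_pessimism(n_layers, gamma, sigma=1.0):
    """Ratio of homogeneous-corner variation to statistical 3-sigma variation.

    Assumes equal per-layer sigma and a uniform pairwise correlation gamma.
    """
    corner = 3.0 * sigma * n_layers
    variance = n_layers + n_layers * (n_layers - 1) * gamma
    statistical = 3.0 * sigma * math.sqrt(variance)
    return corner / statistical


for n in (1, 2, 4, 9):
    print(n, round(homogeneous_pessimism(n, gamma=0.0), 2),
          round(homogeneous_pessimism(n, gamma=0.5), 2))
```

Under these assumptions the ratio is 1 for fully correlated layers (γ = 1) and grows as sqrt(N) for independent layers, which is the trend the slide describes.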
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
        M1        M2        M3        M4
        ΔW ΔT ΔH  ΔW ΔT ΔH  ΔW ΔT ΔH  ΔW ΔT ΔH
M1 ΔW   1  0  0   γ  0  0   γ  0  0   0  0  0
   ΔT   0  1  0   0  γ  0   0  γ  0   0  0  0
   ΔH   0  0  1   0  0  γ   0  0  γ   0  0  0
M2 ΔW   γ  0  0   1  0  0   γ  0  0   0  0  0
   ΔT   0  γ  0   0  1  0   0  γ  0   0  0  0
   ΔH   0  0  γ   0  0  1   0  0  γ   0  0  0
M3 ΔW   γ  0  0   γ  0  0   1  0  0   0  0  0
   ΔT   0  γ  0   0  γ  0   0  1  0   0  0  0
   ΔH   0  0  γ   0  0  γ   0  0  1   0  0  0
M4 ΔW   0  0  0   0  0  0   0  0  0   1  0  0
   ΔT   0  0  0   0  0  0   0  0  0   0  1  0
   ΔH   0  0  0   0  0  0   0  0  0   0  0  1

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
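The structure of this matrix follows directly from the two rules (same variation type and same process module give correlation γ; everything else is uncorrelated). A minimal sketch that builds it is shown below; the function name and the layer-to-module list are assumptions chosen to reproduce the four-layer example, with M1 through M3 in one module and M4 in another.

```python
def correlation_matrix(layer_modules, gamma, types=("dW", "dT", "dH")):
    """Build the correlation matrix for all (layer, type) variation sources.

    layer_modules: process-module id for each layer, in layer order.
    Sources are ordered layer by layer, then by type within a layer.
    """
    sources = [(layer, t) for layer in range(len(layer_modules)) for t in types]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for u, (lu, tu) in enumerate(sources):
        for v, (lv, tv) in enumerate(sources):
            if u == v:
                sigma[u][v] = 1.0
            elif tu == tv and layer_modules[lu] == layer_modules[lv]:
                sigma[u][v] = gamma
    return sigma


# M1..M3 share module 1, M4 is in module 2 (assumed mapping for the example).
sigma = correlation_matrix([1, 1, 1, 2], gamma=0.5)
for row in sigma:
    print(" ".join(f"{x:.1f}" for x in row))
```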
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%), marking cumulative probability 0.92 at 60%.]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation
Delay Variation
[Figure: two scatter plots of α vs. Δdelay. One plot uses Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp); the other uses Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp). Paths are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst).]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner
  • There α < 1.0 at RC-worst: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2 (M1 C vs. M2 C, 3σ contour); non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp.]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σj / d(Ytyp) × 100% vs. Δdj(Yrcw) / dj(Ytyp) × 100%.]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield?
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%).]
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group G_TBC if Δdelay is larger than a threshold
• Fix only the paths in G_TBC using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flow: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into G_TBC and G_CBC. G_TBC: timing analysis using TBC, with ECO using TBC until violation = 0. G_CBC: timing analysis using CBC, with ECO using CBC until violation = 0. Then done.]
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)

                     LEON3MP  NETCARD  SUPERBLUE12
Clock period (ns)    1.8      2.0      3.1
Gate count           232K     575K     1031K
Utilization (%)      84       79       82
Core area (mm²)      0.45     1.04     1.91
Max transition (ps)  330      330      330

Correlation factor = 0.5
         α    Acw (%)  Arcw (%)
TBC-0.5  0.5  4.3      7.3
TBC-0.6  0.6  3.3      5.0
TBC-0.7  0.7  3.0      3.4

• Implement another NETCARD (clock period = 2.3 ns) to obtain α, Acw and Arcw
• Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays → the path is equally sensitive to R and C
• Example: $d_j = d_j(Y_{typ}) + 0.5\,\Delta d_{R\text{-}M1} + 0.5\,\Delta d_{C\text{-}M1}$
  (terms: nominal delay; delay sensitivity to unit change in M1 resistance; delay sensitivity to unit change in M1 capacitance)
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
Scaling Factor Results
[Figure: α over Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) for LEON3MP, SUPERBLUE12 and NETCARD, with the region α > 0.5 marked in each.]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120 ps and 61 ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: three bar charts for LEON, SUPERBLUE and NETCARD comparing CBC, TBC-1 and TBC-2: WNS (ns), TNS (ns), and timing violations.]
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = {DC, AC}

Circuit  Frequency (GHz)
c5315    1.38
c7552    1.25
AES      0.89
MPEG2    1.05

Supply voltages
Vmax         1.05 V
Vinit        0.90 V
Vheur1 (DC)  0.97 V
Vheur1 (AC)  0.95 V
Vheur2 (DC)  0.95 V
Vheur2 (AC)  0.93 V
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, Vlib and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs
and library
Vstep: AVS voltage step
Vfinal: converged voltage
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case      1      2      3     4       5    6       7
Vlib (V)  Vinit  Vinit  Vmax  Vinit   N/A  Vheur1  Vheur2
VBTI (V)  Vinit  Vmax   Vmax  Vfinal  N/A  Vheur1  Vheur2
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop among circuit, Vfinal, (Vlib, VBTI) and derated libraries.]
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time alternating ON, OFF, ON, OFF [VattikondaWC06]. Device aging (|ΔVth|) accumulates over time [TCAS'14].]
Observation 1
[Chan11]
• BTI is a "front-loaded" phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd vs. time approaching Vfinal; ≈70% of the Vdd increment in 1 year (remaining 30% over 9 years).]
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
[Figure: results for signoff cases 1 through 7, with the good corners marked.]
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations and aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

               Vlib = Min Vdd                                       Vlib = Max Vdd
VBTI = Min Vdd  Slowest circuit, less aging                          Not applicable (optimistic)
VBTI = Max Vdd  Slowest circuit, worst-case aging (too pessimistic)  Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious.
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm²) for AES across signoff cases 1 through 8, in three series: non-EM-aware, after fixing (Mishra), and after fixing (Black's). The cases range from optimistic about AVS to pessimistic about AVS.]
AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) for implementations 1 through 9; annotated with a 30% MTTF penalty and 200mV voltage compensation.]
• Assume no EM fix at signoff
• BTI degradation is checked at each step, and MTTF is updated as
  $MTTF(i) = MTTF(i-1) \times \left( \dfrac{V_{DD}(i-1)}{V_{DD}(i)} \right)^2$
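The per-step MTTF update, read as MTTF(i) = MTTF(i-1) × (V_DD(i-1) / V_DD(i))², can be sketched as follows. The function names, the starting MTTF and the voltage schedule are hypothetical values for illustration only.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One AVS step: MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_after_schedule(mttf0, vdd_schedule):
    """Apply the update along a sequence of supply voltages."""
    mttf = mttf0
    for vdd_prev, vdd_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, vdd_prev, vdd_new)
    return mttf


# Hypothetical schedule: AVS raises Vdd from 0.90 V to 1.00 V in two steps.
print(round(mttf_after_schedule(10.0, [0.90, 0.95, 1.00]), 3))
```

Because the ratios telescope, the result under this update rule depends only on the first and last voltages of the schedule, not on the intermediate steps.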
EM Impact on AVS Scheduling
[Figure: VDD vs. year for AVS schedules S1 through S5 (DMA), and MTTF (year) for S1 through S5.]
12 years MTTF penalty
What is "Signoff"?
• Foundation of contract between design house and foundry
  • "Chip should work" → stack of models, margins, analyses
  • Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff, with margins for static IR drop, power grid IR gradient, dynamic IR, HCI and NBTI stacked between nominal Vdd and signoff Vdd; the operating voltage is marked on the voltage axis.]
Problem: margins = pessimism → overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = [\,d_j(Y_v) - d_j(Y_{typ})\,] / 3$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and the routed netlist → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλᵀ (Note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}]\,\lambda$
Standard deviation of path delay:
$\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
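The two statistical timing slides can be sketched in a few lines. This is an illustration under stated assumptions: each skewed delay dj(Yv) is taken at a 3σ skew of source v (hence the division by 3), the function names are hypothetical, and the three-source example numbers are invented rather than taken from the talk.

```python
import math


def cholesky(matrix):
    """Lower-triangular lam with matrix = lam * lam^T (matrix positive definite)."""
    n = len(matrix)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            acc = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(matrix[i][i] - acc)
            else:
                lam[i][k] = (matrix[i][k] - acc) / lam[k][k]
    return lam


def path_sigma(d_typ, d_skewed, corr):
    """Standard deviation of one path's delay from per-source STA runs."""
    # Per-sigma sensitivities: delta_v = [d(Y_v) - d(Y_typ)] / 3
    delta = [(d - d_typ) / 3.0 for d in d_skewed]
    lam = cholesky(corr)
    n = len(delta)
    # Row vector times lam: delta'_k = sum_v delta_v * lam[v][k]
    delta_corr = [sum(delta[v] * lam[v][k] for v in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in delta_corr))


# Hypothetical path with three variation sources; delays in ps.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
correlated = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
sigma_indep = path_sigma(100.0, [103.0, 106.0, 100.0], identity)
sigma_corr = path_sigma(100.0, [103.0, 106.0, 100.0], correlated)
```

Positive correlation between two sources that both slow the path raises σj, which is why the correlation factor matters when comparing against a corner.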
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
15% reduction
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:
$Energy = Power / Throughput$
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints; allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches:
• Open-Loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-Loop AVS: process- and temperature-aware AVS
  • Generic monitor: generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor; measure actual critical paths [Hartman06, Fick10]
• Error Tolerance
  • Error detection system: error detection and correction system; Vdd scaling until error occurs [Das06, Tschanz10]
[Figure axis: power]
Challenge Variability
[Figure, four panels, each comparing ideal scaling with non-ideality:
DENSITY: transistor count [M] vs. MPU release date (1998 to 2011); source [CPUDB]
POWER: dynamic power (W), 2009 to 2024; source [JeongK08]
DRIVE CURRENT: extended planar bulk, UTB FD, DG (μA/μm) vs. ideal scaling, 2006 to 2016; source [ITRS]
SUPPLY VOLTAGE: volt vs. MPU release date (1995 to 2016); source [CPUDB]]
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU; minimum achievable energy marked]
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small, constant number of runs is enough: scalable
Global optimization flow (adaptive search): Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Sample (SP&R) → Refine power models
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); series 1st, 2nd, real; design AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
Closed-Loop AVS
Design-dependent replica
In-situ monitor
Generic monitor
• Does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: proposed circuit with 1-bit control pins per stage selecting high or low resistance]
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, CO at small, medium, and large margin for MUL and EXU; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU; minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy < min energy?
Save current solution
SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
SkewOpt: activity-aware clock skew optimization
Proposed Timing Signoff Flow
Conventional signoff: routed design → timing analysis using conventional BEOL corners (CBC) → violation = 0? If no, ECO using CBC and repeat; otherwise done
This work: routed design → classify timing-critical paths into GTBC and GCBC
• GTBC: timing analysis using TBC → violation = 0? If no, ECO using TBC and repeat
• GCBC: timing analysis using CBC → violation = 0? If no, ECO using CBC and repeat
• Otherwise done
Conventional BEOL Corners
• Three major variation sources per layer: ΔW, ΔT, ΔH
• Conventional BEOL corners (CBC)
  • Homogeneous corners: all variation sources are skewed in the same direction
• BEOL RC variations are modeled in interconnect technology file (itf)
[Figure: cross-section of metal layers M1 to M3 showing width W, spacing S, thickness T, and dielectric height H, with inter-layer and inter-metal dielectric]
Corner   ΔW       ΔT       ΔH
Ytyp     typical  typical  typical
Ycb      min      min      max
Ycw      max      max      min
Yrcb     max      max      max
Yrcw     min      min      min
Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
• 9-layer metal stack has 27 variation sources: z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if
  • zu and zv are the same type (ΔW, ΔT, or ΔH)
  • zu and zv are in the same process module
Layer  ΔW   ΔT   ΔH
M1     z1   z2   z3
M2     z4   z5   z6
M3     z7   z8   z9
M4     z10  z11  z12
M5     z13  z14  z15
M6     z16  z17  z18
M7     z19  z20  z21
M8     z22  z23  z24
M9     z25  z26  z27
[Figure: layers grouped into process modules 1, 2, and 3]
Examples:
• ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6, and M7
• But ΔW in layer M4 is not correlated with ΔT in M4
Pessimism of Conventional BEOL Corners (CBC)
• Assumption: a max (setup) path pj is “safe” when delay evaluated at a given CBC is larger than nominal delay + 3σj
$d_j(Y_{CBC}) \ge 3\sigma_j + d_j(Y_{typ})$
• For a given path, we can compare the statistical delay variation and the delay obtained from a given CBC:
$\alpha_j = 3\sigma_j / \Delta d_j(Y_{CBC})$
$\Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ}), \quad Y_{CBC} \in \{Y_{cw}, Y_{cb}, Y_{rcw}, Y_{rcb}\}$
• Small αj: large pessimism of CBC
[Figure: delay axis comparing 3σj with dj(YCBC) − dj(Ytyp); the gap between them is the pessimism]
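As a worked illustration of the scaling factor αj = 3σj / Δdj(YCBC) and the “safe” condition dj(YCBC) ≥ dj(Ytyp) + 3σj, the sketch below uses hypothetical function names and invented delays; it is not code from the talk.

```python
def scaling_factor(sigma_j, d_cbc, d_typ):
    """alpha_j = 3 * sigma_j / (d_j(Y_CBC) - d_j(Y_typ))."""
    delta_cbc = d_cbc - d_typ
    if delta_cbc <= 0:
        raise ValueError("corner delay must exceed typical delay for a setup path")
    return 3.0 * sigma_j / delta_cbc


def is_safe(sigma_j, d_cbc, d_typ):
    """True when the corner delay covers nominal delay + 3 sigma."""
    return d_cbc >= d_typ + 3.0 * sigma_j


# Hypothetical path: 100 ps typical, 112 ps at the corner, sigma = 2 ps.
alpha = scaling_factor(2.0, 112.0, 100.0)
safe = is_safe(2.0, 112.0, 100.0)
```

Here αj = 0.5, so the corner adds twice the delay that the 3σ statistical variation would require. A path with αj above 1 would fail the safe condition at that corner.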
Intuition on Delay Variability Across Cw, RCw
[Figure: two scatter plots of α vs. Δdelay (vs. typ): at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp); paths dominated by C-worst have Δdelay at C-worst > Δdelay at RC-worst, and paths dominated by RC-worst have Δdelay at RC-worst > Δdelay at C-worst]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0 here: delay variations covered by RC-worst corner
Intuition on Delay Variability Across Cw, RCw
[Figure: two scatter plots of α vs. Δdelay: at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp); paths dominated by C-worst have Δdelay at C-worst > Δdelay at RC-worst, and paths dominated by RC-worst have Δdelay at RC-worst > Δdelay at C-worst]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
In the following, α is defined at the dominant corner
Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
• E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α plotted against Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]
Find Paths for Which TBCs Can Be Used
• Paths with small Δdrcw and Δdcw have large α
• E.g., there are paths with αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α plotted against Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw marked]
Gtbc = set of paths that can be safely signed off using TBC: ((path with Δdcw larger than Acw) OR (path with Δdrcw larger than Arcw))
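The Gtbc membership rule on this slide (Δdcw larger than Acw, OR Δdrcw larger than Arcw) can be sketched as a simple filter. The function name, the path tuples, and the threshold values below are hypothetical, chosen only to exercise both branches.

```python
def classify_paths(paths, a_cw, a_rcw):
    """Split paths into (g_tbc, g_cbc) using the slide's OR rule.

    Each path is (name, delta_d_cw, delta_d_rcw), with the delay deltas
    expressed in the same units as the thresholds a_cw and a_rcw.
    """
    g_tbc, g_cbc = [], []
    for name, d_cw, d_rcw in paths:
        if d_cw > a_cw or d_rcw > a_rcw:
            g_tbc.append(name)  # large corner delta: tightened corner suffices
        else:
            g_cbc.append(name)  # small delta: keep conventional corner
    return g_tbc, g_cbc


# Hypothetical paths and thresholds (percent of typical delay).
paths = [("p1", 5.0, 2.0), ("p2", 1.0, 1.5), ("p3", 2.0, 6.0)]
g_tbc, g_cbc = classify_paths(paths, a_cw=3.3, a_rcw=5.0)
```

Paths left in the second group stay on the conventional corners, consistent with the observation that small-Δdelay paths have large α.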
Determining α, Arcw and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay; find Arcw and Acw for a given α
• Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α: larger thresholds (Arcw and Acw), fewer paths in GTBC
[Figure: α vs. Δd at C-worst corner (%) and Δd at RC-worst corner (%), with thresholds Acw and Arcw marked]
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-06: more benefits
• Tradeoff between reduced margin vs. paths which use TBC
Correlation factor γ = 0.5
[Figure: WNS (ns), TNS (ns), and timing violations for LEON, SUPERBLUE12, NETCARD under CBC, TBC-05, TBC-06, TBC-07]
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
How to Minimize Cost of Resilience?
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs
                  Razor         Razor-Lite    TIMBER
Power penalty     30 [Das08]    ~0 [Kim13]    100 [Choudhury09]
Area penalty      182 [Kim13]   33 [Kim13]    255 [Chen13]
recovery cycles   5 [Wan09]     11 [Kim13]    0 [Choudhury09]
Tradeoff Resilience Cost vs Datapath Cost
[Figure: endpoint Razor FFs and normal FFs with their fanin cones; optimizing a fanin cone w/ tighter constraint lets the endpoint Razor FF be replaced by a normal FF, trading area (power) of the fanin cone against area (power) w/ Razor overhead]
[Figure: tradeoff of energy (mJ) vs. Razor FFs (300, 100, 50, 0): total energy, energy of non-resilient part (power/area of fanin circuits), resilience cost]
We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)
Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions
• Vary endpoints; compare area/power penalty
Candidate Sensitivity Functions
$SF_1 = |slack(p)|$
$SF_2 = |slack(p)| \times Num_{cri}(p)$
$SF_3 = |slack(p)| \times Num_{cri}(p) / Num_{total}(p)$
$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$
p: negative-slack endpoint
c: cells within fanin cone
Numcri: number of negative-slack cells
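The five candidate sensitivity functions can be evaluated per endpoint as in the sketch below. It assumes Numtotal(p) is the total number of cells in the fanin cone of p, which the slide does not define; the function name and the example cone are hypothetical.

```python
def sensitivity_functions(slack_p, fanin):
    """Evaluate SF1..SF5 for one endpoint.

    slack_p: timing slack of endpoint p (negative for a critical endpoint).
    fanin:   list of (slack_c, power_c) for cells c in the fanin cone of p.
    """
    abs_slack = abs(slack_p)
    num_total = len(fanin)  # assumption: Num_total = all cells in the cone
    num_cri = sum(1 for slack_c, _ in fanin if slack_c < 0)
    total_power = sum(power_c for _, power_c in fanin)
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": abs_slack * num_cri / num_total if num_total else 0.0,
        "SF4": abs_slack * total_power,
        "SF5": sum(abs(slack_c) * power_c for slack_c, power_c in fanin),
    }


# Hypothetical endpoint: slack in ns, cell power in arbitrary units.
cone = [(-0.1, 2.0), (0.05, 1.0), (-0.2, 3.0), (0.3, 4.0)]
sf = sensitivity_functions(-0.2, cone)
```

Endpoints could then be ranked by any one of these values before choosing which fanin cones to optimize. How the ranking is used is left open here, since the slide only lists the candidates.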
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
1. Generate sequential graph
2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
3. Iterate Step 2 until all endpoints are optimized
$W_{pq} = \dfrac{Slack_{pq}}{1 + \beta \times TG(p, q)}$
Slack_pq: setup slack of path p-q; β: weighting factor; TG(p, q): toggle rate of path p-q
[Figure: FF1, FF2, FF3 connected by data paths with weights W12, W23, W31 and driven by a clock tree; after adjustment each edge on the cycle has weight W′]
W′ = average weight on cycle
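The edge weight W_pq = Slack_pq / (1 + β × TG(p, q)) and the average cycle weight W′ can be computed as in this sketch. The function names and the three-flop example are hypothetical, and the minimum-weight cycle search and cycle contraction from Steps 2 and 3 are not implemented here.

```python
def edge_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)


def average_cycle_weight(cycle_edges, beta):
    """W' for a cycle given as a list of (slack, toggle_rate) edges."""
    weights = [edge_weight(slack, tg, beta) for slack, tg in cycle_edges]
    return sum(weights) / len(weights)


# Hypothetical cycle FF1 -> FF2 -> FF3 -> FF1: (setup slack in ns, toggle rate).
cycle = [(0.3, 0.5), (0.1, 0.0), (0.2, 1.5)]
w12 = edge_weight(0.3, 0.5, beta=2.0)
w_avg = average_cycle_weight(cycle, beta=2.0)
```

A frequently toggling path gets a smaller weight for the same slack, so it looks more critical to the cycle search. This is how activity enters the skew optimization.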
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy < min energy?
Save current solution
SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
SkewOpt: activity-aware clock skew optimization
OR-tree insertion
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF, CO at small, medium, and large margin for MUL and EXU; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner
Technology: foundry 28nm
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt; panels MUL and EXU; minimum achievable energy marked]
Technology: foundry 28nm
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS: circuit, on-chip aging monitor, circuit performance, voltage regulator, Vdd; plots of circuit frequency and Vdd vs. time, without AVS and with AVS, against the target]
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation
[Figure: three-step flow linking BTI degradation and AVS (Vfinal, VBTI, |Vt|), the derated library (Vlib), and circuit implementation and signoff (circuit)]
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Figure: three-step flow linking BTI degradation and AVS (Vfinal, VBTI, |Vt|), the derated library (Vlib), and circuit implementation and signoff (circuit)]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
Power vs Area Across Different Signoffs
“Knee” point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI stress]
VBTI = Vlib ≈ Vfinal
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging); it depends on
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation
Flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → sign off circuit with derated library
Power vs Area for All Designs
• 4 designs × {DC, AC} × derating methods
[Figure: power vs. area for all designs; proposed method vs. circuits signed off using other derated libraries]
“Knee” point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
• Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time alternating between NOM and OD modes, with durations tnom and tOD]
[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V]
Different overdrive modes: 26% power range
Fix fOD: still 14% power range
Also Tunable Monitors Less Margin
Aggressive config: Vmin_est < Vmin_chip; some chips will fail
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Default config
• Low-resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Also Tunable Monitors Less Margin
Aggressive config: Vmin_est < Vmin_chip; some chips will fail
Default config
• Low-resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore's Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1 to 9; highest invested guardband and least invested guardband marked]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
14% power penalty
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner; capacitance distributions of layer M1 and layer M2, each with its 3σ corner; for the interconnect stack with M1 and M2, the homogeneous Cw corner lies beyond the stack's 3σ point, and the gap is pessimism]
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner; capacitance distributions of layer M1 and layer M2, each with its 3σ corner; for the interconnect stack with M1 and M2, the homogeneous Cw corner lies beyond the stack's 3σ point, and the gap is pessimism]
When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with the number of layers
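A back-of-the-envelope sketch of why pessimism grows with layer count, under simplifying assumptions that are not from the talk: every layer contributes the same per-sigma delay shift, and every pair of layers has the same correlation ρ. The homogeneous corner adds the per-layer 3σ shifts linearly (3N), while the statistical 3σ of the sum is 3·sqrt(N + N(N−1)ρ). The function name is hypothetical.

```python
import math


def homogeneous_pessimism(num_layers, rho):
    """Ratio of homogeneous-corner shift to statistical 3-sigma shift.

    Assumes equal per-layer sensitivity (normalized to 1 sigma each) and a
    uniform pairwise correlation rho between layers.
    """
    corner_shift = 3.0 * num_layers  # all layers skewed 3 sigma, same direction
    stat_sigma = math.sqrt(num_layers + num_layers * (num_layers - 1) * rho)
    return corner_shift / (3.0 * stat_sigma)


# Fully independent layers: the ratio grows as sqrt(N).
ratios = {n: homogeneous_pessimism(n, rho=0.0) for n in (1, 4, 9)}
```

With ρ = 1 the ratio is 1 for any N, so the homogeneous corner is exact only when layers are fully correlated.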
Correlation Matrix
• Let Σ be the correlation matrix for variation sources
Σ =
        M1         M2         M3         M4
        ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH
M1 ΔW   1  0  0    γ  0  0    γ  0  0    0  0  0
M1 ΔT   0  1  0    0  γ  0    0  γ  0    0  0  0
M1 ΔH   0  0  1    0  0  γ    0  0  γ    0  0  0
M2 ΔW   γ  0  0    1  0  0    γ  0  0    0  0  0
M2 ΔT   0  γ  0    0  1  0    0  γ  0    0  0  0
M2 ΔH   0  0  γ    0  0  1    0  0  γ    0  0  0
M3 ΔW   γ  0  0    γ  0  0    1  0  0    0  0  0
M3 ΔT   0  γ  0    0  γ  0    0  1  0    0  0  0
M3 ΔH   0  0  γ    0  0  γ    0  0  1    0  0  0
M4 ΔW   0  0  0    0  0  0    0  0  0    1  0  0
M4 ΔT   0  0  0    0  0  0    0  0  0    0  1  0
M4 ΔH   0  0  0    0  0  0    0  0  0    0  0  1
Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
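The correlation matrix on this slide follows mechanically from two rules: same variation type and same process module gives γ, everything else off the diagonal gives 0. A minimal sketch, with a hypothetical function name and the module assignment [1, 1, 1, 2] chosen to match the four-layer example (M1 to M3 in one module, M4 in another):

```python
def build_correlation(layer_modules, gamma, num_types=3):
    """Correlation matrix for num_types sources (dW, dT, dH) per layer.

    layer_modules[i] is the process module of layer i. Sources are ordered
    layer by layer, e.g. M1 dW, M1 dT, M1 dH, M2 dW, ...
    """
    n = len(layer_modules) * num_types
    corr = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for i, mod_i in enumerate(layer_modules):
        for j, mod_j in enumerate(layer_modules):
            if i != j and mod_i == mod_j:
                for t in range(num_types):
                    # Same type t, different layers, same module.
                    corr[i * num_types + t][j * num_types + t] = gamma
    return corr


# Four layers: M1, M2, M3 share a module; M4 is in a different one.
sigma = build_correlation([1, 1, 1, 2], gamma=0.5)
```

The same function extends to the 9-layer, 27-source stack by passing nine module labels. The actual layer-to-module assignment would have to come from the process, not from this sketch.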
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); 0.92 at 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation
Delay Variation
[Figure: two scatter plots of α vs. Δdelay: at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp); paths dominated by C-worst have Δdelay at C-worst > Δdelay at RC-worst, and paths dominated by RC-worst have Δdelay at RC-worst > Δdelay at C-worst]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
Delay Variation
[Figure: two scatter plots of α vs. Δdelay: at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp); paths dominated by C-worst have Δdelay at C-worst > Δdelay at RC-worst, and paths dominated by RC-worst have Δdelay at RC-worst > Δdelay at C-worst]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp, compared with the stack's 3σ point]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σj/d(Ytyp) × 100 vs. Δdj(Yrcw)/dj(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation to preserve parametric yield?
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%) per layer]
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst, and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
Flow: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC
• GTBC: timing analysis using TBC → violation = 0? If no, ECO using TBC and repeat
• GCBC: timing analysis using CBC → violation = 0? If no, ECO using CBC and repeat
• Otherwise done
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity):
                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm²)       0.45      1.04      1.91
Max transition (ps)   330       330       330
Thresholds for correlation factor = 0.5:
          α     Acw (%)   Arcw (%)
TBC-05    0.5   4.3       7.3
TBC-06    0.6   3.3       5.0
TBC-07    0.7   3.0       3.4
Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1 term: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1 term: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figures: α for LEON3MP, SUPERBLUE12, and NETCARD; the region with α > 0.5 is marked in each panel]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120 ps and 61 ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figures: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]
57ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging under an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI]
VBTI = Vlib ≈ Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10 mV]
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05 V
Vinit         0.90 V
Vheur1 (DC)   0.97 V
Vheur1 (AC)   0.95 V
Vheur2 (DC)   0.95 V
Vheur2 (AC)   0.93 V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs
and library
Vstep: AVS voltage step; Vfinal: converged voltage
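The loop structure of such a reference flow can be sketched as below. Everything numeric here is an assumption made for illustration: `delay` is a toy alpha-power-law expression, `aged_dvth` is a toy t^n aging expression, and the constants are not calibrated to any library or to the talk's experiments. A real flow would replace both functions with library re-characterization and STA runs.

```python
def delay(vdd, dvth, vth0=0.3, exponent=1.3):
    """Toy gate delay in arbitrary units (assumed alpha-power-law form)."""
    return vdd / (vdd - vth0 - dvth) ** exponent

def aged_dvth(vdd, t_years, a=0.02, n=1.0 / 6.0):
    """Toy BTI threshold shift in volts (assumed: proportional to Vdd and t**n)."""
    return a * vdd * t_years ** n

def reference_signoff(v_init, v_max, v_step, t_final, target_delay):
    """Walk through the lifetime one year at a time.

    At each step, estimate aging at the current Vdd, then raise Vdd in
    v_step increments until the aged circuit meets target_delay again.
    Returns [(year, vdd), ...]; raises ValueError if Vdd would exceed v_max.
    """
    vdd = v_init
    schedule = []
    for year in range(1, t_final + 1):
        dvth = aged_dvth(vdd, year)               # estimate aging at this time step
        while delay(vdd, dvth) > target_delay:    # update timing and Vdd
            vdd += v_step
            if vdd > v_max:
                raise ValueError("Vfinal exceeds maximum allowed voltage; modify circuit")
        schedule.append((year, vdd))
    return schedule

target = delay(0.90, 0.0)   # fresh-circuit delay at Vinit is the timing target
schedule = reference_signoff(0.90, 1.05, 0.01, 10, target)
print("Vfinal = %.2f V" % schedule[-1][1])
```

The inner `while` loop is where the cost of the reference flow comes from: each iteration stands for a full timing analysis.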
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5     6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    N/A   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   N/A   Vheur1   Vheur2
63ISVLSI-2014 invited talk 140710
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation (timing constraints, critical paths, etc.)
  • Circuit is affected by library characterization
[Diagram: loop among circuit, Vfinal, Vlib and VBTI, and derated libraries]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time over ON/OFF/ON/OFF phases] [VattikondaWC06]
Device aging (|ΔVth|) accumulates over time [TCAS'14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd approaching Vfinal over time; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)]
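A short calculation shows why a power-law aging model is front-loaded. The power-law form and the exponent values below are assumptions chosen for illustration; the talk does not state the model behind its figures.

```python
def aging_fraction(t, lifetime, n):
    """Fraction of the end-of-life shift reached at time t, if the shift grows as t**n."""
    return (t / lifetime) ** n

# Two assumed exponents; smaller n means more of the aging happens early.
for n in (1.0 / 6.0, 0.3):
    frac = aging_fraction(1.0, 10.0, n)
    print("n = %.3f: %.0f%% of 10-year aging occurs in year 1" % (n, 100 * frac))
```

With a 10-year lifetime, an exponent of 1/6 puts roughly 68% of the shift in the first year and an exponent of 0.3 puts roughly 50% there, so any sublinear exponent in this range gives the front-loaded behavior described above.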
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

                 Vlib = Min Vdd                                          Vlib = Max Vdd
VBTI = Min Vdd   Slowest circuit, less aging                             Not applicable (optimistic)
VBTI = Max Vdd   Slowest circuit, worst-case aging (too pessimistic)     Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure (AES): power (mW) vs. area (μm²) for implementations 1 to 8; series: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks); annotations: optimistic about AVS, pessimistic about AVS]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200 mV voltage compensation]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
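The MTTF update rule on this slide can be applied step by step as in the sketch below. The function names, the starting MTTF, and the voltage schedule are hypothetical values chosen for illustration.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2

def mttf_after_schedule(mttf_initial, vdd_schedule):
    """Apply the update across a list of supply voltages, one entry per AVS step."""
    mttf = mttf_initial
    for v_prev, v_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, v_prev, v_new)
    return mttf

# Hypothetical example: 10-year MTTF at 0.90 V, with AVS raising Vdd to 1.00 V.
print(mttf_after_schedule(10.0, [0.90, 0.95, 1.00]))
```

Because the per-step ratios telescope, the final MTTF depends only on the first and last voltages in the schedule under this rule.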
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figures (DMA): VDD vs. year for AVS schedules S1 to S5, and MTTF (year) for S1 to S5; annotation: 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is "Signoff"?
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff, between nominal Vdd and signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI; operating voltage marked]
Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv: Δdjv
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]
$\Delta d_{jv} = [\, d_j(Y_v) - d_j(Y_{typ}) \,] / 3$
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• $\Sigma = \lambda \lambda^{T}$ (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
$[\Delta d'_{j1} \ldots \Delta d'_{j27}] = [\Delta d_{j1} \ldots \Delta d_{j27}] \, \lambda$
Standard deviation of path delay:
$\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
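The computation above can be sketched in plain Python. The Cholesky routine here is a textbook implementation written out to keep the example self-contained; the two-source sensitivities and the correlation of 0.5 are made-up inputs, not data from the talk.

```python
import math

def cholesky(sigma):
    """Return lower-triangular lam with sigma = lam * lam^T (sigma must be positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam

def path_sigma(delta_d, sigma):
    """Standard deviation of path delay from per-source sensitivities and correlation matrix."""
    lam = cholesky(sigma)
    n = len(delta_d)
    # Row vector times matrix: d'_v = sum_u delta_d[u] * lam[u][v]
    d_prime = [sum(delta_d[u] * lam[u][v] for u in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))

# Two hypothetical variation sources with sensitivities 3 ps and 4 ps.
independent = [[1.0, 0.0], [0.0, 1.0]]
correlated = [[1.0, 0.5], [0.5, 1.0]]
print(path_sigma([3.0, 4.0], independent))   # root-sum-square of 3 and 4
print(path_sigma([3.0, 4.0], correlated))    # larger, because the sources move together
```

The result equals the square root of Δd Σ Δdᵀ, so positive correlation between sources with same-sign sensitivities increases σj.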
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
  • Estimation of design energy: $Energy = Power / Throughput$, where Throughput is a function of the error rate ER, the clock period T, and the number of recovery cycles r [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and AVS approaches
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor, design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
[Figure axis: power]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figures: ideal scaling vs. non-ideality in four panels. DENSITY: transistor count [M] vs. MPU release date, source [CPUDB]. POWER: dynamic power (W), 2009 to 2024, source [JeongK08]. DRIVE CURRENT (μA/μm): extended planar bulk, UTB FD, DG, and ideal scaling, 2006 to 2016, source [ITRS]. SUPPLY VOLTAGE: volts vs. MPU release date, source [CPUDB].]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figures (MUL, EXU): energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: modes A, B, C in the voltage-frequency plane with the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
[Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R) → refine power models]
[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V); design: AES, f = 700 MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Generic monitor does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: RO stages with 1-bit control pins selecting high resistance or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figures (MUL, EXU): energy (mJ) for PM, BF, and CO at small, medium, and large margin; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figures (MUL, EXU): energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow diagram boxes: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; check energy < min energy; save current solution]
3ISVLSI-2014 invited talk 140710
Challenge: Value of Technology
[Figure: design quality (e.g., frequency) vs. technology generation; nominal scaling vs. design with margins; margin = lost benefits of technology]
4ISVLSI-2014 invited talk 140710
Solutions: Modeling, Margining, Tolerance
[Table: each solution is checked (√) against Modeling, Margining, Tolerance]
• BEOL Corner Optimization
• Process-Aware Vdd Scaling
• BTI, EM-AVS Interactions
• Overdrive Signoff
• Min Cost of Resilience
• Holistic mitigation of variability spans models, margins, tolerance mechanisms
  • Signoff criteria, monitors, adaptivity/resilience, approximate computing, …
5ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
6ISVLSI-2014 invited talk 140710
BEOL Corner Optimization
• 20nm and below: increased timing variation due to interconnect R, C
  • Design closure becomes much more difficult
• Costs of BEOL variations
  • More design effort (e.g., "last month" of manual ECO iteration)
  • Compromised circuit performance at high Vdd
• Recent work: reduce signoff margin by using tightened BEOL corners without sacrificing parametric yield
  • Signoff at conventional BEOL corners is pessimistic for most timing-critical paths
  • We identify paths which can be safely signed off using tightened BEOL corners (TBC)
  • Joint work with Sorin Dobre (Qualcomm) and Tuck-Boon Chan
7ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
[Flow, conventional signoff: routed design → timing analysis using conventional BEOL corners (CBC) → violation = 0? If no, ECO using CBC and repeat → done]
[Flow, this work: routed design → classify timing-critical paths into GTBC and GCBC → timing analysis using TBC (with ECO using TBC) and timing analysis using CBC (with ECO using CBC), each repeated until violation = 0 → done]
8ISVLSI-2014 invited talk 140710
Conventional BEOL Corners
• Three major variation sources per layer: ΔW, ΔT, ΔH
• Conventional BEOL corners (CBC)
  • Homogeneous corners: all variation sources are skewed in the same direction
• BEOL RC variations are modeled in interconnect technology file (itf)
[Figure: cross-section of layers M1, M2, M3 with width W2, spacing S2, thicknesses T1 to T3, heights H1 to H3, inter-layer dielectric and inter-metal dielectric]

Corner   ΔW        ΔT        ΔH
Ytyp     typical   typical   typical
Ycb      min       min       max
Ycw      max       max       min
Yrcb     max       max       max
Yrcw     min       min       min
9ISVLSI-2014 invited talk 140710
Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
• 9-layer metal stack has 27 variation sources: z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if
  • zu and zv are the same type (ΔW, ΔT, or ΔH)
  • zu and zv are in the same process module

Layer   ΔW    ΔT    ΔH
M1      z1    z2    z3
M2      z4    z5    z6
M3      z7    z8    z9
M4      z10   z11   z12
M5      z13   z14   z15
M6      z16   z17   z18
M7      z19   z20   z21
M8      z22   z23   z24
M9      z25   z26   z27

[Figure: layers grouped into process modules 1, 2, and 3]
Examples
• ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6, and M7
• But ΔW in layer M4 is not correlated with ΔT in M4
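The correlation rule above translates directly into a 27 × 27 matrix. In the sketch below, the assignment of layers to process modules is an assumption: only the grouping of M4 to M7 is implied by the example on the slide, while M1 to M3 and M8 to M9 are guesses made for illustration. The default correlation of 0.5 matches the correlation factor used in the talk's experiments.

```python
KINDS = ("W", "T", "H")  # delta-W, delta-T, delta-H, in the slide's z ordering

# Assumed layer-to-module mapping (only M4..M7 sharing a module is given by the example).
MODULE_OF_LAYER = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 2, 8: 3, 9: 3}

def source_index(layer, kind):
    """0-based index of a variation source; z1 is index 0, z27 is index 26."""
    return 3 * (layer - 1) + KINDS.index(kind)

def build_correlation(gamma=0.5):
    """Correlation matrix: 1 on the diagonal, gamma for same kind in same module, else 0."""
    sources = [(layer, kind) for layer in range(1, 10) for kind in KINDS]
    n = len(sources)
    corr = [[0.0] * n for _ in range(n)]
    for u, (layer_u, kind_u) in enumerate(sources):
        for v, (layer_v, kind_v) in enumerate(sources):
            if u == v:
                corr[u][v] = 1.0
            elif kind_u == kind_v and MODULE_OF_LAYER[layer_u] == MODULE_OF_LAYER[layer_v]:
                corr[u][v] = gamma
    return corr

corr = build_correlation()
print(corr[source_index(4, "W")][source_index(5, "W")])  # same kind, same module
print(corr[source_index(4, "W")][source_index(4, "T")])  # different kind
```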
10ISVLSI-2014 invited talk 140710
Pessimism of Conventional BEOL Corners (CBC)
• Assumption: a max (setup) path pj is "safe" when delay evaluated at a given CBC is larger than nominal delay + 3σj
$d_j(Y_{CBC}) \ge d_j(Y_{typ}) + 3\sigma_j$
• For a given path, we can compare the statistical delay variation and the delay obtained from a given CBC
$\alpha_j = 3\sigma_j / \Delta d_j(Y_{CBC})$, where $\Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ})$ and $Y_{CBC} \in \{Y_{cw}, Y_{cb}, Y_{rcw}, Y_{rcb}\}$
• Small αj → large pessimism of CBC
[Figure: path delay distribution, comparing 3σj with dj(YCBC) − dj(Ytyp); a large gap means large pessimism]
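A small numeric sketch of these definitions follows. The delays and σ are made-up values, and `scaled_corner_delay` assumes that scaling BEOL variation by a factor scales the corner's Δdelay linearly, which is a simplification introduced here rather than something the slide states.

```python
def alpha(sigma_j, d_cbc, d_typ):
    """alpha_j = 3*sigma_j / (d_j(Y_CBC) - d_j(Y_typ))."""
    return 3.0 * sigma_j / (d_cbc - d_typ)

def covers_3sigma(sigma_j, d_corner, d_typ):
    """True if the corner delay is at least nominal delay + 3*sigma_j."""
    return d_corner >= d_typ + 3.0 * sigma_j

def scaled_corner_delay(d_cbc, d_typ, scale):
    """Delay at a tightened corner, assuming delta-delay scales linearly with 'scale'."""
    return d_typ + scale * (d_cbc - d_typ)

# Hypothetical path: nominal 1.00 ns, 1.10 ns at the CBC, sigma = 0.01 ns.
sigma_j, d_typ, d_cbc = 0.01, 1.00, 1.10
print(alpha(sigma_j, d_cbc, d_typ))                                    # about 0.3: pessimistic CBC
print(covers_3sigma(sigma_j, scaled_corner_delay(d_cbc, d_typ, 0.5), d_typ))  # still covered
print(covers_3sigma(sigma_j, scaled_corner_delay(d_cbc, d_typ, 0.2), d_typ))  # tightened too far
```

Under the linear-scaling assumption, a tightened corner with scale s still covers the path exactly when αj ≤ s.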
11ISVLSI-2014 invited talk 140710
Intuition on Delay Variability Across Cw RCw
[Figures: α vs. Δdelay (vs. typ) at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay (vs. typ) at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0 here: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
12ISVLSI-2014 invited talk 140710
Intuition on Delay Variability Across Cw, RCw
[Figures: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
In the following, α is defined at the dominant corner
13ISVLSI-2014 invited talk 140710
Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
  • E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α as a function of Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]
14ISVLSI-2014 invited talk 140710
Find Paths for Which TBCs Can Be Used
• Paths with small Δdrcw and Δdcw have large α
  • E.g., αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α as a function of Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw marked]
Gtbc = set of paths that can be safely signed off using TBC: ((path with Δdcw larger than Acw) OR (path with Δdrcw larger than Arcw))
15ISVLSI-2014 invited talk 140710
Determining α Arcw and Acw
[Figures: α vs. Δd at C-worst corner (%) and α vs. Δd at RC-worst corner (%), with thresholds Acw and Arcw marked]
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
  • Plot α vs. Δdelay; find Arcw and Acw for a given α
  • Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC
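One way to read "find the threshold for a given α" from sampled paths is sketched below. This is an interpretation, not the authors' procedure: the threshold is taken as the largest Δdelay among sampled paths whose αj exceeds the target, plus a margin, so that every sampled path above the threshold has αj at or below the target. The sample points are invented.

```python
def extract_threshold(samples, alpha_target, margin=1.0):
    """samples: list of (delta_delay_pct, alpha_j) pairs from representative paths.

    Returns a threshold A (percent) such that every sampled path with
    delta_delay_pct > A - margin has alpha_j <= alpha_target.
    """
    too_large = [delta for delta, a in samples if a > alpha_target]
    base = max(too_large) if too_large else 0.0
    return base + margin

# Hypothetical samples following the trend "small delta-delay, large alpha".
samples = [(1.0, 0.90), (2.0, 0.75), (3.0, 0.62), (4.0, 0.55), (6.0, 0.45), (8.0, 0.30)]
for target in (0.5, 0.6, 0.7):
    a = extract_threshold(samples, target)
    covered = sum(1 for delta, _ in samples if delta > a)
    print(target, a, covered)
```

On these samples the threshold grows as the target α shrinks, and fewer paths lie above it, which matches the trend stated in the last bullet.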
16ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100 ps and 53 ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
  • Tradeoff between reduced margin vs. paths which use TBC
Correlation factor γ = 0.5
[Figures: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE12, and NETCARD under CBC, TBC-0.5, TBC-0.6, and TBC-0.7]
17ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
18ISVLSI-2014 invited talk 140710
How to Minimize Cost of Resilience?
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                    Razor          Razor-Lite    TIMBER
Power penalty       30 [Das08]     ~0 [Kim13]    100 [Choudhury09]
Area penalty        182 [Kim13]    33 [Kim13]    255 [Chen13]
# recovery cycles   5 [Wan09]      11 [Kim13]    0 [Choudhury09]
19ISVLSI-2014 invited talk 140710
Tradeoff Resilience Cost vs Datapath Cost
[Figure: Razor FFs (with error outputs) at the endpoints of a fanin cone; optimizing the fanin cone w/ a tighter constraint allows an endpoint Razor FF to be replaced by a normal FF. Tradeoff: power/area of fanin circuits vs. Razor FFs (resilience cost)]
[Plot: energy (mJ) vs. number of Razor FFs (300, 100, 50, 0): total energy, energy of non-resilient part, resilience cost]
We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)
20ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions: vary endpoints, compare area/power penalty
Candidate sensitivity functions:
$SF_1 = |slack(p)|$
$SF_2 = |slack(p)| \times num_{cri}(p)$
$SF_3 = |slack(p)| \times num_{cri}(p) / num_{total}(p)$
$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$
p: negative-slack endpoint; c: cells within fanin cone; num_cri: number of negative-slack cells
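The five candidate functions can be evaluated on a toy fanin cone as follows. The data layout is hypothetical, and treating num_total(p) as the number of cells in the fanin cone of p is an assumption, since the slide does not define it.

```python
def sensitivity_functions(endpoint_slack, cells):
    """endpoint_slack: slack of endpoint p; cells: list of (slack, power) for cells in fanin(p).

    Returns (SF1, SF2, SF3, SF4, SF5) as defined on the slide.
    """
    abs_slack = abs(endpoint_slack)
    num_cri = sum(1 for slack, _ in cells if slack < 0)   # negative-slack cells
    num_total = len(cells)                                # assumed meaning of num_total(p)
    total_power = sum(power for _, power in cells)
    sf1 = abs_slack
    sf2 = abs_slack * num_cri
    sf3 = abs_slack * num_cri / num_total
    sf4 = abs_slack * total_power
    sf5 = sum(abs(slack) * power for slack, power in cells)
    return sf1, sf2, sf3, sf4, sf5

# Hypothetical endpoint with slack -0.2 ns and three fanin cells (slack in ns, power in uW).
cells = [(-0.1, 2.0), (0.05, 1.0), (-0.3, 4.0)]
print(sensitivity_functions(-0.2, cells))
```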
21ISVLSI-2014 invited talk 140710
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight, adjust clock latencies, contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized
[Figure: sequential graph FF1 → FF2 → FF3 → FF1 with edge weights W12, W23, W31 (data paths and clock tree); after adjustment each edge on the cycle has weight W′ = average weight on cycle]
$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p,q)}$
Slack_pq: setup slack of path p-q; β: weighting factor; TG(p,q): toggle rate of path p-q
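The edge weight and the post-adjustment cycle weight can be computed as in this sketch. The slacks, toggle rates, and β are invented numbers; the cycle search and the contraction step are not shown.

```python
def edge_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q)): frequently toggling paths get smaller weight."""
    return slack / (1.0 + beta * toggle_rate)

def cycle_average(weights):
    """W' = average weight on a cycle, the value each edge takes after skew adjustment."""
    return sum(weights) / len(weights)

# Hypothetical three-flop cycle FF1 -> FF2 -> FF3 -> FF1: (setup slack in ns, toggle rate).
edges = {"W12": (0.30, 0.5), "W23": (0.05, 0.0), "W31": (0.20, 0.5)}
beta = 2.0
weights = {name: edge_weight(s, tg, beta) for name, (s, tg) in edges.items()}
print(weights)
print(cycle_average(list(weights.values())))
```

Dividing by (1 + β × TG) lowers the weight of frequently exercised paths, so the minimum-weight cycle search favors them when distributing slack.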
22ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow diagram boxes: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; check energy < min energy; save current solution; OR-tree insertion]
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figures (MUL, EXU): energy (mJ) for PM, BF, and CO at small, medium, and large margin; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figures (MUL, EXU): energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt; minimum achievable energy marked]
Technology: foundry 28nm
25ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
  • Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
  • AVS can be used to compensate for performance degradation
[Diagram: closed-loop AVS linking circuit, on-chip aging monitor, circuit performance, and voltage regulator]
[Figures: circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, relative to target]
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation
[Diagram: three steps linking BTI degradation and AVS (Vfinal), VBTI and |Vt|, Vlib and the derated library, and circuit implementation and signoff (circuit)]
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Diagram: three steps linking Vlib, VBTI and |Vt|, the derated library, circuit implementation and signoff (circuit), and BTI degradation and AVS (Vfinal)]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
"Knee" point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging under an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI]
VBTI = Vlib ≈ Vfinal
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10 mV]
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation
[Flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → sign off circuit with derated library]
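The first two steps of this flow amount to an average followed by a single corner assignment, as sketched below. The per-cell Vfinal values are invented placeholders; in practice they would come from per-cell aging and AVS simulation.

```python
def heuristic_voltage(vfinal_by_cell):
    """Vheur = average of per-cell Vfinal values (volts)."""
    return sum(vfinal_by_cell.values()) / len(vfinal_by_cell)

def derated_library_corner(vfinal_by_cell):
    """Signoff corner with VBTI = Vlib = Vheur, used 'flat' over the whole lifetime."""
    v_heur = heuristic_voltage(vfinal_by_cell)
    return {"VBTI": v_heur, "Vlib": v_heur}

# Hypothetical per-cell Vfinal values (V) at zero timing slack.
vfinal_by_cell = {"INV": 0.95, "NAND2": 0.96, "NOR2": 0.94}
print(derated_library_corner(vfinal_by_cell))
```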
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
• (4 designs) × (DC, AC) × (derating methods)
[Figure: power vs. area; legend: proposed method, circuits signed off using other derated libraries]
"Knee" point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time alternating between NOM and OD modes (tnom, tOD)]
[Figure: power of circuits w/ different overdrive modes; fnom = 800 MHz, Vnom = 0.8 V; different overdrive modes: 26% power range; fix fOD: still 14% power range]
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config: Vmin_est < Vmin_chip; some chips will fail
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Default config
• Low-resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13 mV margin
37ISVLSI-2014 invited talk 140710
Also: Tunable Monitors, Less Margin
Aggressive config: Vmin_est < Vmin_chip. Some chips will fail
Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore's Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) vs. implementation (1-9)]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
Highest invested guardband
Least invested guardband
14% power penalty
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
Example: worst-case capacitance corner
[Figure: capacitance distributions of layer M1 and layer M2 with −3σ and 3σ marks; interconnect stack with M1 and M2 (M1 C, M2 C); the homogeneous Cw corner lies beyond the 3σ point of the stack, and the gap is labeled pessimism]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
          M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW    0  0  0     0  0  0     0  0  0     1  0  0
    ΔT    0  0  0     0  0  0     0  0  0     0  1  0
    ΔH    0  0  0     0  0  0     0  0  0     0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
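As a minimal sketch of how such a matrix could be assembled, the Python below applies the two rules stated on this slide: same variation type and same process module gives γ, anything else off the diagonal gives 0. The function name `build_correlation_matrix`, the `dW`/`dT`/`dH` labels, and the module numbering (M1-M3 in one module, M4 in another, as the matrix above implies) are illustrative assumptions, not part of the talk.

```python
def build_correlation_matrix(layer_to_module, gamma, kinds=("dW", "dT", "dH")):
    """Return (sources, sigma) for the given layer -> process-module map.

    sources is a list of (layer, kind) pairs in matrix order.
    sigma[u][v] is 1 on the diagonal, gamma when the two sources have the
    same kind and their layers share a process module, and 0 otherwise.
    """
    sources = [(layer, kind) for layer in layer_to_module for kind in kinds]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for u, (layer_u, kind_u) in enumerate(sources):
        for v, (layer_v, kind_v) in enumerate(sources):
            if u == v:
                sigma[u][v] = 1.0
            elif (kind_u == kind_v
                  and layer_to_module[layer_u] == layer_to_module[layer_v]):
                sigma[u][v] = gamma
    return sources, sigma


# Hypothetical module assignment matching the 4-layer example matrix.
modules = {"M1": 1, "M2": 1, "M3": 1, "M4": 2}
sources, sigma = build_correlation_matrix(modules, gamma=0.5)
for (layer, kind), row in zip(sources, sigma):
    print(layer, kind, row)
```

Setting `gamma=0` reproduces the fully independent case used elsewhere in the talk.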
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point at 60, 0.92]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: two scatter plots of α vs. Δdelay; x-axes: Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp) and Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous corner: M1 == Cw (3σ), M2 == Ctyp
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: scatter plot with axes Δdj(Yrcw) / dj(Ytyp) × 100 and 3σj / d(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flow: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC → GTBC: timing analysis using TBC, ECO using TBC until violation = 0; GCBC: timing analysis using CBC, ECO using CBC until violation = 0 → done]
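The classification step in this flow can be sketched as follows. This is only an illustration under stated assumptions: the rule (a path goes to GTBC when its Δdelay at C-worst exceeds Acw or its Δdelay at RC-worst exceeds Arcw) is the one given in the talk, but the function name `classify_paths`, the tuple layout, the delay values, and the thresholds `a_cw=4.0` and `a_rcw=6.0` are made up for the example.

```python
def classify_paths(paths, a_cw, a_rcw):
    """Split paths into (g_tbc, g_cbc) lists of path names.

    paths: iterable of (name, d_typ, d_cw, d_rcw) with delays in ns at the
           typical, C-worst and RC-worst corners.
    a_cw, a_rcw: thresholds on relative delta delay, in percent.
    """
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        dd_cw = 100.0 * (d_cw - d_typ) / d_typ    # delta delay at C-worst (%)
        dd_rcw = 100.0 * (d_rcw - d_typ) / d_typ  # delta delay at RC-worst (%)
        if dd_cw > a_cw or dd_rcw > a_rcw:
            g_tbc.append(name)   # large delta delay: tightened corners apply
        else:
            g_cbc.append(name)   # small delta delay: keep conventional corners
    return g_tbc, g_cbc


# Hypothetical path delays (ns) and thresholds (%), for illustration only.
paths = [
    ("p1", 1.00, 1.08, 1.02),
    ("p2", 1.00, 1.01, 1.02),
    ("p3", 2.00, 2.02, 2.30),
]
g_tbc, g_cbc = classify_paths(paths, a_cw=4.0, a_rcw=6.0)
print("GTBC:", g_tbc)
print("GCBC:", g_cbc)
```

Paths left in GCBC are the ones whose small Δdelay may come with a large α, which is why they stay on the conventional corners.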
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)
                     LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)    18        20        31
Gate count           232K      575K      1031K
Utilization (%)      84        79        82
Core area (mm2)      0.45      1.04      1.91
Max transition (ps)  330       330       330
αCorrelation factor = 05
Acw () Arcw ()
TBC-05 05 43 73
TBC-06 06 33 50
TBC-07 07 30 34
Implement another NETCARD (clock period = 23ns) to obtain α Acw and Arcw
Statistical models (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 05
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ~ 0 < σj
Nominal delay
Delay sensitivity to unit change in M1 resistance
Delay sensitivity to unit change in M1 capacitance
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: scatter plots of α vs. Δdelay for LEON3MP, SUPERBLUE12 and NETCARD; regions with α > 0.5 marked in each]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]
57ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
• Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs
and library
Vstep: AVS voltage step
Vfinal: converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3     4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax  Vinit   NA   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax  Vfinal  NA   Vheur1  Vheur2
63ISVLSI-2014 invited talk 140710
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop between Circuit, Vfinal, Vlib / VBTI and Derated Libraries]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)
|ΔVth| is partially recovered when device is off (relaxed)
NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time, alternating ON / OFF / ON / OFF phases] [VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCAS'14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a “front-loaded” phenomenon
• 50% BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
≈70% Vdd increment in 1 year (remaining 30% over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                             Vlib for circuit performance estimation
VBTI for aging estimation    Min Vdd                                    Max Vdd
Min Vdd                      Slowest circuit, less aging                Not applicable (optimistic)
Max Vdd                      Slowest circuit, worst-case aging          Faster circuit, worst-case aging
                             (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm2) for AES, signoff cases 1-8; series: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); regions marked “Optimistic about AVS” and “Pessimistic about AVS”]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) vs. implementation (1-9)]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
  $\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \dfrac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
30% MTTF penalty
200mV voltage compensation
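A minimal sketch of this update rule in Python: each AVS step scales the remaining MTTF by the squared ratio of the previous supply voltage to the new one. The function names and the voltage schedule below are illustrative assumptions; only the update formula itself comes from the slide.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One AVS step: MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_after_schedule(mttf_initial, vdd_schedule):
    """Apply the update across a whole Vdd schedule (first entry = initial Vdd)."""
    mttf = mttf_initial
    for vdd_prev, vdd_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, vdd_prev, vdd_new)
    return mttf


# Hypothetical schedule: Vdd raised in steps to compensate BTI aging.
schedule = [0.90, 0.95, 1.00, 1.05]
print(mttf_after_schedule(10.0, schedule))
```

Because the per-step ratios multiply out, the result under this rule depends only on the first and last voltages in the schedule: MTTF_final = MTTF_initial × (V_first / V_last)^2.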
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year (0-12) for schedules S1-S5 (DMA); MTTF (year) for schedules S1-S5]
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is “Signoff”?
• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff, from nominal Vdd down to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI; axis: operating voltage]
Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = [\, d_j(Y_v) - d_j(Y_{typ}) \,] / 3$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^{T}$ (Note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
  $[\Delta d'_{j1} \; \ldots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \ldots \; \Delta d_{j27}] \, \lambda$
Standard deviation of path delay:
  $\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
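The computation on this slide can be sketched in plain Python as below: factor the correlation matrix as Σ = λλᵀ with a Cholesky decomposition, multiply the row vector of delay sensitivities by λ, and take the Euclidean norm. The hand-written `cholesky` helper, the function names, and the 2-source example values are assumptions for illustration; a real flow would use 27 sources and sensitivities from STA.

```python
import math


def cholesky(a):
    """Lower-triangular factor lam with a = lam * lam^T (a symmetric, positive definite)."""
    n = len(a)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(a[i][i] - s)
            else:
                lam[i][j] = (a[i][j] - s) / lam[j][j]
    return lam


def path_sigma(delta_d, corr):
    """Standard deviation of path delay from per-source sensitivities delta_d."""
    lam = cholesky(corr)
    n = len(delta_d)
    # Row vector times matrix: d_prime[v] = sum_u delta_d[u] * lam[u][v]
    d_prime = [sum(delta_d[u] * lam[u][v] for u in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


# Hypothetical 2-source example with correlation 0.5 between the sources.
corr = [[1.0, 0.5], [0.5, 1.0]]
print(path_sigma([1.0, 1.0], corr))
```

Since Δd λ λᵀ Δdᵀ = Δd Σ Δdᵀ, the same value can be obtained without the factorization; the factorization just makes the correlated sensitivities Δd' explicit, as on the slide.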
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling → reduced energy
15% reduction
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy
  $\mathrm{Energy} = \mathrm{Power} / \mathrm{Throughput}$
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches
• Open-loop AVS
  • Freq & Vdd LUT. AVS: pre-characterize LUT [Martin02]
  • Post-silicon characterization. Process-aware AVS: post-silicon characterization [Tschanz03]
• Closed-loop AVS
  • Generic monitor, design-dependent replica. Process and temperature-aware AVS: generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor. In-situ performance monitor: measure actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system. Error detection and correction system: Vdd scaling until error occurs [Das06, Tschanz10]
[Figure axis: power]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure: four panels comparing ideal scaling with observed non-ideality. DENSITY: transistor count [M] vs. MPU release date, source [CPUDB]. POWER: dynamic power (W), 2009-2024, source [JeongK08]. DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006-2016, source [ITRS]. SUPPLY VOLTAGE: volt vs. MPU release date, source [CPUDB].]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design, overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: frequency vs. voltage with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within design cone → reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough: scalable
[Flow, global optimization: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R), refine power models]
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); series 1st, 2nd, real; design AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
Closed-loop AVS: design-dependent replica, in-situ monitor, generic monitor
• Generic monitor does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF and CO at small, medium and large margin, for MUL and EXU; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off between timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow steps: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution]
4ISVLSI-2014 invited talk 140710
Solutions: Modeling, Margining, Tolerance
Solutions / Modeling / Margining / Tolerance
BEOL Corner Optimization √
Process-Aware Vdd Scaling √
BTI, EM-AVS Interactions √
Overdrive Signoff √
Min Cost of Resilience √
• Holistic mitigation of variability spans models, margins, tolerance mechanisms
  • Signoff criteria, monitors, adaptivity/resilience, approximate computing, …
5ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
6ISVLSI-2014 invited talk 140710
BEOL Corner Optimization
• 20nm and below: increased timing variation due to interconnect R, C
  • Design closure becomes much more difficult
• Costs of BEOL variations
  • More design effort (e.g., “last month” of manual ECO iteration)
  • Compromised circuit performance at high Vdd
• Recent work: reduce signoff margin by using tightened BEOL corners without sacrificing parametric yield
  • Signoff at conventional BEOL corners is pessimistic for most timing-critical paths
  • We identify paths which can be safely signed off using tightened BEOL corners (TBC)
  • Joint work with Sorin Dobre (Qualcomm) and Tuck-Boon Chan
7ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
[Flow, conventional signoff: routed design → timing analysis using conventional BEOL corners (CBC) → if violation = 0, done; if not, ECO using CBC and repeat]
[Flow, this work: routed design → classify timing-critical paths into GTBC and GCBC → GTBC: timing analysis using TBC, ECO using TBC until violation = 0; GCBC: timing analysis using CBC, ECO using CBC until violation = 0 → done]
8ISVLSI-2014 invited talk 140710
Conventional BEOL Corners
• Three major variation sources per layer: ΔW, ΔT, ΔH
• Conventional BEOL corners (CBC)
  • Homogeneous corners: all variation sources are skewed in the same direction
• BEOL RC variations are modeled in interconnect technology file (itf)
[Figure: cross-section of layers M1, M2, M3 showing width W2, spacing S2, thicknesses T1-T3, heights H1-H3, inter-layer dielectric and inter-metal dielectric]

Corner   ΔW        ΔT        ΔH
Ytyp     typical   typical   typical
Ycb      min       min       max
Ycw      max       max       min
Yrcb     max       max       max
Yrcw     min       min       min
9ISVLSI-2014 invited talk 140710
Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
• 9-layer metal stack has 27 variation sources: z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if
  • zu and zv are the same type (ΔW, ΔT or ΔH)
  • zu and zv are in the same process module

Layer   ΔW    ΔT    ΔH
M9      z25   z26   z27
M8      z22   z23   z24
M7      z19   z20   z21
M6      z16   z17   z18
M5      z13   z14   z15
M4      z10   z11   z12
M3      z7    z8    z9
M2      z4    z5    z6
M1      z1    z2    z3
[Figure labels: process module 3, process module 2, process module 1]

Examples
• ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6 and M7
• But ΔW in layer M4 is not correlated with ΔT in M4
10ISVLSI-2014 invited talk 140710
Pessimism of Conventional BEOL Corners (CBC)
• Assumption: a max (setup) path pj is “safe” when delay evaluated at a given CBC is larger than nominal delay + 3σj
  $d_j(Y_{CBC}) \ge 3\sigma_j + d_j(Y_{typ})$
• For a given path, we can compare the statistical delay variation and the delay obtained from a given CBC
  $\alpha_j = 3\sigma_j / \Delta d_j(Y_{CBC})$
  $\Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ})$, with $Y_{CBC} \in \{Y_{cw}, Y_{cb}, Y_{rcw}, Y_{rcb}\}$
• Small αj → large pessimism of CBC
[Figure: path delay distribution with −3σ and 3σj marks compared against dj(YCBC) − dj(Ytyp); the gap is labeled large pessimism]
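To make the definition concrete, here is a small Python sketch that evaluates the scaling factor αj = 3σj / [dj(YCBC) − dj(Ytyp)] and the “safe” condition dj(YCBC) ≥ dj(Ytyp) + 3σj for one path. The function names and the numeric values are hypothetical; the sketch also assumes the corner delay exceeds the typical delay, as it does for a setup path at a worst-case corner.

```python
def scaling_factor(sigma_j, d_cbc, d_typ):
    """alpha_j = 3 * sigma_j / (d_j(Y_CBC) - d_j(Y_typ))."""
    delta = d_cbc - d_typ
    if delta <= 0.0:
        raise ValueError("corner delay must exceed typical delay")
    return 3.0 * sigma_j / delta


def is_safe(sigma_j, d_cbc, d_typ):
    """True when the corner delay covers nominal delay + 3 sigma."""
    return d_cbc >= d_typ + 3.0 * sigma_j


# Hypothetical path: typical delay 1.00 ns, corner delay 1.06 ns, sigma 10 ps.
alpha = scaling_factor(0.01, 1.06, 1.00)
print(alpha, is_safe(0.01, 1.06, 1.00))
```

With these numbers α is about 0.5, meaning the corner assumes roughly twice the variation the statistical model predicts for this path.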
11ISVLSI-2014 invited talk 140710
Intuition on Delay Variability Across Cw, RCw
[Figure: two scatter plots of α vs. Δdelay (vs. typ); x-axes: Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp) and Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0 here: delay variations covered by RC-worst corner
Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
12ISVLSI-2014 invited talk 140710
Intuition on Delay Variability Across Cw, RCw
[Figure: two scatter plots of α vs. Δdelay; x-axes: Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp) and Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
In the following, α is defined at the dominant corner
13ISVLSI-2014 invited talk 140710
Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
  • E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α plotted over Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]
14ISVLSI-2014 invited talk 140710
Find Paths for Which TBCs Can Be Used
• Paths with small Δdrcw and Δdcw have large α
  • E.g., there are αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α plotted over Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw]
Gtbc = set of paths that can be safely signed off using TBC: ((path with Δdcw larger than Acw) OR (path with Δdrcw larger than Arcw))
15ISVLSI-2014 invited talk 140710
Determining α, Arcw and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
  • Plot α vs. Δdelay, find Arcw and Acw for a given α
  • Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC
[Figure: α vs. Δd at RC-worst corner (%) and α vs. Δd at C-worst corner (%), with thresholds Arcw and Acw marked]
16ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
  • Tradeoff between reduced margin vs. paths which use TBC
Correlation factor γ = 0.5
[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE12 and NETCARD under CBC, TBC-0.5, TBC-0.6 and TBC-0.7]
17ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
18ISVLSI-2014 invited talk 140710
How to Minimize Cost of Resilience?
• Additional circuits → area and power penalties
• Recovery from errors → throughput degradation
• Large hold margin → short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                   Razor         Razor-Lite    TIMBER
Power penalty      30 [Das08]    ~0 [Kim13]    100 [Choudhury09]
Area penalty       182 [Kim13]   33 [Kim13]    255 [Chen13]
recovery cycles    5 [Wan09]     11 [Kim13]    0 [Choudhury09]
19ISVLSI-2014 invited talk 140710
Tradeoff Resilience Cost vs Datapath Cost
[Figure: endpoint with Razor FF vs. normal FF; optimizing the fanin cone w/ tighter constraint allows a normal FF at the endpoint; area (power) of fanin cone vs. area (power) w/ Razor overhead]
[Figure: energy (mJ) for total energy, energy of non-resilient part, and resilience cost; x-axis ticks 300, 100, 50, 0; tradeoff between Razor FFs (resilience cost) and power/area of fanin circuits]
We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)
20ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions: vary endpoints, compare area/power penalty
Candidate Sensitivity Functions
  $SF_1 = |slack(p)|$
  $SF_2 = |slack(p)| \times num_{cri}(p)$
  $SF_3 = |slack(p)| \times \dfrac{num_{cri}(p)}{num_{total}(p)}$
  $SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
  $SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$
p: negative slack endpoint
c: cells within fanin cone
Numcri: number of negative slack cells
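The five candidate sensitivity functions can be written out as a short Python sketch. It takes SF1 = |slack(p)|, SF2 = |slack(p)| × numcri(p), SF3 = |slack(p)| × numcri(p) / numtotal(p), SF4 = |slack(p)| × Σ Pwr(c) and SF5 = Σ |slack(c)| × Pwr(c), with sums over cells c in the fanin cone of endpoint p. Treating numtotal(p) as the total number of cells in the fanin cone is an assumption (the slide does not define it), and the data layout, function name and numbers are hypothetical.

```python
def sensitivity_functions(endpoint_slack, fanin_cells):
    """Return (SF1, ..., SF5) for one negative-slack endpoint.

    endpoint_slack: slack of endpoint p (ns).
    fanin_cells: list of (slack, power) pairs for cells in the fanin cone of p.
    """
    abs_slack = abs(endpoint_slack)
    num_cri = sum(1 for slack, _ in fanin_cells if slack < 0)  # negative-slack cells
    num_total = len(fanin_cells)                               # assumed definition
    total_power = sum(power for _, power in fanin_cells)
    sf1 = abs_slack
    sf2 = abs_slack * num_cri
    sf3 = abs_slack * num_cri / num_total
    sf4 = abs_slack * total_power
    sf5 = sum(abs(slack) * power for slack, power in fanin_cells)
    return sf1, sf2, sf3, sf4, sf5


# Hypothetical endpoint with four fanin cells: (slack in ns, power in mW).
cells = [(-0.25, 2.0), (0.5, 4.0), (-0.5, 1.0), (0.25, 1.0)]
print(sensitivity_functions(-0.5, cells))
```

In a flow like the one described, endpoints would be ranked by one of these values to decide where margin insertion is attempted first.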
21ISVLSI-2014 invited talk 140710
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized

Path weight:
  $W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p, q)}$
  Slack_pq: setup slack of path p-q; β: weighting factor; TG(p, q): toggle rate of path p-q

[Figure: sequential graph of FF1, FF2, FF3 (data path and clock tree) with edge weights W12, W23, W31; after adjusting clock latencies each edge on the cycle has weight W' = average weight on cycle]
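The path weight and the averaged cycle weight W' can be illustrated with a short sketch. The function names and the three-flop example values are hypothetical; finding the minimum-weight cycle itself is not shown.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def path_weight(slack: float, toggle_rate: float, beta: float) -> float:
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)


def cycle_average_weight(cycle: List[str], weights: Dict[Edge, float]) -> float:
    """Average weight W' over the edges of a cycle given as a list of vertices."""
    edges = list(zip(cycle, cycle[1:] + cycle[:1]))
    return sum(weights[e] for e in edges) / len(edges)


# Usage: the FF1 -> FF2 -> FF3 -> FF1 cycle with made-up slacks and toggle rates.
weights = {
    ("FF1", "FF2"): path_weight(3.0, 0.5, 1.0),
    ("FF2", "FF3"): path_weight(1.0, 0.0, 1.0),
    ("FF3", "FF1"): path_weight(6.0, 1.0, 1.0),
}
w_avg = cycle_average_weight(["FF1", "FF2", "FF3"], weights)
```

A larger toggle rate shrinks the weight, so frequently exercised paths look more critical for the same slack.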
22ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt

[Flowchart boxes:]
  • Initial placement (all FFs = error-tolerant FFs)
  • Energy < min energy?
  • Save current solution
  • SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
  • SkewOpt: activity-aware clock skew optimization
  • OR-tree insertion
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation

[Figure: stacked bars of energy (mJ) for PM, BF, CO at small, medium and large margin; panels: MUL and EXU; bar components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt; panels: MUL and EXU; minimum achievable energy is marked]

Technology: foundry 28nm
25ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
• AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS. An on-chip aging monitor reports circuit performance to a voltage regulator, which sets the Vdd of the circuit. Plots: circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, relative to the target]
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation

[Figure: Step 1: VBTI → |Vt| and Vlib → derated library; Step 2: circuit implementation and signoff → circuit; Step 3: BTI degradation and AVS → Vfinal]
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation

[Figure: Step 1: VBTI → |Vt| and Vlib → derated library; Step 2: circuit implementation and signoff → circuit; Step 3: BTI degradation and AVS → Vfinal]

No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
"Knee" point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI]
VBTI = Vlib ≈ Vfinal
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation

[Flow:]
  1. Obtain Vheur (average of standard cells)
  2. Obtain derated library with VBTI = Vlib = Vheur
  3. Signoff circuit with derated library
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
• (4 designs) x (DC, AC) x (derating methods)
[Figure legend: proposed method; circuit signed off using other derated libraries]
"Knee" point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign
[Figure: Vdd vs. time alternating between NOM and OD modes, with durations tnom and tOD]
[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V. Different overdrive modes: 26% power range. Fix fOD: still 14% power range]
36ISVLSI-2014 invited talk 140710
Also: Tunable Monitors, Less Margin
Aggressive config
• Vmin_est < Vmin_chip: some chips will fail
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Default config
• Low-resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
37ISVLSI-2014 invited talk 140710
Also: Tunable Monitors, Less Margin
Aggressive config
• Vmin_est < Vmin_chip: some chips will fail
Default config
• Low-resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1-9; annotations: highest invested guardband, least invested guardband, 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
43ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. The per-layer 3σ corners (C-3σ) of layers M1 and M2 are combined into the homogeneous Cw corner of the interconnect stack with M1 and M2 (axes: M1 C, M2 C); the gap between the 3σ contour of the stack and the homogeneous Cw corner is pessimism]
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. The per-layer 3σ corners (C-3σ) of layers M1 and M2 are combined into the homogeneous Cw corner of the interconnect stack with M1 and M2 (axes: M1 C, M2 C); the gap between the 3σ contour of the stack and the homogeneous Cw corner is pessimism]
When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with the number of layers
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
             M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
  M1 ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
     ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
     ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
  M2 ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
     ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
     ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
  M3 ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
     ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
     ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
  M4 ΔW    0  0  0     0  0  0     0  0  0     1  0  0
     ΔT    0  0  0     0  0  0     0  0  0     0  1  0
     ΔH    0  0  0     0  0  0     0  0  0     0  0  1

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point: 0.92 at 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation
47ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: two scatter plots of α vs. Δdelay; x-axes: Δdelay at C-worst = [d(Ycw) - d(Ytyp)] / d(Ytyp) and Δdelay at RC-worst = [d(Yrcw) - d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
Plot annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: two scatter plots of α vs. Δdelay; x-axes: Δdelay at C-worst = [d(Ycw) - d(Ytyp)] / d(Ytyp) and Δdelay at RC-worst = [d(Yrcw) - d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
Plot annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
Summary:
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2 (axes: M1 C, M2 C) and its 3σ contour; non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σj / d(Ytyp) x 100 vs. Δdj(Yrcw) / dj(Ytyp) x 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield?
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → paths split into GTBC and GCBC → timing analysis using TBC (ECO using TBC until violation = 0) and timing analysis using CBC (ECO using CBC until violation = 0) → done]
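The path-classification step can be sketched as follows. This is only an illustration under stated assumptions: Δdelay is taken as the relative delay change (in %) at the dominant corner, a single threshold is used, and the function and group names are hypothetical.

```python
def classify_path(d_typ: float, d_cw: float, d_rcw: float, threshold_pct: float) -> str:
    """Assign a path to G_tbc or G_cbc from its delays at Ytyp, Ycw and Yrcw.

    Assumption: delta-delay is measured at the dominant corner (the larger of
    the C-worst and RC-worst relative changes) and compared with one threshold.
    """
    delta_cw = (d_cw - d_typ) / d_typ * 100.0
    delta_rcw = (d_rcw - d_typ) / d_typ * 100.0
    delta = max(delta_cw, delta_rcw)
    # Per the slide: a path goes to G_tbc if delta-delay exceeds the threshold.
    return "G_tbc" if delta > threshold_pct else "G_cbc"


# Usage with made-up path delays (ps) and a made-up 5% threshold.
group_a = classify_path(d_typ=100.0, d_cw=104.0, d_rcw=110.0, threshold_pct=5.0)
group_b = classify_path(d_typ=100.0, d_cw=101.0, d_rcw=102.0, threshold_pct=5.0)
```

Paths in G_tbc would then be analyzed and fixed at the tightened corners, the rest at the conventional corners.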
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)

                        LEON3MP   NETCARD   SUPERBLUE12
  Clock period (ns)     1.8       2.0       3.1
  Gate count            232K      575K      1031K
  Utilization (%)       84        79        82
  Core area (mm²)       0.45      1.04      1.91
  Max transition (ps)   330       330       330

Tightened BEOL corners (correlation factor = 0.5)

             α      Acw (%)   Arcw (%)
  TBC-0.5    0.5    43        73
  TBC-0.6    0.6    33        50
  TBC-0.7    0.7    30        34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: scaling factor α for LEON3MP, NETCARD and SUPERBLUE12; regions with α > 0.5 are marked in each panel]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: bar charts of WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]
57ISVLSI-2014 invited talk 140710
Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI]
VBTI = Vlib ≈ Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

  Circuit   Frequency (GHz)
  C5315     1.38
  c7552     1.25
  AES       0.89
  MPEG2     1.05

Supply voltages
  Vmax          1.05V
  Vinit         0.90V
  Vheur1 (DC)   0.97V
  Vheur1 (AC)   0.95V
  Vheur2 (DC)   0.95V
  Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs
and library
Vstep: AVS voltage step; Vfinal: converged voltage
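The reference flow can be pictured as a time-stepped loop. The sketch below only shows the control flow; the delay and aging models are toy stand-ins invented for the example (a real flow would run STA with re-characterized libraries at each step), and every name and number is hypothetical.

```python
from typing import Callable, Optional


def reference_signoff(v_init: float, v_max: float, v_step: float,
                      t_final: float, dt: float, target_delay: float,
                      delay_model: Callable[[float, float], float],
                      aging_model: Callable[[float], float]) -> Optional[float]:
    """Step through the lifetime; return Vfinal, or None if Vmax is exceeded."""
    vdd = v_init
    dvth = 0.0
    for _ in range(round(t_final / dt)):
        dvth += aging_model(vdd) * dt           # estimate aging in this time step
        while delay_model(vdd, dvth) > target_delay:
            vdd += v_step                       # AVS raises Vdd by one step
            if vdd > v_max:
                return None                     # modify circuit and start over
    return vdd


def toy_delay(vdd: float, dvth: float) -> float:
    """Toy stand-in for timing analysis: delay grows as Vdd - Vth shrinks."""
    return 1.0 / (vdd - 0.3 - dvth)


def toy_aging(vdd: float) -> float:
    """Toy stand-in for BTI: threshold shift per year grows with Vdd."""
    return 0.002 * vdd


v_final = reference_signoff(0.90, 1.05, 0.01, 10.0, 1.0, 1.7, toy_delay, toy_aging)
```

The loop makes the cost visible: every time step needs a fresh timing evaluation, which is why the slide calls the flow very slow.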
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

  Case       1       2       3      4        5     6        7
  Vlib (V)   Vinit   Vinit   Vmax   Vinit    N/A   Vheur1   Vheur2
  VBTI (V)   Vinit   Vmax    Vmax   Vfinal   N/A   Vheur1   Vheur2
63ISVLSI-2014 invited talk 140710
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop among circuit, derated libraries (Vlib, VBTI) and Vfinal]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time alternating ON, OFF, ON, OFF [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS'14]]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd approaching Vfinal over time; ≈70% of the Vdd increment in 1 year (remaining 30% over 9 years)]
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Good corners
Signoff cases
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                          Vlib (circuit performance estimation)
  VBTI (aging est.)       Min Vdd                                               Max Vdd
  Min Vdd                 Slowest circuit, less aging                           Not applicable (optimistic)
  Max Vdd                 Slowest circuit, worst-case aging (too pessimistic)   Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm²) for AES; points are labeled by signoff case (1-8); regions marked "optimistic about AVS" and "pessimistic about AVS"; legend text: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks)]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) for implementations 1-9; annotations: 30% MTTF penalty, 200mV voltage compensation]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
  $MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^2$
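As a small worked example of the MTTF update rule, the sketch below applies it across a sequence of AVS voltage steps. The function names and the voltage schedule are made up for illustration.

```python
from typing import List


def update_mttf(mttf_prev: float, vdd_prev: float, vdd_new: float) -> float:
    """MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))^2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_after_schedule(mttf_initial: float, vdd_schedule: List[float]) -> float:
    """Apply the update for each consecutive pair of voltages in the schedule."""
    mttf = mttf_initial
    for v_prev, v_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, v_prev, v_new)
    return mttf


# Usage: a hypothetical 10-year MTTF and an AVS schedule rising from 1.0V to 1.25V.
mttf_end = mttf_after_schedule(10.0, [1.0, 1.1, 1.25])
```

Because the per-step ratios telescope, the result depends only on the first and last voltage: here (1.0 / 1.25)^2 = 0.64 of the initial MTTF.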
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year and MTTF (year) for AVS schedules S1-S5; design: DMA]
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is "Signoff"?
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff, between operating voltage and signoff Vdd; stack components: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI]
Problem: margins = pessimism → overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = [d_j(Y_v) - d_j(Y_{typ})] / 3$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^T$ (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
  $[\Delta d'_{j1} \ldots \Delta d'_{j27}] = [\Delta d_{j1} \ldots \Delta d_{j27}] \, \lambda$
Standard deviation of path delay:
  $\sigma_j = ((\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
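A minimal, dependency-free sketch of this computation: factor Σ with a textbook Cholesky decomposition, transform the delay sensitivities, and take the root of the sum of squares. The function names and the 2 x 2 example are assumptions for illustration; a production flow would more likely use a numerical library.

```python
import math
from typing import List

Matrix = List[List[float]]


def cholesky(sigma: Matrix) -> Matrix:
    """Return lower-triangular lam with sigma = lam * lam^T (sigma must be SPD)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def path_sigma(delta_d: List[float], sigma: Matrix) -> float:
    """sigma_j = norm of [delta_d] * lam, i.e. sqrt(delta_d * Sigma * delta_d^T)."""
    lam = cholesky(sigma)
    n = len(delta_d)
    d_prime = [sum(delta_d[u] * lam[u][v] for u in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


# Usage: two variation sources with correlation 0.5 and unit sensitivities.
sigma_j = path_sigma([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])
```

For this example the variance is 1 + 1 + 2 x 0.5 = 3, so the standard deviation is the square root of 3; with an identity Σ it would be the square root of 2.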
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling → reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:
  $Energy = Power / Throughput$
  where throughput is a function of the error rate ER [Kahng10], the number of recovery cycles r and the clock period T
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints
  • Allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor: process- and temperature-aware AVS, generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measure actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
[Figure axis: power]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure: four panels comparing ideal scaling with the observed non-ideality]
• DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
• POWER: dynamic power (W), 2009-2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling. Source: [ITRS]
• SUPPLY VOLTAGE: volt vs. MPU release date. Source: [CPUDB]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels: MUL and EXU; minimum achievable energy is marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design, overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage-frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within design cone → reduce overdesign
Our Method Global Optimizationbull Iteratively sample and refine power models
bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable
Sample (SPampR)
Construct power models
Estimate optimal signoff modes
Sample (SPampR)
Refine power models
Adaptive search
Global optimization flow
09 10 11 1214
15
16
17
18
19
201st 2nd real
Signoff Voltage (v)
Po
wer
(m
W)
Power estimation of adaptive search
bull Ovals indicate sample pointsbull 1st 2nd power from power models at first
second iterationbull real power from real implemented circuits
Design AESf 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Generic monitor does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: stacked bars of energy (mJ) for PM, BF, CO at small, medium and large margin; panels: MUL and EXU; bar components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]

Small/medium/large margin: safety margin = 5%/10%/15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off between timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels: MUL and EXU; minimum achievable energy is marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt

[Flowchart boxes:]
  • Initial placement (all FFs = error-tolerant FFs)
  • Energy < min energy?
  • Save current solution
  • SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
  • SkewOpt: activity-aware clock skew optimization
7ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
[Flowchart, conventional signoff: routed design → timing analysis using conventional BEOL corners (CBC) → violation = 0? If no, ECO using CBC and repeat → done]
[Flowchart, this work: routed design → classify timing-critical paths into GTBC and GCBC → timing analysis using TBC (if violation ≠ 0, ECO using TBC) and timing analysis using CBC (if violation ≠ 0, ECO using CBC) → done]
8ISVLSI-2014 invited talk 140710
Conventional BEOL Corners
• Three major variation sources per layer: ΔW, ΔT, ΔH
• Conventional BEOL corners (CBC)
  • Homogeneous corners: all variation sources are skewed in the same direction
• BEOL RC variations are modeled in interconnect technology file (itf)
[Figure: cross-section of metal layers M1, M2, M3 with width W2, spacing S2, thicknesses T1, T2, T3, heights H1, H2, H3, inter-layer dielectric and inter-metal dielectric]

  Corner   ΔW        ΔT        ΔH
  Ytyp     typical   typical   typical
  Ycb      min       min       max
  Ycw      max       max       min
  Yrcb     max       max       max
  Yrcw     min       min       min
9ISVLSI-2014 invited talk 140710
Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
• A 9-layer metal stack has 27 variation sources: z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if
  - zu and zv are the same type (ΔW, ΔT or ΔH), and
  - zu and zv are in the same process module
Layer  ΔW   ΔT   ΔH
M1     z1   z2   z3
M2     z4   z5   z6
M3     z7   z8   z9
M4     z10  z11  z12
M5     z13  z14  z15
M6     z16  z17  z18
M7     z19  z20  z21
M8     z22  z23  z24
M9     z25  z26  z27
(The layers are grouped into process modules 1, 2 and 3.)

Examples
• ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6 and M7
• But ΔW in layer M4 is not correlated with ΔT in M4
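The correlation rule above can be sketched in a few lines of Python. This is only an illustration: the grouping of layers into modules (M1 to M3, M4 to M7, M8 to M9) is an assumption chosen to agree with the M4 example, and the names `MODULE_OF_LAYER`, `source_index` and `correlation_matrix` are hypothetical.

```python
# Assumed grouping of the 9 metal layers into 3 process modules (hypothetical).
MODULE_OF_LAYER = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2, 7: 2, 8: 3, 9: 3}
TYPES = ("dW", "dT", "dH")  # variation types per layer


def source_index(layer, vtype):
    """0-based index of the variation source: M1 -> z1..z3, M2 -> z4..z6, ..."""
    return 3 * (layer - 1) + TYPES.index(vtype)


def correlation_matrix(gamma=0.5):
    """Build the 27 x 27 matrix: 1 on the diagonal, gamma for two sources of
    the same type in the same process module, 0 otherwise."""
    n = 3 * len(MODULE_OF_LAYER)
    sigma = [[0.0] * n for _ in range(n)]
    for lu in MODULE_OF_LAYER:
        for tu in TYPES:
            u = source_index(lu, tu)
            for lv in MODULE_OF_LAYER:
                for tv in TYPES:
                    v = source_index(lv, tv)
                    if u == v:
                        sigma[u][v] = 1.0
                    elif tu == tv and MODULE_OF_LAYER[lu] == MODULE_OF_LAYER[lv]:
                        sigma[u][v] = gamma
    return sigma
```

With this grouping, ΔW in M4 is correlated with ΔW in M5 but not with ΔT in M4, which matches the two examples on the slide.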
10ISVLSI-2014 invited talk 140710
Pessimism of Conventional BEOL Corners (CBC)
• Assumption: a max (setup) path pj is "safe" when the delay evaluated at a given CBC is larger than the nominal delay + 3σj:
  $d_j(Y_{CBC}) \ge d_j(Y_{typ}) + 3\sigma_j$
• For a given path, we can compare the statistical delay variation and the delay variation obtained from a given CBC:
  $\alpha_j = 3\sigma_j / \Delta d_j(Y_{CBC})$
  $\Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ})$, with $Y_{CBC} \in \{Y_{cw}, Y_{cb}, Y_{rcw}, Y_{rcb}\}$
• Small αj → large pessimism of CBC
[Figure: path delay distribution with the 3σj point marked against dj(YCBC) - dj(Ytyp); the gap between them is labeled "large pessimism"]
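As a minimal sketch of the two definitions on this slide, the following Python helpers compute the scaling factor and the "safe" check for one path. The function names and the numeric values in the usage note are illustrative assumptions, not data from the talk.

```python
def scaling_factor(sigma_j, d_cbc, d_typ):
    """alpha_j = 3*sigma_j / (d_j(Y_CBC) - d_j(Y_typ)).

    sigma_j, d_cbc and d_typ must share one time unit (e.g., ps).
    """
    delta = d_cbc - d_typ
    if delta <= 0:
        raise ValueError("corner delay must exceed the typical delay")
    return 3.0 * sigma_j / delta


def is_safe(sigma_j, d_cbc, d_typ):
    """A setup path is 'safe' at a corner if d_j(Y_CBC) >= d_j(Y_typ) + 3*sigma_j."""
    return d_cbc >= d_typ + 3.0 * sigma_j
```

For a hypothetical path with a typical delay of 1000 ps, a corner delay of 1060 ps and σj = 10 ps, the corner covers twice the 3σ variation, so α is 0.5 and the path is safe with considerable pessimism.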
12ISVLSI-2014 invited talk 140710
Intuition on Delay Variability Across Cw, RCw
[Figure: two scatter plots of α versus Δdelay relative to typical. One panel is at the C-worst corner, [d(Ycw) - d(Ytyp)] / d(Ytyp); the other is at the RC-worst corner, [d(Yrcw) - d(Ytyp)] / d(Ytyp). Paths are grouped as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst).]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
  - Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner
  - α < 1.0 there: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
In the following, α is defined at the dominant corner
13ISVLSI-2014 invited talk 140710
Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
  - E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α plotted against Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]
14ISVLSI-2014 invited talk 140710
Find Paths for Which TBCs Can Be Used
• Paths with small Δdrcw and Δdcw have large α
  - E.g., there are αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
• Gtbc = set of paths that can be safely signed off using TBC: (paths with Δdcw larger than Acw) OR (paths with Δdrcw larger than Arcw)
[Figure: α plotted against Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with the thresholds Acw and Arcw marked]
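The Gtbc membership rule can be written as a small Python sketch. The function name, the path data layout and the threshold values used in the usage note are hypothetical; only the OR condition comes from the slide.

```python
def classify_paths(paths, a_cw, a_rcw):
    """Split paths into (G_TBC, G_CBC).

    paths: dict mapping a path name to (dd_cw, dd_rcw), the delay increase
           at the C-worst and RC-worst corners, in percent of typical delay.
    A path goes to G_TBC when dd_cw > a_cw OR dd_rcw > a_rcw; every other
    path stays in G_CBC and is signed off at the conventional corners.
    """
    g_tbc, g_cbc = [], []
    for name, (dd_cw, dd_rcw) in paths.items():
        if dd_cw > a_cw or dd_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc
```

For example, with illustrative thresholds `a_cw=4.0` and `a_rcw=7.0`, a path with (5.0, 2.0) lands in G_TBC because its C-worst variation exceeds the threshold, while a path with (1.0, 1.5) stays in G_CBC.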
15ISVLSI-2014 invited talk 140710
Determining α, Arcw and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
  - Plot α vs. Δdelay; find Arcw and Acw for a given α
  - Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC
[Figure: α versus Δd at the RC-worst corner (%) and Δd at the C-worst corner (%), with the thresholds Arcw and Acw marked]
16ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
• Tradeoff between reduced margin vs. paths which use TBC
Correlation factor γ = 0.5
[Figure: three bar charts comparing CBC, TBC-0.5, TBC-0.6 and TBC-0.7 on LEON, SUPERBLUE12 and NETCARD: WNS (ns), TNS (ns), and number of timing violations]
17ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
18ISVLSI-2014 invited talk 140710
How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                  Razor         Razor-Lite   TIMBER
Power penalty     30 [Das08]    ~0 [Kim13]   100 [Choudhury09]
Area penalty      182 [Kim13]   33 [Kim13]   255 [Chen13]
Recovery cycles   5 [Wan09]     11 [Kim13]   0 [Choudhury09]
19ISVLSI-2014 invited talk 140710
Tradeoff Resilience Cost vs Datapath Cost
[Figure: schematic of flip-flops and their fanin cones. An endpoint Razor FF (with error output) can be replaced by a normal FF by optimizing its fanin cone w/ a tighter constraint; labels compare the area (power) of the fanin cone with the area (power) w/ Razor overhead.]
[Figure: energy (mJ) versus number of Razor FFs, showing total energy, energy of the non-resilient part, and resilience cost. Tradeoff: Razor FFs (resilience cost) vs. power/area of fanin circuits.]
We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP).
20ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints
  - Allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions
  - Vary endpoints, compare area/power penalty

Candidate Sensitivity Functions
$SF_1 = |\mathrm{slack}(p)|$
$SF_2 = |\mathrm{slack}(p)| \times \mathrm{num}_{cri}(p)$
$SF_3 = |\mathrm{slack}(p)| \times \mathrm{num}_{cri}(p) / \mathrm{num}_{total}(p)$
$SF_4 = |\mathrm{slack}(p)| \times \sum_{c \in \mathrm{fanin}(p)} \mathrm{Pwr}(c)$
$SF_5 = \sum_{c \in \mathrm{fanin}(p)} |\mathrm{slack}(c)| \times \mathrm{Pwr}(c)$

p: negative-slack endpoint
c: cells within fanin cone
num_cri: number of negative-slack cells
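The five candidate functions can be evaluated with a short Python sketch. The data layout (a list of per-cell slack and power pairs) and the function name are hypothetical, and reading num_total(p) as the total number of cells in the fanin cone is an assumption, since the slide does not define it.

```python
def sensitivity_functions(endpoint_slack, fanin):
    """Evaluate SF1..SF5 for one negative-slack endpoint p.

    endpoint_slack: slack of endpoint p
    fanin: list of (cell_slack, cell_power) for the cells in p's fanin cone
    """
    s = abs(endpoint_slack)
    num_cri = sum(1 for slack, _ in fanin if slack < 0)  # negative-slack cells
    num_total = len(fanin)  # assumption: all cells in the fanin cone
    total_pwr = sum(pwr for _, pwr in fanin)
    return {
        "SF1": s,
        "SF2": s * num_cri,
        "SF3": s * num_cri / num_total if num_total else 0.0,
        "SF4": s * total_pwr,
        "SF5": sum(abs(slack) * pwr for slack, pwr in fanin),
    }
```

Endpoints could then be ranked by any one of the returned values before choosing which fanin cones to optimize; which function ranks best is the empirical question the slide raises.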
21ISVLSI-2014 invited talk 140710
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized

$W_{pq} = \mathrm{Slack}_{pq} / (1 + \beta \times TG(p,q))$
  - $\mathrm{Slack}_{pq}$: setup slack of path p-q
  - $\beta$: weighting factor
  - $TG(p,q)$: toggle rate of path p-q

[Figure: sequential graph of FF1, FF2 and FF3 with edge weights W12, W23 and W31, drawn with data paths and clock tree; after the clock latencies are adjusted, each edge on the cycle has weight W', where W' = average weight on cycle]
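A minimal Python sketch of the edge weight and the cycle average W' follows. It assumes the weight is the setup slack divided by (1 + β × toggle rate), which is one reading of the slide's formula, and the function names are hypothetical. Finding the minimum-weight cycle itself is not shown.

```python
def path_weight(slack, toggle_rate, beta):
    """Weight of path p-q: slack scaled down for frequently toggled paths
    (assumed form: slack / (1 + beta * toggle_rate))."""
    return slack / (1.0 + beta * toggle_rate)


def average_cycle_weight(cycle_edges, beta):
    """W' for one cycle of the sequential graph.

    cycle_edges: list of (slack, toggle_rate), one entry per edge on the cycle.
    """
    weights = [path_weight(s, tg, beta) for s, tg in cycle_edges]
    return sum(weights) / len(weights)
```

Under this reading, for positive slacks a larger β pushes frequently exercised paths toward smaller weights, so the cycles containing them are picked earlier by the minimum-weight search.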
22ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow chart boxes: initial placement (all FFs = error-tolerant FFs); SEOpt (margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs); SkewOpt (activity-aware clock skew optimization); OR-tree insertion; check "energy < min energy?" and save current solution]
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resilience
• Reference flows
  - Pure-margin (PM): conventional method w/ only margin insertion
  - Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: stacked bar charts of energy (mJ) for PM, BF and CO under small, medium and large margin, for the MUL and EXU designs; each bar is split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) versus supply voltage (V) for pure-margin, brute-force and CombOpt designs, for MUL and EXU, with the minimum achievable energy marked]
Technology: foundry 28nm
25ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
  - Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
  - AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS with a circuit, an on-chip aging monitor reporting circuit performance, and a voltage regulator setting Vdd; plots of circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, against the target]
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation
[Figure: three-step loop. Step 1: VBTI gives |Vt|, which with Vlib gives the derated library. Step 2: circuit implementation and signoff produce the circuit. Step 3: BTI degradation and AVS on the circuit give Vfinal.]
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Figure: three-step loop. Step 1: VBTI gives |Vt|, which with Vlib gives the derated library. Step 2: circuit implementation and signoff produce the circuit. Step 3: BTI degradation and AVS on the circuit give Vfinal.]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
[Figure: power vs. area across different signoffs, with the "knee" point for balanced area and power tradeoff marked]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  - But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI curves; VBTI = Vlib ≈ Vfinal]
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  - Gates along critical path
  - Timing slack at t = 0
  - Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  - Assume DC or AC stress in derated library characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  - Assume timing slack = 0
[Figure: Vfinal for different gate types; annotation: 10mV]
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation
[Flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → sign off circuit with derated library]
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
• 4 designs x {DC, AC} x derating methods
[Figure: power vs. area for all designs, distinguishing the proposed method from circuits signed off using other derated libraries, with the "knee" point for balanced area and power tradeoff marked]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  - Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  - Improve performance, power, or area
  - Reduce overdesign
[Figure: Vdd vs. time schedules alternating between nominal (NOM) and overdrive (OD) modes for durations tnom and tOD; power of circuits w/ different overdrive modes at fnom = 800MHz, Vnom = 0.8V. Different overdrive modes: 26% power range. Fix fOD: still 14% power range.]
37ISVLSI-2014 invited talk 140710
Also: Tunable Monitors, Less Margin
Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail
Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  - In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  - Entire node is a "20/20/20" value proposition
  - 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  - Substantial benefit from tightening BEOL corners (= signoff)
  - "Minimum cost of resilience" is a rich optimization challenge
  - Chicken-egg loops in signoff definition can be broken
  - Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) across implementations 1 to 9, ranging from the highest invested guardband to the least invested guardband; 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example with the worst-case capacitance corner. The capacitance distributions of layer M1 and layer M2 each have their own -3σ and 3σ points. For the interconnect stack with M1 and M2 (axes M1 C and M2 C), the homogeneous Cw corner lies beyond the stack's 3σ contour; the gap is labeled "pessimism".]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

          M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW    0  0  0     0  0  0     0  0  0     1  0  0
    ΔT    0  0  0     0  0  0     0  0  0     0  1  0
    ΔH    0  0  0     0  0  0     0  0  0     0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability versus max wirelength ratio across all layers (%), with the point at 60% and 0.92 marked]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: two scatter plots of α versus Δdelay. One panel is at the C-worst corner, [d(Ycw) - d(Ytyp)] / d(Ytyp); the other is at the RC-worst corner, [d(Yrcw) - d(Ytyp)] / d(Ytyp). Paths are grouped as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst).]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  - Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner
  - α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2 (axes M1 C and M2 C, 3σ contour); non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  - Many feasible combinations
  - A corner can only cover certain paths
  - How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: scatter plot of 3σj/d(Ytyp) x 100 versus Δdj(Yrcw)/dj(Ytyp) x 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
  - Wires on critical paths are routed on many layers
  - Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%) of critical paths]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flow: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC. GTBC: timing analysis using TBC, ECO using TBC until violation = 0. GCBC: timing analysis using CBC, ECO using CBC until violation = 0. → done]
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     18        20        31
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5

          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

Implement another NETCARD (clock period = 23ns) to obtain α, Acw and Arcw
Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  - dj(Ytyp): nominal delay
  - ΔdR-M1: delay sensitivity to unit change in M1 resistance
  - ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: α plotted against Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) for LEON3MP, SUPERBLUE12 and NETCARD, with the α > 0.5 region marked in each panel]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: three bar charts comparing CBC, TBC-1 and TBC-2 on LEON, SUPERBLUE and NETCARD: WNS (ns), TNS (ns), and number of timing violations]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  - Gates along critical path
  - Timing slack at t = 0
• Circuit activity is not an issue
  - Because BTI effect is not sensitive to circuit activity
  - DC or AC stress model is sufficient
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = {DC, AC}

Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, Vlib and Vdd throughout circuit lifetime
• Signoff flow
  - Estimate aging at each time step
  - Update circuit timing and Vdd
  - Repeat until t = tfinal
  - Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step
Vfinal: converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3     4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax  Vinit   NA   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax  Vfinal  NA   Vheur1  Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
• "Chicken and egg" loop in signoff
  - Derated library characterization is related to BTI + AVS
  - AVS is affected by circuit implementation (timing constraints, critical paths, etc.)
  - Circuit is affected by library characterization
[Figure: loop connecting the circuit, Vfinal, and the derated libraries (Vlib, VBTI)]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time alternating between ON and OFF [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS'14]]
65ISVLSI-2014 invited talk 140710
Observation 1
• BTI is a "front-loaded" phenomenon [Chan11]
  - 50% BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  - Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd vs. time approaching Vfinal; ≈70% of the Vdd increment occurs in 1 year (remaining 30% over 9 years)]
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
[Figure label: good corners]
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  - Analyze circuit timing at worst-case corners
  - Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

                  Vlib = Min Vdd                      Vlib = Max Vdd
VBTI = Min Vdd    Slowest circuit, less aging         Not applicable (optimistic)
VBTI = Max Vdd    Slowest circuit, worst-case aging   Faster circuit, worst-case aging
                  (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) versus area (μm2) for AES implementations signed off at corners 1 to 8, ranging from optimistic about AVS to pessimistic about AVS; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks)]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) across implementations 1 to 9; 30% MTTF penalty with 200mV voltage compensation]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
  $MTTF(i) = MTTF(i-1) \times \left( V_{DD}(i-1) / V_{DD}(i) \right)^2$
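The MTTF update rule on this slide can be applied step by step along an AVS voltage schedule. The sketch below is illustrative only; the function names and the example schedule are hypothetical.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_now):
    """One update step: MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2


def mttf_after_schedule(mttf_initial, vdd_schedule):
    """Apply the update along a list of supply voltages VDD(0), VDD(1), ..."""
    mttf = mttf_initial
    for v_prev, v_now in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, v_prev, v_now)
    return mttf
```

Because the ratios telescope, the result depends only on the first and last voltage: the final MTTF equals the initial MTTF times (VDD(0) / VDD(final))^2, so under this rule the intermediate steps of the schedule do not change the end-of-life value.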
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD versus year for AVS schedules S1 to S5, and the MTTF (year) of each schedule, for the DMA design; annotation: 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff on a voltage axis, with nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI, signoff Vdd, and operating voltage]
Problem: margins = pessimism → overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = [d_j(Y_v) - d_j(Y_{typ})] / 3$
• Assumptions
  - Δdjv is linear with respect to variation sources
  - Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)
• Delay sensitivities with correlation:
  $[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}] \lambda$
• Standard deviation of path delay:
  $\sigma_j = ((\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
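The computation of σj can be sketched in pure Python as follows. It is a minimal illustration under the slide's assumptions (linear sensitivities, a symmetric positive-definite correlation matrix); the function names are hypothetical and the hand-written Cholesky routine stands in for whatever numerical library would be used in practice.

```python
import math


def cholesky(matrix):
    """Lower-triangular L with matrix = L * L^T (matrix must be symmetric
    positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            acc = sum(lower[i][m] * lower[k][m] for m in range(k))
            if i == k:
                lower[i][k] = math.sqrt(matrix[i][i] - acc)
            else:
                lower[i][k] = (matrix[i][k] - acc) / lower[k][k]
    return lower


def path_sigma(sensitivities, corr):
    """sigma_j from per-source sensitivities and the correlation matrix.

    Computes d' = d * lambda (row vector times matrix), then the Euclidean
    norm of d', which equals sqrt(d * Sigma * d^T).
    """
    lam = cholesky(corr)
    n = len(sensitivities)
    d_prime = [sum(sensitivities[i] * lam[i][k] for i in range(n))
               for k in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))
```

With independent sources (identity matrix) this reduces to the root-sum-square of the sensitivities; positive correlation between sources whose sensitivities have the same sign increases σj.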
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) versus supply voltage (V) for a conventional design and a resilient design. Conventional design: worst-case signoff, no Vdd downscaling. Resilient design: typical-case signoff, Vdd downscaling, reduced energy (15% reduction).]
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:
  $Energy = Power / Throughput$
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints
  - Allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches
• Open-loop AVS
  - AVS with frequency & Vdd LUT: pre-characterize LUT [Martin02]
  - Process-aware AVS: post-silicon characterization [Tschanz03]
• Closed-loop AVS
  - Process and temperature-aware AVS: generic on-chip monitor [Burd00]; design-dependent monitor (replica) [Elgebaly07, Drake08, Chan12]
  - In-situ performance monitor: measure actual critical paths [Hartman06, Fick10]
• Error tolerance
  - Error detection and correction system: Vdd scaling until error occurs [Das06, Tschanz10]
[Figure: power of each approach, decreasing from open-loop AVS toward error tolerance]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure: four panels comparing ideal scaling with non-ideality. DENSITY: transistor count [M] vs. MPU release date (source: [CPUDB]). POWER: dynamic power (W), 2009 to 2024 (source: [JeongK08]). DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006 to 2016 (source: [ITRS]). SUPPLY VOLTAGE: volt vs. MPU release date (source: [CPUDB]).]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone

[Figure: design cone of mode A in the voltage-frequency plane, with modes A, B, and C; negative slacks = failed design, positive slacks = overdesign]

Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough → scalable

Global optimization flow
1. Sample (SP&R)
2. Construct power models
3. Estimate optimal signoff modes
4. Adaptive search: sample (SP&R) again and refine power models

[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V) for the 1st and 2nd model iterations and for real implementations. Design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time

Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Generic monitor does not capture design-specific performance variation

This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low

[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high or low series resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin

[Figure: energy (mJ) of PM, BF, and CO at small, medium, and large margin; panels MUL and EXU; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context

[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt

[Flowchart blocks]
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• Check: energy < min energy? If so, save current solution
6ISVLSI-2014 invited talk 140710
BEOL Corner Optimization
• 20nm and below: increased timing variation due to interconnect R, C
  • Design closure becomes much more difficult
• Costs of BEOL variations
  • More design effort (e.g., "last month" of manual ECO iteration)
  • Compromised circuit performance at high Vdd
• Recent work: reduce signoff margin by using tightened BEOL corners without sacrificing parametric yield
  • Signoff at conventional BEOL corners is pessimistic for most timing-critical paths
  • We identify paths which can be safely signed off using tightened BEOL corners (TBC)
  • Joint work with Sorin Dobre (Qualcomm) and Tuck-Boon Chan
7ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
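Conventional signoff
1. Routed design
2. Timing analysis using conventional BEOL corners (CBC)
3. If violation = 0: done. If not: ECO using CBC, then repeat step 2

This work
1. Routed design
2. Classify timing-critical paths into GTBC and GCBC
3. GTBC: timing analysis using TBC; if violation ≠ 0, ECO using TBC and re-analyze
4. GCBC: timing analysis using CBC; if violation ≠ 0, ECO using CBC and re-analyze
5. Done when both analyses report violation = 0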
8ISVLSI-2014 invited talk 140710
Conventional BEOL Corners
• Three major variation sources per layer: ΔW, ΔT, ΔH
• Conventional BEOL corners (CBC)
  • Homogeneous corners: all variation sources are skewed in the same direction
• BEOL RC variations are modeled in the interconnect technology file (itf)

[Figure: cross-section of metal layers M1, M2, M3 showing width W2, spacing S2, thicknesses T1-T3, dielectric heights H1-H3, inter-layer dielectric, and inter-metal dielectric]

Corner  ΔW       ΔT       ΔH
Ytyp    typical  typical  typical
Ycb     min      min      max
Ycw     max      max      min
Yrcb    max      max      max
Yrcw    min      min      min
9ISVLSI-2014 invited talk 140710
Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
• 9-layer metal stack has 27 variation sources: z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if
  • zu and zv are the same type (ΔW, ΔT, or ΔH)
  • zu and zv are in the same process module

Layer  ΔW   ΔT   ΔH   Process module
M9     z25  z26  z27  3
M8     z22  z23  z24  3
M7     z19  z20  z21  2
M6     z16  z17  z18  2
M5     z13  z14  z15  2
M4     z10  z11  z12  2
M3     z7   z8   z9   1
M2     z4   z5   z6   1
M1     z1   z2   z3   1

Examples
• ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6, and M7
• But ΔW in layer M4 is not correlated with ΔT in M4
10ISVLSI-2014 invited talk 140710
Pessimism of Conventional BEOL Corners (CBC)
• Assumption: a max (setup) path pj is "safe" when the delay evaluated at a given CBC is larger than nominal delay + 3σj
  dj(YCBC) ≥ dj(Ytyp) + 3σj
• For a given path, we can compare the statistical delay variation and the delay obtained from a given CBC
  αj = 3σj / Δdj(YCBC)
  Δdj(YCBC) = dj(YCBC) − dj(Ytyp), with YCBC ∈ {Ycw, Ycb, Yrcw, Yrcb}
• Small αj → large pessimism of CBC

[Figure: delay distribution comparing 3σj with dj(YCBC) − dj(Ytyp); the gap between them is marked as large pessimism]
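The pessimism metric can be illustrated with a minimal Python sketch. Only the relations αj = 3σj / Δdj(YCBC) and dj(YCBC) ≥ dj(Ytyp) + 3σj come from the slide; the function names and the sample delays are hypothetical.

```python
def alpha(d_typ, d_cbc, sigma):
    """Scaling factor alpha_j = 3*sigma_j / (d_j(Y_CBC) - d_j(Y_typ))."""
    delta = d_cbc - d_typ
    if delta <= 0:
        raise ValueError("corner delay must exceed typical delay")
    return 3.0 * sigma / delta


def is_safe(d_typ, d_cbc, sigma):
    """A setup path is 'safe' when d(Y_CBC) >= d(Y_typ) + 3*sigma."""
    return d_cbc >= d_typ + 3.0 * sigma


# Hypothetical path: 1000 ps typical delay, 1060 ps at the corner, sigma = 10 ps.
# 3*sigma = 30 ps but the corner adds 60 ps, so alpha = 0.5 (a pessimistic corner).
example_alpha = alpha(1000.0, 1060.0, 10.0)
```

A value of α well below 1 means the corner adds far more delay than the 3σ statistical variation requires.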
11ISVLSI-2014 invited talk 140710
Intuition on Delay Variability Across Cw, RCw

[Figure: α plotted against Δdelay (vs. typ) at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and against Δdelay (vs. typ) at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0 here: delay variations covered by RC-worst corner
12ISVLSI-2014 invited talk 140710
Intuition on Delay Variability Across Cw, RCw

[Figure: α plotted against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
In the following, α is defined at the dominant corner
13ISVLSI-2014 invited talk 140710
Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
  • E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]
14ISVLSI-2014 invited talk 140710
Find Paths for Which TBCs Can Be Used
• Paths with small Δdrcw and Δdcw have large α
  • E.g., αj > 0.6 occurs when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw

[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw marked]

Gtbc = set of paths that can be safely signed off using TBC:
  (path with Δdcw larger than Acw) OR (path with Δdrcw larger than Arcw)
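The Gtbc membership rule can be sketched in a few lines of Python. The rule itself (Δdcw larger than Acw, OR Δdrcw larger than Arcw) is from the slide; the function name, the data layout, and the sample numbers are hypothetical.

```python
def classify_paths(paths, a_cw, a_rcw):
    """Split paths into (G_TBC, G_CBC).

    paths: dict mapping path name -> (dd_cw, dd_rcw), the delay increase
           over typical at the C-worst and RC-worst corners, in percent.
    A path joins G_TBC if dd_cw > a_cw OR dd_rcw > a_rcw.
    """
    g_tbc, g_cbc = [], []
    for name, (dd_cw, dd_rcw) in paths.items():
        if dd_cw > a_cw or dd_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


# Hypothetical paths and thresholds (percent of typical delay).
paths = {"p1": (5.0, 2.0), "p2": (1.0, 6.0), "p3": (1.5, 2.5)}
g_tbc, g_cbc = classify_paths(paths, a_cw=3.0, a_rcw=5.0)
```

Here p1 and p2 exceed one threshold each and land in G_TBC, while p3 stays in G_CBC and is signed off at the conventional corners.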
15ISVLSI-2014 invited talk 140710
Determining α, Arcw and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay; find Arcw and Acw for a given α
• Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC

[Figure: α vs. Δd at C-worst corner (%) and α vs. Δd at RC-worst corner (%), with Arcw and Acw marked]
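One possible reading of the threshold extraction, as a hedged Python sketch: for a target α, take the largest Δdelay among sampled paths whose α still exceeds the target, then add the sampling margin. This interpretation, the function name, and the sample data are assumptions; the slide only states that thresholds are read off the α vs. Δdelay plot and padded by a margin.

```python
def extract_threshold(samples, alpha_target, margin=1.0):
    """Return a delay threshold (percent) for a target alpha.

    samples: list of (delta_d_percent, alpha) for representative paths.
    Every sampled path with delta_d above the returned value (before margin)
    has alpha <= alpha_target; `margin` covers sampling error.
    """
    violating = [dd for dd, a in samples if a > alpha_target]
    base = max(violating) if violating else 0.0
    return base + margin


# Hypothetical representative paths: (delta delay in %, alpha).
samples = [(1.0, 0.9), (2.5, 0.65), (3.0, 0.55), (4.0, 0.4)]
a_for_06 = extract_threshold(samples, 0.6)  # 2.5 + 1.0
a_for_05 = extract_threshold(samples, 0.5)  # 3.0 + 1.0
```

In this toy data the smaller α target yields the larger threshold, matching the trend stated on the slide.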
16ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
  • Tradeoff between reduced margin vs. paths which use TBC

Correlation factor γ = 0.5

[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE12, and NETCARD under CBC, TBC-0.5, TBC-0.6, and TBC-0.7]
17ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
18ISVLSI-2014 invited talk 140710
How to Minimize Cost of Resilience
• Additional circuits → area and power penalties
• Recovery from errors → throughput degradation
• Large hold margin → short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                   Razor         Razor-Lite    TIMBER
Power penalty      30% [Das08]   ~0% [Kim13]   100% [Choudhury09]
Area penalty       182% [Kim13]  33% [Kim13]   255% [Chen13]
# recovery cycles  5 [Wan09]     11 [Kim13]    0 [Choudhury09]
19ISVLSI-2014 invited talk 140710
Tradeoff Resilience Cost vs Datapath Cost
[Figure: circuit with Razor FFs (with error outputs) and normal FFs at endpoints; optimizing the fanin cone of an endpoint w/ tighter constraint lets its Razor FF be replaced by a normal FF. The tradeoff is between Razor FFs (resilience cost) and power/area of fanin circuits: area (power) of fanin cone vs. area (power) w/ Razor overhead]

[Figure: total energy (mJ), energy of non-resilient part, and resilience cost vs. number of Razor FFs (300, 100, 50, 0)]

We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)
20ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints
  → allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions
  → vary endpoints, compare area/power penalty

Candidate sensitivity functions
  SF1 = |slack(p)|
  SF2 = |slack(p)| × Numcri(p)
  SF3 = |slack(p)| × Numcri(p) / Numtotal(p)
  SF4 = |slack(p)| × Σ_{c ∈ fanin(p)} Pwr(c)
  SF5 = Σ_{c ∈ fanin(p)} |slack(c)| × Pwr(c)

p: negative-slack endpoint
c: cells within fanin cone
Numcri: number of negative-slack cells
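A minimal Python sketch of the five candidate sensitivity functions, reading them as SF1 = |slack(p)|, SF2 = |slack(p)| × Numcri(p), SF3 = |slack(p)| × Numcri(p) / Numtotal(p), SF4 = |slack(p)| × Σ Pwr(c), and SF5 = Σ |slack(c)| × Pwr(c) over the cells c in the fanin cone of p. The function name, the data layout, and the sample values are hypothetical, and Numtotal is taken here to be the number of cells in the fanin cone.

```python
def sensitivity_functions(endpoint_slack, fanin_cells):
    """Evaluate SF1..SF5 for one negative-slack endpoint p.

    endpoint_slack: slack at endpoint p.
    fanin_cells: list of (slack, power) for each cell in the fanin cone of p.
    """
    abs_slack = abs(endpoint_slack)
    num_total = len(fanin_cells)
    num_cri = sum(1 for s, _ in fanin_cells if s < 0)  # negative-slack cells
    total_pwr = sum(p for _, p in fanin_cells)
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": abs_slack * num_cri / num_total if num_total else 0.0,
        "SF4": abs_slack * total_pwr,
        "SF5": sum(abs(s) * p for s, p in fanin_cells),
    }


# Hypothetical endpoint with slack -0.2 ns and three fanin cells (slack ns, power mW).
sf = sensitivity_functions(-0.2, [(-0.2, 1.0), (-0.1, 2.0), (0.3, 4.0)])
```

Endpoints would then be ranked by the chosen function before deciding which fanin cones to optimize.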
21ISVLSI-2014 invited talk 140710
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight → adjust clock latencies → contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized

Path weight
  Wpq = Slackpq / (1 + β × TG(p, q))
  Slackpq: setup slack of path p-q
  β: weighting factor
  TG(p, q): toggle rate of path p-q

[Figure: sequential graph over FF1, FF2, FF3 with data-path edges W12, W23, W31 and the clock tree; after adjustment each edge on the cycle has weight W′, where W′ = average weight on cycle]
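The path weight and the cycle average can be sketched in Python, reading the weight as Wpq = Slackpq / (1 + β × TG(p, q)) and W′ as the mean of the weights on a cycle. Function names and the sample numbers are hypothetical; the cycle search and clock-latency adjustment are not shown.

```python
def path_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)


def cycle_average_weight(weights):
    """W' = average weight over the edges of one cycle."""
    return sum(weights) / len(weights)


# Hypothetical 3-FF cycle: (setup slack in ns, toggle rate) per edge, beta = 2.
edges = [(0.3, 0.5), (0.3, 0.0), (0.9, 0.5)]
weights = [path_weight(s, tg, beta=2.0) for s, tg in edges]
w_prime = cycle_average_weight(weights)
```

With this weighting, a frequently toggling path gets a smaller weight than an idle path with the same positive slack, so it is treated as more critical.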
22ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt

[Flowchart blocks]
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• Check: energy < min energy? If so, save current solution
• OR-tree insertion
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation

[Figure: energy (mJ) of PM, BF, and CO at small, medium, and large margin; panels MUL and EXU; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner
Technology: foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy

[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt; panels MUL and EXU; minimum achievable energy marked]
Technology: foundry 28nm
25ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
• AVS can be used to compensate for performance degradation

[Figure: closed-loop AVS; an on-chip aging monitor tracks circuit performance and drives the voltage regulator, which sets the circuit Vdd. Plots: circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, relative to the target]
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation

[Figure: Step 1: VBTI gives |Vt|, and with Vlib yields the derated library; Step 2: circuit implementation and signoff yields the circuit; Step 3: BTI degradation and AVS yields Vfinal]
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation

[Figure: Step 1: VBTI gives |Vt|, and with Vlib yields the derated library; Step 2: circuit implementation and signoff yields the circuit; Step 3: BTI degradation and AVS yields Vfinal]

No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI

• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
"Knee" point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic

[Figure: Vdd vs. time, with NBTI and PBTI stress]

VBTI = Vlib ≈ Vfinal
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging); it depends on
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0

[Figure annotation: 10mV]
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation

Flow
1. Obtain Vheur (average of standard cells)
2. Obtain derated library with VBTI = Vlib = Vheur
3. Sign off circuit with derated library
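The averaging heuristic is simple enough to show as a Python sketch. The cell names and per-cell Vfinal values below are hypothetical placeholders; only the rule (average Vfinal over cells, then set VBTI = Vlib = Vheur) is from the slide.

```python
def v_heur(vfinal_by_cell):
    """Average the per-cell end-of-life voltage Vfinal to obtain Vheur."""
    return sum(vfinal_by_cell.values()) / len(vfinal_by_cell)


def derating_voltages(vfinal_by_cell):
    """Return (VBTI, Vlib), both set to the flat heuristic voltage Vheur."""
    v = v_heur(vfinal_by_cell)
    return v, v


# Hypothetical per-cell Vfinal values in volts.
cells = {"INV": 0.96, "NAND2": 0.97, "NOR2": 0.98}
v_bti, v_lib = derating_voltages(cells)
```

Using one flat voltage for both aging estimation and library characterization is what removes the dependence on the not-yet-implemented circuit.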
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
• (4 designs × {DC, AC} × derating methods)

[Figure: power vs. area for all designs; points for the proposed method and for circuits signed off using other derated libraries]

"Knee" point for balanced area and power tradeoff

Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign

[Figure: Vdd vs. time alternating between NOM and OD modes, with durations tnom and tOD]
[Figure: power of circuits w/ different overdrive modes; different overdrive modes give a 26% power range; with fOD fixed, still a 14% power range. fnom = 800MHz, Vnom = 0.8V]
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail

Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1-9; annotations: highest invested guardband, least invested guardband, 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
43ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner. Per-layer capacitance distributions for layer M1 and layer M2 with −3σ and 3σ points; for an interconnect stack with M1 and M2, the homogeneous Cw corner is shown against the 3σ region in the (M1 C, M2 C) plane, and the gap is the pessimism]
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack

[Figure: example worst-case capacitance corner. Per-layer capacitance distributions for layer M1 and layer M2 with −3σ and 3σ points; for an interconnect stack with M1 and M2, the homogeneous Cw corner is shown against the 3σ region in the (M1 C, M2 C) plane, and the gap is the pessimism]

When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with the number of layers
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
         M1          M2          M3          M4
         ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW   1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT   0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH   0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW   γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT   0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH   0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW   γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT   0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH   0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW   0  0  0     0  0  0     0  0  0     1  0  0
    ΔT   0  0  0     0  0  0     0  0  0     0  1  0
    ΔH   0  0  0     0  0  0     0  0  0     0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
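The matrix can be generated from the correlation rule with a short Python sketch. The rule (entry γ when two sources share a variation type and a process module, 1 on the diagonal, 0 otherwise) is from the slides; the function name and the encoding of layers as a list of module ids are hypothetical.

```python
def correlation_matrix(layer_modules, gamma, types=("dW", "dT", "dH")):
    """Build the correlation matrix for per-layer variation sources.

    layer_modules: process-module id for each layer, in layer order.
    Sources are ordered layer by layer, and within a layer by `types`.
    """
    sources = [(layer, t) for layer in range(len(layer_modules)) for t in types]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for u, (lu, tu) in enumerate(sources):
        for v, (lv, tv) in enumerate(sources):
            if u == v:
                sigma[u][v] = 1.0
            elif tu == tv and layer_modules[lu] == layer_modules[lv]:
                sigma[u][v] = gamma
    return sigma


# Four-layer example: M1-M3 share one process module, M4 is in another.
sigma = correlation_matrix([1, 1, 1, 2], gamma=0.5)
```

For the four-layer example this yields a 12 × 12 matrix whose M1-M3 blocks carry γ on matching variation types and whose M4 rows are zero off the diagonal.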
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer

[Figure: cumulative probability vs. max wirelength ratio across all layers (%); cumulative probability is 0.92 at 60%]

• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation
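Why averaging helps follows from a standard statistics identity: for n equally weighted sources with equal standard deviation σ and pairwise correlation γ, the standard deviation of their average is σ × sqrt((1 + (n − 1)γ) / n). The Python sketch below assumes equal weights and equal σ per layer, which is an idealization not stated on the slide; `relative_sigma` is a hypothetical name.

```python
import math


def relative_sigma(n, gamma):
    """Std. dev. of the average of n sources, relative to one source.

    Assumes equal weights, equal sigma, and pairwise correlation gamma.
    """
    return math.sqrt((1.0 + (n - 1) * gamma) / n)


independent = relative_sigma(4, 0.0)   # uncorrelated layers
partial = relative_sigma(3, 0.5)       # partially correlated layers
full = relative_sigma(4, 1.0)          # fully correlated layers
```

Only fully correlated layers preserve the single-layer variation; any weaker correlation shrinks the variation seen by a path that spreads its wirelength over several layers.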
47ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α plotted against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α plotted against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations

[Figure: interconnect stack with M1 and M2; 3σ region in the (M1 C, M2 C) plane; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]

• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5

[Figure: 3σj / d(Ytyp) × 100 vs. Δdj(Yrcw) / dj(Ytyp) × 100]

Challenge: how to avoid underestimating delay variation, to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow

Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners

[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst, and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier

[Flowchart: routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; split paths into GTBC and GCBC. GTBC: timing analysis using TBC, ECO using TBC until violation = 0. GCBC: timing analysis using CBC, ECO using CBC until violation = 0. Then done]
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)

                     LEON3MP  NETCARD  SUPERBLUE12
Clock period (ns)    1.8      2.0      3.1
Gate count           232K     575K     1031K
Utilization (%)      84       79       82
Core area (mm2)      0.45     1.04     1.91
Max transition (ps)  330      330      330

Thresholds (correlation factor = 0.5)

         α    Acw (%)  Arcw (%)
TBC-0.5  0.5  4.3      7.3
TBC-0.6  0.6  3.3      5.0
TBC-0.7  0.7  3.0      3.4

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw, and Arcw

Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays → the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • First 0.5: delay sensitivity to unit change in M1 resistance
  • Second 0.5: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large → the delay variations of ΔdR-M1 and ΔdC-M1 cancel out → Δd(Ycw) ≈ 0 < σj
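The cancellation can be made concrete with a small worked example in Python, using the slide's linear model dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1. The numbers are hypothetical, and the sketch assumes that at the C-worst corner the resistance term moves below typical (wider, thicker wires) while the capacitance term moves above typical by the same amount.

```python
def path_delay(d_typ, s_r, s_c, delta_r, delta_c):
    """Linear delay model: nominal delay plus weighted R and C contributions."""
    return d_typ + s_r * delta_r + s_c * delta_c


# Hypothetical path, equally sensitive to R and C (both sensitivities 0.5).
d_typ = 100.0  # ps
# Assumed C-worst corner: R contribution -10 ps, C contribution +10 ps.
d_cw = path_delay(d_typ, 0.5, 0.5, delta_r=-10.0, delta_c=10.0)
delta_at_cw = d_cw - d_typ  # the two contributions cancel
```

Because the corner delay barely moves from typical, Δd(Ycw) falls below the statistical variation and the corner cannot cover it, which is why such paths show large α.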
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: α vs. Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) for LEON3MP, SUPERBLUE12, and NETCARD; the region with α > 0.5 is marked in each]

• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%

Correlation factor γ = 0 (variation sources are independent)

[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging); it depends on
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit  Frequency (GHz)
C5315    1.38
c7552    1.25
AES      0.89
MPEG2    1.05

Supply voltages
Vmax         1.05V
Vinit        0.90V
Vheur1 (DC)  0.97V
Vheur1 (AC)  0.95V
Vheur2 (DC)  0.95V
Vheur2 (AC)  0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, Vlib, and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs

[Figure: reference signoff flow. Legend: Vstep = AVS voltage step; Vfinal = converged voltage]
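The reference flow can be sketched as a loop in Python. The loop structure follows the bullets (estimate aging per time step, raise Vdd until timing is met, stop at tfinal, give up if the voltage limit is exceeded); the aging model and the timing check below are toy stand-ins supplied by the caller, not the talk's models, and voltages are kept as integer millivolts to avoid rounding issues.

```python
def reference_avs_flow(v_init, v_max, v_step, n_steps, aging_rate, meets_timing):
    """Step through the lifetime and return Vfinal (mV), or None on failure.

    aging_rate(vdd)         -> |dVth| added during one time step (toy model).
    meets_timing(vdd, dvth) -> True if timing is met (stands in for an STA run).
    Returning None corresponds to "modify circuit and start over".
    """
    vdd, dvth = v_init, 0
    for _ in range(n_steps):
        dvth += aging_rate(vdd)              # estimate aging at this time step
        while not meets_timing(vdd, dvth):   # update timing and Vdd
            vdd += v_step
            if vdd > v_max:
                return None
    return vdd


# Toy models in millivolts: aging grows with Vdd, timing needs 850 mV of headroom.
toy_aging = lambda vdd: vdd // 100
toy_timing = lambda vdd, dvth: vdd - dvth >= 850

v_final = reference_avs_flow(900, 1050, 10, 10, toy_aging, toy_timing)
```

Every pass through the inner loop stands for one timing analysis, which is why the slide calls this flow accurate but slow.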
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case      1      2      3     4       5   6       7
Vlib (V)  Vinit  Vinit  Vmax  Vinit   NA  Vheur1  Vheur2
VBTI (V)  Vinit  Vmax   Vmax  Vfinal  NA  Vheur1  Vheur2
63ISVLSI-2014 invited talk 140710
"Chicken and Egg" Loop

• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization

[Figure: loop among circuit, Vfinal, Vlib and VBTI, and derated libraries]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS

[Figure: |Vgs| vs. time, alternating ON and OFF [VattikondaWC06]]

Device aging (|ΔVth|) accumulates over time [TCAS'14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a "front-loaded" phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
  • ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Vlib: Vdd for circuit performance estimation; VBTI: Vdd for aging estimation

                  Vlib = Min Vdd                       Vlib = Max Vdd
VBTI = Min Vdd    Slowest circuit, less aging          Not applicable (optimistic)
VBTI = Max Vdd    Slowest circuit, worst-case aging    Faster circuit, worst-case aging
                  (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm²) for AES across signoff cases 1-8; one end of the curve is optimistic about AVS, the other pessimistic about AVS]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
  $MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
[Figure: lifetime (year) and Vfinal (V) for implementations 1-9; annotations: 30% MTTF penalty, 200mV voltage compensation]
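The MTTF update rule on this slide can be applied step by step along an AVS voltage schedule. The sketch below is illustrative only: the function names, the starting MTTF and the voltage schedule are assumptions, not data from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One update step: MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_trajectory(mttf0, vdd_schedule):
    """Apply the update along a list of successive supply voltages."""
    trajectory = [mttf0]
    for vdd_prev, vdd_new in zip(vdd_schedule, vdd_schedule[1:]):
        trajectory.append(update_mttf(trajectory[-1], vdd_prev, vdd_new))
    return trajectory


# Hypothetical schedule: AVS raises Vdd from 0.90 V to 1.00 V in two steps.
traj = mttf_trajectory(10.0, [0.90, 0.95, 1.00])
```

Because the per-step ratios telescope, the final value in this sketch depends only on the first and last voltages of the schedule.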
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year for AVS schedules S1-S5, and MTTF (year) for S1-S5; design: DMA; annotation: 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is "Signoff"?
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff, from nominal Vdd to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI; operating voltage]
Problem: margins = pessimism → overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  Δdjv = [dj(Yv) − dj(Ytyp)] / 3
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Figure: flow: routed netlist + 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
  [Δd'j1 … Δd'j27] = [Δdj1 … Δdj27] λ
Standard deviation of path delay:
  σj = ((Δd'j1)^2 + … + (Δd'j27)^2)^0.5
Note: we use the delay variation from the statistical analysis as a reference
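The computation of σj from per-source sensitivities and the correlation matrix can be sketched in a few lines. This is a minimal pure-Python illustration; the function names and the 2 × 2 example matrix are assumptions, while the real analysis uses 27 variation sources.

```python
import math


def cholesky(sigma):
    """Lower-triangular lam with sigma = lam * lam^T (sigma must be positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def path_sigma(sens, sigma):
    """sigma_j = || sens * lam ||, with sens the row vector of per-sigma sensitivities."""
    lam = cholesky(sigma)
    n = len(sens)
    # Row vector times lam gives the sensitivities with correlation folded in.
    d_prime = [sum(sens[u] * lam[u][v] for u in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


# Hypothetical example: two sources with correlation 0.5 and unit sensitivities.
sigma_j = path_sigma([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])
```

With uncorrelated sources (identity Σ) this reduces to the root-sum-square of the sensitivities.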
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V), conventional design vs. resilient design; annotation: 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling → reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:
  Energy = Power / Throughput
  Throughput is determined by the error rate ER [Kahng10], the number of recovery cycles r, and the clock period T
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints → allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor: process- and temperature-aware AVS with a generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure, four panels, each contrasting ideal scaling with observed non-ideality:
 DENSITY: transistor count [M] vs. MPU release date, 1998-2011 (source: [CPUDB])
 POWER: dynamic power (W), 2009-2024 (source: [JeongK08])
 DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006-2016 (source: [ITRS])
 SUPPLY VOLTAGE: volt vs. MPU release date, 1995-2016 (source: [CPUDB])]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs → resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; curves: brute-force, pure-margin, CombOpt; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within design cone → reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough → scalable
Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search (sample (SP&R) → refine power models)
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
• Closed-loop AVS classes: generic monitor, design-dependent replica, in-situ monitor
• Generic monitor does not capture design-specific performance variation
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF and CO for MUL and EXU at small, medium and large margin; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs → resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; curves: brute-force, pure-margin, CombOpt; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow: initial placement (all FFs = error-tolerant FFs) → SEOpt (margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs) → SkewOpt (activity-aware clock skew optimization) → if energy < min energy, save current solution]
7ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
[Flow, conventional signoff: routed design → timing analysis using conventional BEOL corners (CBC) → if # violations ≠ 0, ECO using CBC and repeat → done]
[Flow, this work: routed design → classify timing-critical paths into GTBC and GCBC → GTBC: timing analysis using TBC, ECO using TBC until # violations = 0; GCBC: timing analysis using CBC, ECO using CBC until # violations = 0 → done]
8ISVLSI-2014 invited talk 140710
Conventional BEOL Corners
• Three major variation sources per layer: ΔW, ΔT, ΔH
• Conventional BEOL corners (CBC)
  • Homogeneous corners: all variation sources are skewed in the same direction
• BEOL RC variations are modeled in interconnect technology file (itf)
[Figure: cross-section of metal layers M1-M3 showing width W2, spacing S2, thicknesses T1-T3, heights H1-H3, inter-layer dielectric and inter-metal dielectric]

Corner   ΔW        ΔT        ΔH
Ytyp     typical   typical   typical
Ycb      min       min       max
Ycw      max       max       min
Yrcb     max       max       max
Yrcw     min       min       min
9ISVLSI-2014 invited talk 140710
Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
• 9-layer metal stack has 27 variation sources: z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if
  • zu and zv are the same type (ΔW, ΔT or ΔH), and
  • zu and zv are in the same process module

Layer   ΔW    ΔT    ΔH
M1      z1    z2    z3
M2      z4    z5    z6
M3      z7    z8    z9
M4      z10   z11   z12
M5      z13   z14   z15
M6      z16   z17   z18
M7      z19   z20   z21
M8      z22   z23   z24
M9      z25   z26   z27
(layers are grouped into process modules 1, 2 and 3)

Examples
• ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6 and M7
• But ΔW in layer M4 is not correlated with ΔT in M4
10ISVLSI-2014 invited talk 140710
Pessimism of Conventional BEOL Corners (CBC)
• Assumption: a max (setup) path pj is "safe" when delay evaluated at a given CBC is larger than nominal delay + 3σj:
  dj(YCBC) ≥ 3σj + dj(Ytyp)
• For a given path, we can compare the statistical delay variation and the delay obtained from a given CBC:
  αj = 3σj / Δdj(YCBC)
  Δdj(YCBC) = dj(YCBC) − dj(Ytyp), YCBC ∈ {Ycw, Ycb, Yrcw, Yrcb}
• Small αj → large pessimism of CBC
[Figure: delay axis comparing 3σj with dj(YCBC) − dj(Ytyp); the gap is the pessimism of the CBC]
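The safety condition and the scaling factor αj defined above can be checked numerically. The sketch below uses made-up delays and σ for a single path; the function names and numbers are illustrative assumptions.

```python
def alpha(sigma_j, d_cbc, d_typ):
    """alpha_j = 3*sigma_j / (d_j(Y_CBC) - d_j(Y_typ))."""
    delta = d_cbc - d_typ
    if delta <= 0:
        raise ValueError("corner delay must exceed typical delay for a setup path")
    return 3.0 * sigma_j / delta


def is_safe(sigma_j, d_cbc, d_typ):
    """A path is 'safe' at a CBC when d_j(Y_CBC) >= 3*sigma_j + d_j(Y_typ)."""
    return d_cbc >= 3.0 * sigma_j + d_typ


# Hypothetical path: typical delay 1.00 ns, CBC delay 1.10 ns, sigma 0.01 ns.
a = alpha(0.01, 1.10, 1.00)
safe = is_safe(0.01, 1.10, 1.00)
```

Here the corner adds about 0.10 ns while the 3σ variation is only 0.03 ns, so α is about 0.3 and the CBC is roughly three times more pessimistic than the statistical reference for this path.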
12ISVLSI-2014 invited talk 140710
Intuition on Delay Variability Across Cw, RCw
[Figure: two scatter plots of Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp), colored by α]
• Some paths have α > 1.0 → a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
In the following, α is defined at the dominant corner
13ISVLSI-2014 invited talk 140710
Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
  • E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]
14ISVLSI-2014 invited talk 140710
Find Paths for Which TBCs Can Be Used
• Paths with small Δdrcw and Δdcw have large α
  • E.g., αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw marked]
GTBC = set of paths that can be safely signed off using TBC = (paths with Δdcw larger than Acw) OR (paths with Δdrcw larger than Arcw)
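The threshold rule for building GTBC can be written as a small filter. This is a sketch only: the path names, the Δdelay values and the thresholds passed in are hypothetical placeholders.

```python
def classify_paths(paths, a_cw, a_rcw):
    """Split paths into (G_TBC, G_CBC) using the OR rule on the two thresholds.

    paths: list of (name, delta_d_cw, delta_d_rcw), deltas in percent of typical delay.
    """
    g_tbc, g_cbc = [], []
    for name, d_cw, d_rcw in paths:
        if d_cw > a_cw or d_rcw > a_rcw:
            g_tbc.append(name)   # large corner delta: tightened corner is safe
        else:
            g_cbc.append(name)   # small deltas (large alpha): keep conventional corner
    return g_tbc, g_cbc


# Hypothetical paths and thresholds (percent).
paths = [("p1", 6.0, 2.0), ("p2", 1.0, 1.5), ("p3", 2.0, 8.0)]
g_tbc, g_cbc = classify_paths(paths, a_cw=4.0, a_rcw=7.0)
```

Paths left in GCBC are still signed off at the conventional corners, so the split trades margin reduction against the number of paths that benefit from it.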
15ISVLSI-2014 invited talk 140710
Determining α Arcw and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay; find Arcw and Acw for a given α
• Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC
[Figure: α vs. Δd at RC-worst corner (%) and α vs. Δd at C-worst corner (%), with thresholds Arcw and Acw marked]
16ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
  • Tradeoff between reduced margin vs. # paths which use TBC
Correlation factor γ = 0.5
[Figure: WNS (ns), TNS (ns) and # timing violations for LEON, SUPERBLUE12 and NETCARD under CBC, TBC-0.5, TBC-0.6 and TBC-0.7]
17ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
18ISVLSI-2014 invited talk 140710
How to Minimize Cost of Resilience?
• Additional circuits → area and power penalties
• Recovery from errors → throughput degradation
• Large hold margin → short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                     Razor         Razor-Lite    TIMBER
Power penalty        30 [Das08]    ~0 [Kim13]    100 [Choudhury09]
Area penalty         182 [Kim13]   33 [Kim13]    255 [Chen13]
# recovery cycles    5 [Wan09]     11 [Kim13]    0 [Choudhury09]
19ISVLSI-2014 invited talk 140710
Tradeoff Resilience Cost vs Datapath Cost
[Figure: endpoint Razor FF with its fanin cone; optimizing the fanin cone w/ tighter constraint allows the endpoint Razor FF to become a normal FF, trading area (power) of the fanin cone against area (power) w/ Razor overhead]
[Figure: energy (mJ) vs. # Razor FFs (300, 100, 50, 0); curves: total energy, energy of non-resilient part, resilience cost]
Tradeoff: # Razor FFs (resilience cost) vs. power/area of fanin circuits
We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)
20ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints → allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions
• Vary # endpoints → compare area/power penalty
Candidate sensitivity functions:
  $SF_1 = |slack(p)|$
  $SF_2 = |slack(p)| \times num_{cri}(p)$
  $SF_3 = |slack(p)| \times \frac{num_{cri}(p)}{num_{total}(p)}$
  $SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
  $SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$
p: negative-slack endpoint
c: cells within fanin cone
num_cri: number of negative-slack cells
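The five candidate sensitivity functions can be evaluated for one endpoint as follows. This is a hedged sketch: the `Cell` record, the interpretation of num_total as the number of cells in the fanin cone, and the sample numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    slack: float   # timing slack of the cell (ns); negative means critical
    power: float   # cell power (mW)


def sensitivity_functions(endpoint_slack, fanin):
    """Evaluate SF1..SF5 for a negative-slack endpoint p with fanin-cone cells."""
    if not fanin:
        raise ValueError("fanin cone must contain at least one cell")
    abs_slack = abs(endpoint_slack)
    num_total = len(fanin)                             # assumed meaning of num_total(p)
    num_cri = sum(1 for c in fanin if c.slack < 0)     # negative-slack cells
    total_pwr = sum(c.power for c in fanin)
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": abs_slack * num_cri / num_total,
        "SF4": abs_slack * total_pwr,
        "SF5": sum(abs(c.slack) * c.power for c in fanin),
    }


# Hypothetical endpoint with slack -0.1 ns and a two-cell fanin cone.
sf = sensitivity_functions(-0.1, [Cell(-0.1, 2.0), Cell(0.2, 1.0)])
```

Ranking endpoints by any one of these values gives the order in which SEOpt would consider tightening their fanin cones.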
21ISVLSI-2014 invited talk 140710
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight → adjust clock latencies → contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized
Path weight:
  $W_{pq} = \frac{Slack_{p,q}}{1 + \beta \times TG(p,q)}$
  Slack_{p,q}: setup slack of path p-q; β: weighting factor; TG(p,q): toggle rate of path p-q
[Figure: sequential graph of FF1, FF2, FF3 with data-path weights W12, W23, W31 and the clock tree; after adjusting clock latencies each edge on the cycle has weight W']
W' = average weight on cycle
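The path weight and the balanced cycle weight W' can be computed directly. The sketch below covers only these two formulas, not the cycle search or graph contraction; the slacks, toggle rates and β are hypothetical values.

```python
def path_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)


def balanced_cycle_weight(weights):
    """W' = average weight over the edges of one cycle."""
    return sum(weights) / len(weights)


# Hypothetical cycle FF1 -> FF2 -> FF3 -> FF1: (setup slack in ns, toggle rate).
edges = [(0.30, 0.0), (-0.10, 1.0), (0.20, 1.0)]
weights = [path_weight(s, tg, beta=1.0) for s, tg in edges]
w_prime = balanced_cycle_weight(weights)
```

Dividing by 1 + β × TG shrinks the weight of frequently toggling paths, so cycles through them are picked up earlier by the minimum-weight search.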
22ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow: initial placement (all FFs = error-tolerant FFs) → SEOpt (margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs) → SkewOpt (activity-aware clock skew optimization) → if energy < min energy, save current solution → OR-tree insertion]
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF and CO for MUL and EXU at small, medium and large margin; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner
Technology: foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs → resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; curves: pure-margin, brute-force, CombOpt; minimum achievable energy marked]
Technology: foundry 28nm
25ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops → Less Margin
• Example: interaction between reliability margin and AVS designs
  • Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
  • AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS: an on-chip aging monitor senses circuit performance and drives the voltage regulator that sets the circuit's Vdd; plots of circuit frequency and Vdd vs. time, without AVS and with AVS, against the target]
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation
[Flow: Vlib, VBTI → Step 1: derated library → |Vt| → Step 2: circuit implementation and signoff → circuit → Step 3: BTI degradation and AVS → Vfinal]
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Flow: Vlib, VBTI → Step 1: derated library → |Vt| → Step 2: circuit implementation and signoff → circuit → Step 3: BTI degradation and AVS → Vfinal]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
[Figure: power vs. area across signoff corners, with a "knee" point for balanced area and power tradeoff]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI aging curves]
VBTI = Vlib ≈ Vfinal
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation
[Flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → sign off circuit with derated library]
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
• (4 designs × {DC, AC} × derating methods)
[Figure: power vs. area for all designs; points for the proposed method and for circuits signed off using other derated libraries; "knee" point for balanced area and power tradeoff]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign
[Figure: Vdd vs. time alternating between nominal (NOM, tnom) and overdrive (OD, tOD) modes]
[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V; different overdrive modes: 26% power range; fixed fOD: still 14% power range]
37ISVLSI-2014 invited talk 140710
Also: Tunable Monitors → Less Margin
• Default config
  • Low-resistance passgates
  • Guardband for worst-case
  • Vmin_est > Vmin_chip
  • 13mV margin
• Optimized config
  • Increase high-resistance passgates
  • Vmin_est ≈ Vmin_chip
• Aggressive config
  • Vmin_est < Vmin_chip → some chips will fail
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1-9, ordered from highest invested guardband to least invested guardband; annotation: 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
Example: worst-case capacitance corner
[Figure: capacitance distributions of layers M1 and M2, each with its 3σ corner; for an interconnect stack with M1 and M2, the homogeneous Cw corner lies beyond the 3σ point of the combined M1 C + M2 C distribution; the gap is pessimism]
When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with the number of layers
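A small numerical sketch shows why the pessimism grows with layer count. It assumes, purely for illustration, that the stack capacitance is the sum of per-layer contributions and that layer variations are fully independent; the function names are hypothetical.

```python
import math


def homogeneous_corner(sigmas):
    """Every layer skewed to its own 3-sigma point at once: deviations add linearly."""
    return sum(3.0 * s for s in sigmas)


def statistical_3sigma(sigmas):
    """3-sigma of the sum when layer variations are independent (root-sum-square)."""
    return 3.0 * math.sqrt(sum(s * s for s in sigmas))


def pessimism_ratio(sigmas):
    """How much larger the homogeneous corner is than the true 3-sigma deviation."""
    return homogeneous_corner(sigmas) / statistical_3sigma(sigmas)


two_layers = pessimism_ratio([1.0, 1.0])
nine_layers = pessimism_ratio([1.0] * 9)
```

Under these assumptions, n equally varying layers give a ratio of sqrt(n); partial correlation between layers would reduce the ratio toward 1.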
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
          M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW    0  0  0     0  0  0     0  0  0     1  0  0
    ΔT    0  0  0     0  0  0     0  0  0     0  1  0
    ΔH    0  0  0     0  0  0     0  0  0     0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
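The matrix above follows mechanically from the two correlation rules (same variation type, same process module). The sketch below builds it for an arbitrary layer-to-module assignment; the function name, the string labels and the example assignment are illustrative assumptions.

```python
def build_correlation_matrix(layer_modules, gamma, types=("dW", "dT", "dH")):
    """Return (sources, matrix) for variation sources ordered layer by layer.

    layer_modules: list of (layer_name, process_module_id).
    Entry is 1 on the diagonal, gamma for same type and same module, else 0.
    """
    sources = [(layer, module, t) for layer, module in layer_modules for t in types]
    n = len(sources)
    matrix = [[0.0] * n for _ in range(n)]
    for u, (layer_u, mod_u, type_u) in enumerate(sources):
        for v, (layer_v, mod_v, type_v) in enumerate(sources):
            if u == v:
                matrix[u][v] = 1.0
            elif type_u == type_v and mod_u == mod_v:
                matrix[u][v] = gamma
    return sources, matrix


# Assumed assignment matching the 4-layer excerpt: M1-M3 share a module, M4 does not.
layers = [("M1", 1), ("M2", 1), ("M3", 1), ("M4", 2)]
sources, sigma = build_correlation_matrix(layers, gamma=0.5)
```

The first row of `sigma` reproduces the M1 ΔW row of the table: 1 on the diagonal, γ against ΔW of M2 and M3, and 0 elsewhere.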
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); cumulative probability reaches 0.92 at a ratio of 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: two scatter plots of Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp), colored by α]
• Some paths have α > 1.0 → a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp, compared with the 3σ point of M1 C + M2 C]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale the BEOL variation in the itf with α = 0.5
[Figure: 3σj / dj(Ytyp) × 100 plotted against Δdj(Yrcw) / dj(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, so as to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if its Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
Flow:
  1. Routed design
  2. Timing analysis at BEOL corners Ytyp, Ycw, Yrcw; split paths into GTBC and GCBC
  3. GTBC: timing analysis using TBC; ECO using TBC until violation = 0
  4. GCBC: timing analysis using CBC; ECO using CBC until violation = 0
  5. Done
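The path-grouping step of this flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Path` record, the function names, and the threshold values passed in are hypothetical, and the grouping rule (a path goes to Gtbc when its Δdelay at C-worst or RC-worst exceeds the corresponding threshold) follows the bullets above.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    d_typ: float  # path delay at Ytyp (ns)
    d_cw: float   # path delay at Ycw (ns)
    d_rcw: float  # path delay at Yrcw (ns)


def rel_delta(d_corner, d_typ):
    """Delta-delay of a corner relative to the typical corner, in percent."""
    return 100.0 * (d_corner - d_typ) / d_typ


def split_paths(paths, a_cw, a_rcw):
    """Partition paths into (G_tbc, G_cbc) using percent thresholds a_cw, a_rcw."""
    g_tbc, g_cbc = [], []
    for p in paths:
        large_cw = rel_delta(p.d_cw, p.d_typ) > a_cw
        large_rcw = rel_delta(p.d_rcw, p.d_typ) > a_rcw
        (g_tbc if (large_cw or large_rcw) else g_cbc).append(p.name)
    return g_tbc, g_cbc


# Hypothetical paths and thresholds, for illustration only.
example_paths = [
    Path("p1", 1.00, 1.05, 1.02),
    Path("p2", 1.00, 1.01, 1.02),
    Path("p3", 1.00, 1.01, 1.08),
]
example_groups = split_paths(example_paths, a_cw=3.3, a_rcw=5.0)
```

Paths in the first group would then be closed against the tightened corners and the rest against the conventional corners.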
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)

                      | LEON3MP | NETCARD | SUPERBLUE12
Clock period (ns)     | 1.8     | 2.0     | 3.1
Gate count            | 232K    | 575K    | 1031K
Utilization (%)       | 84      | 79      | 82
Core area (mm²)       | 0.45    | 1.04    | 1.91
Max transition (ps)   | 330     | 330     | 330

Thresholds for correlation factor = 0.5

         | α   | Acw (%) | Arcw (%)
TBC-0.5  | 0.5 | 4.3     | 7.3
TBC-0.6  | 0.6 | 3.3     | 5.0
TBC-0.7  | 0.7 | 3.0     | 3.4

• Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw, and Arcw
• Statistical models: (1) no correlation, and (2) variation sources of the same kind in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays → the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp) is the nominal delay
  • The first 0.5 is the delay sensitivity to a unit change in M1 resistance
  • The second 0.5 is the delay sensitivity to a unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations from ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figures: α vs. Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) for LEON3MP, NETCARD, and SUPERBLUE12, each with the region α > 0.5 marked]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figures: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]
57ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout the lifetime
• Aging at a flat Vfinal ≈ aging under an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI stress; VBTI = Vlib ≈ Vfinal]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging), determined by:
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violations
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit | Frequency (GHz)
C5315   | 1.38
c7552   | 1.25
AES     | 0.89
MPEG2   | 1.05

Supply voltages
Vmax        | 1.05V
Vinit       | 0.90V
Vheur1 (DC) | 0.97V
Vheur1 (AC) | 0.95V
Vheur2 (DC) | 0.95V
Vheur2 (AC) | 0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep VBTI, VLIB, and Vdd consistent throughout the circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing, Vdd, and library
  • Repeat until t = tfinal
  • Modify the circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs
Vstep: AVS voltage step; Vfinal: converged voltage
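The time-stepped loop above can be illustrated with a toy simulation. Everything numeric here is an assumption made for illustration and is not taken from the talk: the alpha-power-law delay model, the power-law BTI shift, and all parameter values are hypothetical stand-ins for the real STA and aging models. The sketch only shows the control structure: age, re-check timing, raise Vdd by Vstep until timing is met, and give up if Vdd would exceed Vmax.

```python
def gate_delay(vdd, vth, alpha=1.3):
    """Toy alpha-power-law delay in arbitrary units (assumed model)."""
    return vdd / (vdd - vth) ** alpha


def delta_vth(vdd, t_years, a=0.05, n=0.2):
    """Toy BTI threshold shift: scales with stress voltage, power law in time."""
    return a * vdd * t_years ** n


def reference_signoff(v_init=0.90, v_max=1.05, v_step=0.01, vth0=0.30,
                      t_final=10.0, dt=0.5, slack0=0.05):
    """Return (Vfinal, trace); Vfinal is None if Vmax would be exceeded."""
    # The fresh circuit is assumed to have slack0 of relative timing slack.
    target = gate_delay(v_init, vth0) * (1.0 + slack0)
    vdd, trace = v_init, []
    for i in range(1, int(round(t_final / dt)) + 1):
        t = i * dt
        # AVS: raise Vdd in Vstep increments until the aged circuit meets timing.
        while gate_delay(vdd, vth0 + delta_vth(vdd, t)) > target:
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None, trace  # the circuit must be modified and the flow restarted
        trace.append((t, vdd))
    return vdd, trace


v_final, vdd_trace = reference_signoff()
```

In a real flow each iteration would involve an STA run, which is why the reference flow is accurate but slow.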
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignore AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case     | 1     | 2     | 3    | 4      | 5  | 6      | 7
Vlib (V) | Vinit | Vinit | Vmax | Vinit  | NA | Vheur1 | Vheur2
VBTI (V) | Vinit | Vmax  | Vmax | Vfinal | NA | Vheur1 | Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop between circuit, Vfinal, Vlib / VBTI, and derated libraries]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time, alternating ON and OFF phases [VattikondaWC06]]
Device aging (|ΔVth|) accumulates over time [TCAS'14]
65ISVLSI-2014 invited talk 140710
Observation 1
• BTI is a "front-loaded" phenomenon [Chan11]
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens in early lifetime
  • The gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd vs. time approaching Vfinal; ≈70% of the Vdd increment occurs in 1 year (remaining 30% over 9 years)]
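One way to see why aging is front-loaded is a power-law time dependence, |ΔVth| ∝ t^n. The model form and the exponent n = 0.3 below are assumptions chosen for illustration (the slide does not state a model); with that exponent, the fraction of 10-year aging reached after 1 year comes out near one half.

```python
def aged_fraction(t_years, lifetime_years=10.0, n=0.3):
    """Fraction of end-of-life |dVth| reached at time t, assuming dVth ~ t**n."""
    return (t_years / lifetime_years) ** n


first_year = aged_fraction(1.0)      # roughly half of the 10-year shift
first_half = aged_fraction(5.0)      # most of the shift by mid-life
```

A smaller exponent makes the curve even more front-loaded, which is why most of the AVS voltage increase is spent early in life.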
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if the desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
Setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignore AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure the circuit meets its performance target under PVT variations and aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

VBTI (aging estimation) \ Vlib (circuit performance estimation) | Min Vdd | Max Vdd
Min Vdd | Slowest circuit, less aging | Not applicable (optimistic)
Max Vdd | Slowest circuit, worst-case aging: too pessimistic | Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm²) for AES; points labeled by signoff case 1 to 8; series: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); regions marked "Optimistic about AVS" and "Pessimistic about AVS"]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]
• Assume no EM fix at signoff
• BTI degradation is checked at each step, and MTTF is updated as

$MTTF(i) = MTTF(i-1) \times \left( \dfrac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
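The MTTF update rule on this slide, MTTF(i) = MTTF(i-1) × (VDD(i-1) / VDD(i))², telescopes over an AVS schedule, so the end-of-life MTTF depends only on the first and last voltages. A minimal sketch follows; the function name and the voltage schedule are hypothetical and chosen only to show the arithmetic.

```python
def mttf_schedule(mttf0, vdd_steps):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule."""
    mttf = [mttf0]
    for prev, cur in zip(vdd_steps, vdd_steps[1:]):
        mttf.append(mttf[-1] * (prev / cur) ** 2)
    return mttf


# Hypothetical AVS schedule (volts) and initial MTTF (years).
schedule = [0.90, 0.95, 1.00, 1.10]
mttf_trace = mttf_schedule(10.0, schedule)
```

Because the product telescopes, `mttf_trace[-1]` equals `10.0 * (0.90 / 1.10) ** 2`, regardless of the intermediate steps.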
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year for schedules S1 to S5 (design: DMA)]
[Figure: MTTF (year) for schedules S1 to S5; annotation: 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
• Foundation of the contract between design house and foundry
  • "Chip should work": a stack of models, margins, analyses
  • Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff, between operating voltage and signoff Vdd: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI]
Problem: margins = pessimism → overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = [\, d_j(Y_v) - d_j(Y_{typ}) \,] / 3$
• Assumptions
  • Δdjv is linear with respect to the variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
Flow: routed netlist + 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δdjv
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for the variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^{T}$ (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
  $[\Delta d'_{j1} \ldots \Delta d'_{j27}] = [\Delta d_{j1} \ldots \Delta d_{j27}] \, \lambda$
Standard deviation of path delay:
  $\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
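The computation on this slide can be reproduced on a small example. The sketch below uses a hand-written Cholesky factorization so that it needs only the standard library; the 3 x 3 correlation matrix and the sensitivities are made-up numbers, not data from the talk. Since Σ = λλᵀ, the result equals the square root of the quadratic form Δd Σ Δdᵀ.

```python
import math


def cholesky(sigma):
    """Lower-triangular lam with sigma = lam * lam^T (sigma must be positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def path_sigma(delta_d, sigma):
    """sigma_j = || delta_d * lam ||_2, with delta_d treated as a row vector."""
    lam = cholesky(sigma)
    n = len(delta_d)
    d_prime = [sum(delta_d[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


# Hypothetical example: sources 0 and 1 are correlated (0.5), source 2 is independent.
corr = [[1.0, 0.5, 0.0],
        [0.5, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
sigma_j = path_sigma([2.0, 1.0, 3.0], corr)
```

With no correlation (identity matrix) the same function reduces to the root-sum-square of the individual sensitivities.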
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for a conventional design and a resilient design; 15% reduction]
• Conventional design: worst-case signoff → no Vdd downscaling
• Resilient design: typical-case signoff → Vdd downscaling → reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement the design to minimize energy
• Estimation of design energy:
  $\mathrm{Energy} = \mathrm{Power} / \mathrm{Throughput}$
  where Throughput is determined by the clock period $T$, the error rate $ER$ [Kahng10], and the number of recovery cycles $r$
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints → allows replacement of a Razor FF with a normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches (figure axis: power)
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor and design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00]; design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until an error occurs [Das06, Tschanz10]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure: four panels, each contrasting ideal scaling with observed non-ideality]
• DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
• POWER: dynamic power (W) vs. year, 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling. Source: [ITRS]
• SUPPLY VOLTAGE: volts vs. MPU release date. Source: [CPUDB]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average of 18% energy reduction compared to pure-margin designs
  • Resilience benefits increase in the context of an AVS strategy
[Figures: energy (mJ) vs. supply voltage (V) for MUL and EXU; series: brute-force, pure-margin, CombOpt; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: frequency vs. voltage with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone → reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R) → refine power models
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); series: 1st, 2nd, real; design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune the series resistance of each stage to high or low
[Figure: ring oscillator stages with 1-bit control pins selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to the reference methods
• Resilience benefits increase with safety margin
[Figures: energy (mJ) for PM, BF, CO at small, medium, and large margin, for MUL and EXU; stacked components: energy without resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of the clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (adaptive voltage scaling) allows a lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
  • Resilience benefits increase in the AVS context
[Figures: energy (mJ) vs. supply voltage (V) for MUL and EXU; series: brute-force, pure-margin, CombOpt; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow:
  1. Initial placement (all FFs = error-tolerant FFs)
  2. SEOpt: margin insertion on K paths based on the sensitivity function; replace error-tolerant FFs with normal FFs
  3. SkewOpt: activity-aware clock skew optimization
  4. If energy < min energy, save the current solution; iterate
8ISVLSI-2014 invited talk 140710
Conventional BEOL Corners
• Three major variation sources per layer: ΔW, ΔT, ΔH
• Conventional BEOL corners (CBC)
  • Homogeneous corners: all variation sources are skewed in the same direction
• BEOL RC variations are modeled in the interconnect technology file (itf)
[Figure: cross-section of metal layers M1, M2, M3 with width W2, spacing S2, thicknesses T1 to T3, dielectric heights H1 to H3, inter-layer dielectric, and inter-metal dielectric]

Corner | ΔW      | ΔT      | ΔH
Ytyp   | typical | typical | typical
Ycb    | min     | min     | max
Ycw    | max     | max     | min
Yrcb   | max     | max     | max
Yrcw   | min     | min     | min
9ISVLSI-2014 invited talk 140710
Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
• A 9-layer metal stack has 27 variation sources: z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if
  • zu and zv are the same type (ΔW, ΔT, or ΔH), and
  • zu and zv are in the same process module

Layer | ΔW  | ΔT  | ΔH
M1    | z1  | z2  | z3
M2    | z4  | z5  | z6
M3    | z7  | z8  | z9
M4    | z10 | z11 | z12
M5    | z13 | z14 | z15
M6    | z16 | z17 | z18
M7    | z19 | z20 | z21
M8    | z22 | z23 | z24
M9    | z25 | z26 | z27

[Figure: layers grouped into process modules 1, 2, and 3]
Examples
• ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6, and M7
• But ΔW in layer M4 is not correlated with ΔT in M4
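The correlation rule above (same variation type and same process module) is enough to build the 27 x 27 matrix. In the sketch below, the assignment of layers to process modules is an assumption: the slide only implies that M4 to M7 share a module, so the M1 to M3 and M8 to M9 groupings are hypothetical. The factor `gamma` stands for the correlation factor between correlated sources.

```python
LAYERS = ["M%d" % i for i in range(1, 10)]
TYPES = ["dW", "dT", "dH"]

# Assumed grouping; only M4..M7 sharing a module is implied by the slide's example.
MODULE = {"M1": 1, "M2": 1, "M3": 1,
          "M4": 2, "M5": 2, "M6": 2, "M7": 2,
          "M8": 3, "M9": 3}


def build_correlation(gamma=0.5):
    """Return (sources, matrix); sources[k] is (layer, type) for z_{k+1}."""
    sources = [(layer, t) for layer in LAYERS for t in TYPES]
    n = len(sources)
    matrix = [[0.0] * n for _ in range(n)]
    for u, (layer_u, type_u) in enumerate(sources):
        for v, (layer_v, type_v) in enumerate(sources):
            if u == v:
                matrix[u][v] = 1.0
            elif type_u == type_v and MODULE[layer_u] == MODULE[layer_v]:
                matrix[u][v] = gamma
    return sources, matrix


sources, corr_matrix = build_correlation(gamma=0.5)
```

With this ordering, index 9 is z10 (ΔW in M4) and index 12 is z13 (ΔW in M5), so `corr_matrix[9][12]` carries the correlation factor while `corr_matrix[9][10]` (ΔW vs. ΔT in M4) is zero.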
10ISVLSI-2014 invited talk 140710
Pessimism of Conventional BEOL Corners (CBC)
• Assumption: a max (setup) path pj is "safe" when the delay evaluated at a given CBC is larger than the nominal delay + 3σj:
  $d_j(Y_{CBC}) \ge d_j(Y_{typ}) + 3\sigma_j$
• For a given path, we can compare the statistical delay variation with the delay obtained from a given CBC:
  $\alpha_j = 3\sigma_j / \Delta d_j(Y_{CBC})$
  $\Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ}), \quad Y_{CBC} \in \{Y_{cw}, Y_{cb}, Y_{rcw}, Y_{rcb}\}$
• Small αj → large pessimism of the CBC
[Figure: 3σj vs. dj(YCBC) - dj(Ytyp), with the large-pessimism region marked]
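A small numeric illustration of the scaling factor defined on this slide, αj = 3σj / Δdj(YCBC), and of the "safe" condition dj(YCBC) ≥ dj(Ytyp) + 3σj. The function names and delay numbers are hypothetical.

```python
def alpha(sigma_j, d_cbc, d_typ):
    """Scaling factor: 3-sigma statistical variation over the corner's delta-delay."""
    return 3.0 * sigma_j / (d_cbc - d_typ)


def is_safe(sigma_j, d_cbc, d_typ):
    """True when the corner delay covers nominal delay plus 3 sigma."""
    return d_cbc >= d_typ + 3.0 * sigma_j


# Hypothetical path: 1.00 ns nominal, 1.06 ns at the corner, sigma = 0.01 ns.
a = alpha(0.01, 1.06, 1.00)        # about 0.5: the corner is roughly 2x pessimistic
safe = is_safe(0.01, 1.06, 1.00)
```

A path is safe exactly when α ≤ 1 (for a positive Δd), and the smaller α is, the more margin the conventional corner wastes on that path.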
11ISVLSI-2014 invited talk 140710
Intuition on Delay Variability Across Cw RCw
[Figure: α per path, plotted against Δdelay (vs. typ) at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and Δdelay (vs. typ) at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
Figure annotations:
  • Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
  • Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
  • Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner
  • α < 1.0 here: delay variations are covered by the RC-worst corner
12ISVLSI-2014 invited talk 140710
Intuition on Delay Variability Across Cw, RCw
• Paths are more sensitive to R or to C
• Using only RC-worst or only C-worst will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
In the following, α is defined at the dominant corner
13ISVLSI-2014 invited talk 140710
Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
  • E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]
14ISVLSI-2014 invited talk 140710
Find Paths for Which TBCs Can Be Used
• Paths with small Δdrcw and Δdcw have large α
  • E.g., αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw marked]
Gtbc = set of paths that can be safely signed off using TBC = (paths with Δdcw larger than Acw) OR (paths with Δdrcw larger than Arcw)
15ISVLSI-2014 invited talk 140710
Determining α Arcw and Acw
[Figures: α vs. Δd at C-worst corner (%) and α vs. Δd at RC-worst corner (%), with thresholds Acw and Arcw marked]
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
  • Plot α vs. Δdelay; find Arcw and Acw for a given α
  • Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC
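One plausible reading of the threshold extraction is sketched below, as an assumption rather than the authors' exact procedure: for a target α, take the largest Δdelay at which any representative path still has α above the target, then add the margin. Every path with Δdelay beyond that threshold then has α at or below the target in the sample. The sample data are made up.

```python
def find_threshold(samples, alpha_target, margin=1.0):
    """samples: list of (delta_d_percent, alpha) pairs from representative paths.

    Returns the smallest Delta-delay threshold (percent) above which no sampled
    path has alpha greater than alpha_target, plus a sampling-error margin.
    """
    violating = [delta_d for delta_d, a in samples if a > alpha_target]
    return (max(violating) if violating else 0.0) + margin


# Hypothetical (delta-delay %, alpha) samples showing alpha falling as delta-delay grows.
samples = [(1.0, 0.9), (2.0, 0.7), (2.5, 0.55), (4.0, 0.5), (6.0, 0.4)]
a_for_06 = find_threshold(samples, 0.6)   # violations end at 2.0%, so 3.0% with margin
a_for_05 = find_threshold(samples, 0.5)   # violations end at 2.5%, so 3.5% with margin
```

The example reproduces the qualitative trend on the slide: a smaller target α gives a larger threshold and therefore fewer paths qualify for the tightened corner.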
16ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations are reduced by 24% to 100%
• TBC-0.6: more benefits
  • Tradeoff between reduced margin vs. paths which use TBC
Correlation factor γ = 0.5
[Figures: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE12, and NETCARD under CBC, TBC-0.5, TBC-0.6, and TBC-0.7]
17ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
18ISVLSI-2014 invited talk 140710
How to Minimize Cost of Resilience
• Additional circuits → area and power penalties
• Recovery from errors → throughput degradation
• Large hold margin → short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                   | Razor       | Razor-Lite | TIMBER
Power penalty (%)  | 30 [Das08]  | ~0 [Kim13] | 100 [Choudhury09]
Area penalty (%)   | 182 [Kim13] | 33 [Kim13] | 255 [Chen13]
# recovery cycles  | 5 [Wan09]   | 11 [Kim13] | 0 [Choudhury09]
19ISVLSI-2014 invited talk 140710
Tradeoff Resilience Cost vs Datapath Cost
[Figure: Razor FFs at endpoints raise an error signal; optimizing the fanin cone of an endpoint with a tighter constraint allows its Razor FF to be replaced by a normal FF. Tradeoff: area (power) of the fanin cone vs. area (power) with Razor overhead]
[Figure: energy (mJ) vs. number of Razor FFs (300, 100, 50, 0); curves: total energy, energy of non-resilient part, resilience cost]
We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)
20ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization (SEOpt)
• Optimize the fanin cone of an endpoint with tighter constraints → allows replacement of a Razor FF with a normal FF
• Pick endpoints based on heuristic sensitivity functions
  • Vary the number of endpoints → compare area/power penalty
Candidate Sensitivity Functions
  $SF_1 = |slack(p)|$
  $SF_2 = |slack(p)| \times Num_{cri}(p)$
  $SF_3 = |slack(p)| \times Num_{cri}(p) / Num_{total}(p)$
  $SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
  $SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$
where p = negative-slack endpoint; c = cells within the fanin cone; Numcri = number of negative-slack cells
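The candidate sensitivity functions can be evaluated directly once slack and power are known per cell. The sketch below is illustrative only: the data layout is hypothetical, the slack terms are read as absolute values, and Numtotal is taken to be the total number of cells in the fanin cone, which the slide does not define.

```python
def sensitivity_functions(endpoint_slack, fanin_cells):
    """endpoint_slack: slack of endpoint p; fanin_cells: list of (slack, power) per cell."""
    s = abs(endpoint_slack)
    num_cri = sum(1 for slack, _ in fanin_cells if slack < 0)   # negative-slack cells
    num_total = len(fanin_cells)                                # assumed meaning of Num_total
    total_pwr = sum(pwr for _, pwr in fanin_cells)
    return {
        "SF1": s,
        "SF2": s * num_cri,
        "SF3": s * num_cri / num_total,
        "SF4": s * total_pwr,
        "SF5": sum(abs(slack) * pwr for slack, pwr in fanin_cells),
    }


# Hypothetical endpoint with four fanin cells: (slack in ns, power in mW).
cells = [(-0.10, 2.0), (0.05, 1.0), (-0.02, 3.0), (0.20, 4.0)]
sf = sensitivity_functions(-0.10, cells)
```

Endpoints would then be ranked by the chosen function, and the top K selected for fanin-cone optimization.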
21ISVLSI-2014 invited talk 140710
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate the sequential graph
  2. Find the cycle of paths with minimum total weight → adjust clock latencies → contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized
[Figure: FF1 → FF2 → FF3 → FF1 with data-path weights W12, W23, W31 and the clock tree; after adjustment each edge on the cycle has weight W', where W' = average weight on the cycle]

$W_{pq} = \dfrac{Slack_{pq}}{1 + \beta \times TG(p,q)}$

where Slack_pq = setup slack of path p-q; β = weighting factor; TG(p,q) = toggle rate of path p-q
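The edge weight W_pq = Slack_pq / (1 + β × TG(p,q)) and the cycle average W' can be computed as follows. The three-flop cycle, slacks, and toggle rates are made-up values; finding the minimum-weight cycle in a general sequential graph is not shown.

```python
def edge_weight(slack, toggle_rate, beta=1.0):
    """W_pq: setup slack discounted by how often the path toggles."""
    return slack / (1.0 + beta * toggle_rate)


def mean_cycle_weight(edges, beta=1.0):
    """Average weight W' over the edges of one cycle; edges are (slack, toggle_rate)."""
    weights = [edge_weight(s, tg, beta) for s, tg in edges]
    return sum(weights) / len(weights)


# Hypothetical cycle FF1 -> FF2 -> FF3 -> FF1: (setup slack in ns, toggle rate).
cycle = [(0.3, 0.2), (0.1, 0.0), (0.2, 1.0)]
w_prime = mean_cycle_weight(cycle, beta=1.0)
```

Frequently toggling paths get a smaller weight for the same slack, so they are treated as more critical when clock latencies are adjusted.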
22ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow:
  1. Initial placement (all FFs = error-tolerant FFs)
  2. SEOpt: margin insertion on K paths based on the sensitivity function; replace error-tolerant FFs with normal FFs
  3. SkewOpt: activity-aware clock skew optimization
  4. If energy < min energy, save the current solution; iterate
  5. OR-tree insertion
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method with only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to the reference methods
• Resilience benefits increase with larger process variation
[Figures: energy (mJ) for PM, BF, CO at small, medium, and large margin, for MUL and EXU; stacked components: energy without resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• The proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• The proposed method achieves an average of 17% energy reduction compared to pure-margin designs
  • Resilience benefits increase in the context of an AVS strategy
[Figures: energy (mJ) vs. supply voltage (V) for MUL and EXU; series: pure-margin, brute-force, CombOpt; minimum achievable energy marked]
Technology: foundry 28nm
25ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS: circuit, on-chip aging monitor, circuit performance, voltage regulator, Vdd]
[Figure: circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, relative to the target]
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider the BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
[Figure: three-step loop. Step 1: BTI degradation and AVS (Vfinal, VBTI, |Vt|); Step 2: derated library (Vlib); Step 3: circuit implementation and signoff (circuit)]
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
[Figure: three-step loop. Step 1: derated library (Vlib, VBTI, |Vt|); Step 2: circuit implementation and signoff (circuit); Step 3: BTI degradation and AVS (Vfinal)]
• No obvious guideline to define VBTI and Vlib
• Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, and Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
"Knee" point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Large lifetime energy overhead
• May fail to meet timing if the desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout the lifetime
• Aging at a flat Vfinal ≈ aging under an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI stress; VBTI = Vlib ≈ Vfinal]
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging), determined by:
  • Gates along the critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation
Flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → sign off the circuit with the derated library
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
• (4 designs) x (DC, AC) x (derating methods)
[Figure: power vs. area; markers: proposed method, and circuits signed off using other derated libraries]
"Knee" point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if the desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time, alternating NOM and OD modes over intervals tnom and tOD]
[Figure: power of circuits with different overdrive modes; fnom = 800MHz, Vnom = 0.8V; different overdrive modes: 26% power range; with fOD fixed: still 14% power range]
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
37ISVLSI-2014 invited talk 140710
Also: Tunable Monitors, Less Margin
Aggressive config: Vmin_est < Vmin_chip; some chips will fail
Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability
• Compensate for difference between model and silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, and tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1 to 9; annotations mark the highest invested guardband, the least invested guardband, and a 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Per-layer 3σ corners for layers M1 and M2, the 3σ point of the interconnect stack with M1 and M2, the homogeneous Cw corner, and the pessimism between them]
• When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
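The growth of pessimism with layer count can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that every layer contributes the same standard deviation and that every pair of layers has the same correlation `gamma`; neither assumption comes from the talk.

```python
import math


def homogeneous_pessimism(num_layers, gamma, sigma=1.0):
    """Ratio of the homogeneous-corner shift to the statistical 3-sigma shift.

    Illustrative assumptions: each layer contributes the same standard
    deviation `sigma`, and every pair of layers has correlation `gamma`.
    """
    # Homogeneous corner: every layer sits at its own 3-sigma point at once.
    homogeneous = 3.0 * sigma * num_layers
    # Statistical corner: 3-sigma of the sum of the correlated contributions.
    variance = sigma ** 2 * (num_layers + num_layers * (num_layers - 1) * gamma)
    statistical = 3.0 * math.sqrt(variance)
    return homogeneous / statistical


ratios = [homogeneous_pessimism(n, gamma=0.5) for n in (1, 2, 4, 9)]
```

Under these assumptions the ratio is 1 for a single layer or for fully correlated layers, and grows with the number of layers otherwise.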
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
          M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW    0  0  0     0  0  0     0  0  0     1  0  0
    ΔT    0  0  0     0  0  0     0  0  0     0  1  0
    ΔH    0  0  0     0  0  0     0  0  0     0  0  1

• Correlation between variation sources of the same variation type in the same process module: γ = 0.5
• Variation sources in different process modules are independent
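A matrix with this structure can be generated from a layer-to-module map. The sketch below is a hedged illustration: the function name, the `dW`/`dT`/`dH` labels, and the module numbering are hypothetical, while the rule itself (same variation type and same process module gives γ, otherwise 0) follows the slide.

```python
def correlation_matrix(layer_modules, gamma, kinds=("dW", "dT", "dH")):
    """Build the correlation matrix for per-layer variation sources.

    layer_modules maps a layer name to its process module (hypothetical ids).
    Two distinct sources are correlated (gamma) only if they have the same
    variation type and their layers belong to the same process module.
    """
    sources = [(layer, kind) for layer in layer_modules for kind in kinds]
    size = len(sources)
    matrix = [[0.0] * size for _ in range(size)]
    for u, (layer_u, kind_u) in enumerate(sources):
        for v, (layer_v, kind_v) in enumerate(sources):
            if u == v:
                matrix[u][v] = 1.0
            elif kind_u == kind_v and layer_modules[layer_u] == layer_modules[layer_v]:
                matrix[u][v] = gamma
    return sources, matrix


# M1 to M3 share one process module; M4 is in a different one.
modules = {"M1": 1, "M2": 1, "M3": 1, "M4": 2}
sources, sigma_matrix = correlation_matrix(modules, gamma=0.5)
```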
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the cumulative probability is 0.92 at a ratio of 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α of critical paths plotted against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σj / d(Ytyp) × 100 vs. Δdj(Yrcw) / dj(Ytyp) × 100]
• Challenge: how to avoid underestimating delay variation, to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design → timing analysis at BEOL corners (Ytyp, Ycw, Yrcw) → paths split into GTBC and GCBC → timing analysis using TBC (for GTBC) and using CBC (for GCBC) → ECO using TBC or CBC until violation = 0 → done]
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)

                       LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)      1.8       2.0       3.1
Gate count             232K      575K      1031K
Utilization (%)        84        79        82
Core area (mm2)        0.45      1.04      1.91
Max transition (ps)    330       330       330

Thresholds for correlation factor = 0.5

           α     Acw (%)   Arcw (%)
TBC-0.5    0.5   4.3       7.3
TBC-0.6    0.6   3.3       5.0
TBC-0.7    0.7   3.0       3.4

• Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw, and Arcw
• Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays: the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: α vs. Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) for LEON3MP, NETCARD, and SUPERBLUE12, with the regions where α > 0.5 marked]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]
57ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time for NBTI and PBTI, with VBTI = Vlib ≈ Vfinal]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
• Vfinal is a function of timing slack
• Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = {DC, AC}

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, VLIB, and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs
and library
Vstep: AVS voltage step; Vfinal: converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2
63ISVLSI-2014 invited talk 140710
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop among circuit, derated libraries (Vlib, VBTI), and Vfinal]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time over alternating ON and OFF phases]
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCAS'14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
[Figure annotation: ≈70% of the Vdd increment toward Vfinal occurs in 1 year (remaining 30% over 9 years)]
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
Setups
1. VBTI = Vlib = Vinit: ignores AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: does not overestimate aging but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations and aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

                 Vlib = Min Vdd                                     Vlib = Max Vdd
VBTI = Min Vdd   Slowest circuit, less aging                        Not applicable (optimistic)
VBTI = Max Vdd   Slowest circuit, worst-case aging: too pessimistic Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm2) for AES; points are labeled by signoff case (1 to 8); series: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); annotated regions: optimistic about AVS, pessimistic about AVS]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
  $MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^2$
30% MTTF penalty
200mV voltage compensation
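The MTTF update above can be applied step by step along an AVS voltage schedule. The following sketch is illustrative only: the function names and the example schedule are hypothetical, and only the squared voltage-ratio update is taken from the slide.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_now):
    """One step of MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i)) ** 2."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2


def mttf_over_schedule(mttf_initial, vdd_schedule):
    """Apply the update along a sequence of supply voltages (in volts)."""
    mttf = mttf_initial
    for vdd_prev, vdd_now in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, vdd_prev, vdd_now)
    return mttf


# Hypothetical schedule: AVS raises Vdd from 0.90 V to 1.10 V in two steps.
final_mttf = mttf_over_schedule(10.0, [0.90, 1.00, 1.10])
```

Because each step multiplies by a squared voltage ratio, the product telescopes: the final MTTF depends only on the first and last voltages of the schedule.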
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year and MTTF (year) for AVS schedules S1 to S5; design: DMA]
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff, relating nominal Vdd, signoff Vdd, and operating voltage through static IR drop, power grid IR gradient, dynamic IR, and HCI/NBTI margins]
• Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: routed netlist + 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• $\Sigma = \lambda \lambda^T$ (note: λ is obtained by Cholesky decomposition)
• Delay sensitivities with correlation:
  $[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}] \, \lambda$
• Standard deviation of path delay:
  $\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
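The computation of σj can be sketched in a few lines. This is a minimal pure-Python illustration under the slide's assumptions (linear sensitivities, normal variation sources); the function names and the tiny 2 × 2 example are hypothetical, and a production flow would likely use a numerical library instead of the hand-written Cholesky routine.

```python
import math


def sensitivity(delay_at_corner, delay_typ):
    """Per-sigma delay sensitivity: [d_j(Y_v) - d_j(Y_typ)] / 3."""
    return (delay_at_corner - delay_typ) / 3.0


def cholesky(matrix):
    """Lower-triangular L with L * L^T == matrix (positive definite input)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def path_sigma(sensitivities, correlation):
    """Standard deviation of path delay from sensitivities and correlation."""
    lam = cholesky(correlation)
    n = len(sensitivities)
    # Row vector times lambda: d'_v = sum over u of d_u * lam[u][v].
    correlated = [
        sum(sensitivities[u] * lam[u][v] for u in range(n)) for v in range(n)
    ]
    return math.sqrt(sum(d * d for d in correlated))


# Hypothetical two-source example with correlation 0.5.
sigma_j = path_sigma([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])
```

Since Σ = λλ^T, the result equals the square root of Δd Σ Δd^T, which is a convenient cross-check.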
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; 15% reduction]
• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling → reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:
  $Energy = \frac{Power}{Throughput}$
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints → allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor, design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00]; design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor that measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system; Vdd scaling until error occurs [Das06, Tschanz10]
[Figure: AVS classes arranged along a power axis]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure: four panels comparing ideal scaling with observed non-ideality]
• DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
• POWER: dynamic power (W) by year, 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling. Source: [ITRS]
• SUPPLY VOLTAGE: volts vs. MPU release date. Source: [CPUDB]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU, with the minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cones
[Figure: voltage vs. frequency with modes A, B, and C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
• Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
• Guideline: search for signoff modes within the design cone → reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough → scalable
Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R) → refine power models
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
Closed-loop AVS
• Design-dependent replica, in-situ monitor
  • Critical path may be difficult to identify (IP from 3rd party)
  • Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor
  • Does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO at small, medium, and large margin for MUL and EXU, split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU, with the minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
bull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity-aware clock skew optimization
SkewOpt
9ISVLSI-2014 invited talk 140710
Statistical RC Model
• 3 variation sources in each layer: ΔW, ΔT, ΔH
• 9-layer metal stack has 27 variation sources: z1, z2, …, z27
• BEOL layers in the same process module use the same manufacturing equipment and process steps
• zu and zv are correlated if and only if
  • zu and zv are the same type (ΔW, ΔT, or ΔH)
  • zu and zv are in the same process module

Layer   ΔW    ΔT    ΔH    Process module
M9      z25   z26   z27   3
M8      z22   z23   z24   3
M7      z19   z20   z21   2
M6      z16   z17   z18   2
M5      z13   z14   z15   2
M4      z10   z11   z12   2
M3      z7    z8    z9    1
M2      z4    z5    z6    1
M1      z1    z2    z3    1

Examples
• ΔW in layer M4 has a positive correlation with ΔW in layers M5, M6, and M7
• But ΔW in layer M4 is not correlated with ΔT in M4
10ISVLSI-2014 invited talk 140710
Pessimism of Conventional BEOL Corners (CBC)
• Assumption: a max (setup) path pj is "safe" when the delay evaluated at a given CBC is larger than nominal delay + 3σj:
  $d_j(Y_{CBC}) \ge 3\sigma_j + d_j(Y_{typ})$
• For a given path, we can compare the statistical delay variation and the delay obtained from a given CBC:
  $\alpha_j = \frac{3\sigma_j}{\Delta d_j(Y_{CBC})}$, where $\Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ})$ and $Y_{CBC} \in \{Y_{cw}, Y_{cb}, Y_{rcw}, Y_{rcb}\}$
• Small αj → large pessimism of CBC
[Figure: delay axis comparing 3σj with dj(YCBC) − dj(Ytyp); the gap between them is the pessimism]
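The safety condition and the scaling factor αj can be computed directly from three numbers per path. The sketch below is an illustration with hypothetical delay values in picoseconds; the function names are not from the talk.

```python
def scaling_factor(sigma_j, delay_cbc, delay_typ):
    """alpha_j = 3 * sigma_j / (d_j(Y_CBC) - d_j(Y_typ))."""
    delta = delay_cbc - delay_typ
    if delta <= 0:
        raise ValueError("corner delay must exceed the typical delay")
    return 3.0 * sigma_j / delta


def is_safe(sigma_j, delay_cbc, delay_typ):
    """A max (setup) path is safe when d_j(Y_CBC) >= d_j(Y_typ) + 3 * sigma_j."""
    return delay_cbc >= delay_typ + 3.0 * sigma_j


# Hypothetical path: 1000 ps typical, 1060 ps at the corner, sigma = 10 ps.
alpha = scaling_factor(10.0, 1060.0, 1000.0)
safe = is_safe(10.0, 1060.0, 1000.0)
```

A path is safe exactly when αj ≤ 1; values well below 1 indicate that the corner overestimates the statistical 3σ variation.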
12ISVLSI-2014 invited talk 140710
Intuition on Delay Variability Across Cw, RCw
[Figure: α of critical paths plotted against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
In the following, α is defined at the dominant corner
13ISVLSI-2014 invited talk 140710
Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
• E.g., here we see αj > 0.6 when (Δdrcw < 3%) AND (Δdcw < 3%)
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]
14ISVLSI-2014 invited talk 140710
Find Paths for Which TBCs Can Be Used
• Paths with small Δdrcw and Δdcw have large α
• E.g., αj > 0.6 when (Δdrcw < 3%) AND (Δdcw < 3%)
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw marked]
Gtbc = set of paths that can be safely signed off using TBC: (paths with Δdcw larger than Acw) OR (paths with Δdrcw larger than Arcw)
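The grouping rule for Gtbc can be sketched as a simple filter. The path names, Δd values, and thresholds below are hypothetical and serve only to show the OR condition; `split_paths` is not a name from the talk.

```python
def split_paths(paths, a_cw, a_rcw):
    """Split paths into (G_tbc, G_cbc) using thresholds A_cw and A_rcw.

    paths maps a path name to (delta_cw, delta_rcw), both in percent of the
    typical delay. A path goes to G_tbc when delta_cw > A_cw OR
    delta_rcw > A_rcw; all other paths stay in G_cbc.
    """
    g_tbc, g_cbc = [], []
    for name, (delta_cw, delta_rcw) in paths.items():
        if delta_cw > a_cw or delta_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


# Hypothetical paths and thresholds, for illustration only.
example_paths = {
    "p1": (4.0, 2.0),  # large C-worst variation
    "p2": (1.0, 6.0),  # large RC-worst variation
    "p3": (1.5, 2.5),  # small variation at both corners
}
g_tbc, g_cbc = split_paths(example_paths, a_cw=3.0, a_rcw=5.0)
```

Paths left in the second group keep conventional BEOL corners, since their small Δd values are where large α was observed.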
15ISVLSI-2014 invited talk 140710
Determining α Arcw and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay; find Arcw and Acw for a given α
• Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC
[Figure: α vs. Δd at C-worst corner (%) and α vs. Δd at RC-worst corner (%), with thresholds Acw and Arcw marked]
16ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
• Tradeoff between reduced margin vs. paths which use TBC
Correlation factor γ = 0.5
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE12, and NETCARD under CBC, TBC-0.5, TBC-0.6, and TBC-0.7]
17ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
18ISVLSI-2014 invited talk 140710
How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                    Razor          Razor-Lite    TIMBER
Power penalty       30 [Das08]     ~0 [Kim13]    100 [Choudhury09]
Area penalty        182 [Kim13]    33 [Kim13]    255 [Chen13]
Recovery cycles     5 [Wan09]      11 [Kim13]    0 [Choudhury09]
19ISVLSI-2014 invited talk 140710
Tradeoff Resilience Cost vs Datapath Cost
[Figure: an endpoint Razor FF is replaced by a normal FF by optimizing its fanin cone w/ a tighter constraint; tradeoff between Razor FFs (resilience cost) and power/area of fanin circuits]
[Figure: total energy, energy of non-resilient part, and resilience cost (mJ) vs. number of Razor FFs (300, 100, 50, 0)]
We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)
20ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints → allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions
• Vary endpoints, compare area/power penalty
Candidate sensitivity functions
$SF_1 = |slack(p)|$
$SF_2 = |slack(p)| \times num_{cri}(p)$
$SF_3 = |slack(p)| \times \frac{num_{cri}(p)}{num_{total}(p)}$
$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$
• p: negative-slack endpoint
• c: cells within fanin cone
• numcri: number of negative-slack cells
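The five candidate sensitivity functions can be evaluated from per-cell slack and power data. The sketch below makes one labeled assumption: numtotal(p) is taken to be the total number of cells in the fanin cone, which the slide does not define. The function name and example values are hypothetical.

```python
def sensitivity_functions(endpoint_slack, fanin_cells):
    """Evaluate SF1..SF5 for one negative-slack endpoint p.

    fanin_cells is a list of (slack, power) pairs for cells in the fanin cone.
    Assumption: num_total(p) is the number of cells in the fanin cone.
    """
    abs_slack = abs(endpoint_slack)
    num_cri = sum(1 for slack, _ in fanin_cells if slack < 0)
    num_total = len(fanin_cells)
    total_power = sum(power for _, power in fanin_cells)
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": abs_slack * num_cri / num_total,
        "SF4": abs_slack * total_power,
        "SF5": sum(abs(slack) * power for slack, power in fanin_cells),
    }


# Hypothetical endpoint with slack -0.2 and four fanin cells (slack, power).
cells = [(-0.1, 2.0), (0.3, 1.0), (-0.2, 3.0), (0.5, 4.0)]
sf = sensitivity_functions(-0.2, cells)
```

Endpoints would then be ranked by one chosen function; which ranking works best is an empirical question that this sketch does not address.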
21ISVLSI-2014 invited talk 140710
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized
[Figure: FF1, FF2, and FF3 connected by data paths with weights W12, W23, and W31 and driven by the clock tree; after adjustment each path on the cycle has weight W']
$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p,q)}$
• Slack_pq: setup slack of path p-q
• β: weighting factor
• TG(p,q): toggle rate of path p-q
• W' = average weight on cycle
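The path weight and the post-adjustment cycle weight can be computed as below. This is a small sketch with hypothetical slack, toggle-rate, and β values; it covers only the weight formula and the averaging, not the cycle search or the contraction step.

```python
def path_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)


def balanced_cycle_weight(weights):
    """W' = average weight on a cycle, assigned to every path on that cycle."""
    return sum(weights) / len(weights)


# Hypothetical three-flop cycle FF1 -> FF2 -> FF3 -> FF1: (slack, toggle rate).
paths = [(0.30, 0.5), (0.10, 0.0), (0.40, 1.0)]
cycle_weights = [path_weight(slack, tg, beta=2.0) for slack, tg in paths]
w_prime = balanced_cycle_weight(cycle_weights)
```

With β > 0, a frequently toggling path gets a smaller weight than an idle path with the same slack, so it is treated as more critical.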
22ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity aware clock skew optimization
SkewOpt
OR-tree insertion
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF, and CO at small, medium, and large margin for MUL and EXU, split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt on MUL and EXU, with the minimum achievable energy marked]
Technology: foundry 28nm
25ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS, in which an on-chip aging monitor senses circuit performance and a voltage regulator sets Vdd for the circuit; plots of circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, against the target]
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation
[Figure: Step 1: VBTI (via |Vt|) and Vlib give the derated library; Step 2: circuit implementation and signoff give the circuit; Step 3: BTI degradation and AVS give Vfinal]
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Figure: Step 1: VBTI (via |Vt|) and Vlib give the derated library; Step 2: circuit implementation and signoff give the circuit; Step 3: BTI degradation and AVS give Vfinal]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
“Knee” point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI]
VBTI = Vlib ≈ Vfinal
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
• Assume timing slack = 0
[Figure annotation: 10mV]
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation
[Flow]
Obtain Vheur (average of standard cells)
Obtain derated library with VBTI = Vlib = Vheur
Sign off circuit with derated library
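The first step of this flow, averaging Vfinal over cell types to get a single flat Vheur, can be sketched as below. The cell names and voltages are made-up placeholders, and how Vfinal is obtained per cell (for example from aging-aware simulation at zero slack) is outside this sketch.

```python
def heuristic_voltage(vfinal_by_cell):
    """Return Vheur as the plain average of per-cell Vfinal estimates.

    vfinal_by_cell maps a (hypothetical) cell name to its estimated
    end-of-life supply voltage in volts.
    """
    if not vfinal_by_cell:
        raise ValueError("need at least one cell")
    return sum(vfinal_by_cell.values()) / len(vfinal_by_cell)


if __name__ == "__main__":
    # Placeholder values, chosen only to show a ~10 mV-scale spread.
    vfinal = {"INV": 0.96, "NAND2": 0.97, "NOR2": 0.98}
    vheur = heuristic_voltage(vfinal)
    # The same value would then be used for both VBTI and Vlib.
    print(f"VBTI = Vlib = Vheur = {vheur:.3f} V")
```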
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
• (4 designs) x (DC, AC) x (derating methods)
[Figure legend: proposed method; circuit signed off using other derated libraries]
“Knee” point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
• Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
Improve performance, power, or area
Reduce overdesign
[Figure: Vdd vs. time, alternating between NOM and OD modes with durations tnom and tOD]
[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V. Different overdrive modes: 26% power range. Fix fOD: still 14% power range]
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config: Vmin_est < Vmin_chip; some chips will fail
Default config
• Low-resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1 to 9; annotations: highest invested guardband, least invested guardband, 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Capacitance distributions (-3σ to 3σ) for layer M1 and layer M2, and an interconnect stack with M1 and M2 (axes M1 C and M2 C) showing the 3σ contour, the homogeneous Cw corner, and the resulting pessimism]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources
Σ =
          M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW    0  0  0     0  0  0     0  0  0     1  0  0
    ΔT    0  0  0     0  0  0     0  0  0     0  1  0
    ΔH    0  0  0     0  0  0     0  0  0     0  0  1
Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
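The structure of this matrix can be generated programmatically. The sketch below assumes, as read off the example matrix, that M1 to M3 share one process module and M4 is in another; the module assignment, names, and function are illustrative assumptions rather than the authors' code.

```python
GAMMA = 0.5
LAYERS = ["M1", "M2", "M3", "M4"]
VAR_TYPES = ["dW", "dT", "dH"]  # width, thickness, height variation
# Assumed grouping of layers into process modules (inferred from the matrix).
MODULE = {"M1": 0, "M2": 0, "M3": 0, "M4": 1}


def build_correlation_matrix(gamma=GAMMA):
    """Return (sources, sigma) where sigma[i][j] is the correlation
    between variation sources i and j."""
    sources = [(layer, vtype) for layer in LAYERS for vtype in VAR_TYPES]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (layer_i, type_i) in enumerate(sources):
        for j, (layer_j, type_j) in enumerate(sources):
            if i == j:
                sigma[i][j] = 1.0
            elif type_i == type_j and MODULE[layer_i] == MODULE[layer_j]:
                # Same variation type, same process module.
                sigma[i][j] = gamma
            # Otherwise the sources are treated as independent (0.0).
    return sources, sigma


if __name__ == "__main__":
    _, sigma = build_correlation_matrix()
    for row in sigma:
        print(" ".join(f"{value:3.1f}" for value in row))
```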
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the curve reaches 0.92 at 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α vs. Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2 (axes M1 C and M2 C, 3σ contour); non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σj / d(Ytyp) x 100% vs. Δdj(Yrcw) / dj(Ytyp) x 100%]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield?
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst, and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → paths split into GTBC and GCBC. GTBC: timing analysis using TBC, ECO using TBC until violation = 0. GCBC: timing analysis using CBC, ECO using CBC until violation = 0. Then done]
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)
                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5
           α     Acw (%)   Arcw (%)
TBC-0.5    0.5   4.3       7.3
TBC-0.6    0.6   3.3       5.0
TBC-0.7    0.7   3.0       3.4

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw, and Arcw
Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays → the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  Labels: nominal delay; delay sensitivity to unit change in M1 resistance; delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large: the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: α vs. Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) for LEON3MP, SUPERBLUE12, and NETCARD, with the α > 0.5 region marked in each]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, Vlib, and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs
and library
Vstep: AVS voltage step
Vfinal: converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α=0
7. Proposed method with α=0.03

Case       1       2       3      4        5     6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    N/A   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   N/A   Vheur1   Vheur2
63ISVLSI-2014 invited talk 140710
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop among circuit, Vfinal, Vlib / VBTI, and derated libraries]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)
|ΔVth| is partially recovered when device is off (relaxed)
NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time through ON, OFF, ON, OFF phases [VattikondaWC06]]
Device aging (|ΔVth|) accumulates over time [TCAS’14]
65ISVLSI-2014 invited talk 140710
Observation 1
• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
[Figure [Chan11]: Vdd approaching Vfinal over time; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)]
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Good corners
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α=0
7. Proposed method with α=0.03
67ISVLSI-2014 invited talk 140710
Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?
Vlib (circuit performance estimation) vs. VBTI (aging estimation):
  Vlib = Min Vdd, VBTI = Min Vdd: slowest circuit, less aging
  Vlib = Max Vdd, VBTI = Min Vdd: not applicable (optimistic)
  Vlib = Max Vdd, VBTI = Max Vdd: faster circuit, worst-case aging
  Vlib = Min Vdd, VBTI = Max Vdd: slowest circuit, worst-case aging (too pessimistic)
With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm2) for AES implementations signed off with setups 1 to 8; series: non-EM aware, after fixing (Mishra), after fixing (Black's); regions marked “optimistic about AVS” and “pessimistic about AVS”]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
$MTTF(i) = MTTF(i-1) \times \left( \dfrac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
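The stepwise MTTF update used on this slide can be illustrated with a few lines of code. This is a sketch only: the starting MTTF and the voltage schedule are invented numbers, and the function names are not from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_now):
    """One update step: MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2


def mttf_after_schedule(mttf_initial, vdd_schedule):
    """Apply the update along an AVS voltage schedule (list of Vdd values)."""
    mttf = mttf_initial
    for vdd_prev, vdd_now in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, vdd_prev, vdd_now)
    return mttf


if __name__ == "__main__":
    # Hypothetical schedule: AVS raises Vdd from 0.90 V to 1.00 V over time.
    schedule = [0.90, 0.94, 0.97, 1.00]
    print(mttf_after_schedule(10.0, schedule))  # MTTF in years
```

Because the per-step ratios telescope, the final value depends only on the first and last voltages, which is why raising Vfinal through AVS directly costs EM lifetime.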
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year (0 to 12) for schedules S1 to S5 on the DMA design, and MTTF (year) for S1 to S5; annotation: 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is “Signoff”?
• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff, between operating voltage and signoff Vdd: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI]
Problem: margins = pessimism → overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = [d_j(Y_v) - d_j(Y_{typ})] / 3$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}] \, \lambda$
Standard deviation of path delay:
$\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
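To make the two-step computation concrete, here is a small self-contained sketch: a textbook Cholesky factorization followed by the row-vector product and root-sum-square. It uses a 2 x 2 toy matrix instead of the 27 x 27 case, and all names and numbers are illustrative assumptions.

```python
import math


def cholesky_lower(matrix):
    """Return lower-triangular L with L * L^T = matrix (symmetric, positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def path_sigma(sensitivities, correlation):
    """sigma_j = || [dd_j1 ... dd_jn] * lambda ||_2 with Sigma = lambda * lambda^T."""
    lam = cholesky_lower(correlation)
    n = len(sensitivities)
    correlated = [
        sum(sensitivities[i] * lam[i][k] for i in range(n)) for k in range(n)
    ]
    return math.sqrt(sum(value * value for value in correlated))


if __name__ == "__main__":
    # Two variation sources with correlation 0.5 and 1 ps sensitivity each (toy values).
    sigma = [[1.0, 0.5], [0.5, 1.0]]
    print(path_sigma([1.0, 1.0], sigma))
```

With independent sources the result reduces to the plain root-sum-square of the sensitivities; positive correlation increases it.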
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors → ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; 15% reduction]
Conventional design: worst-case signoff → no Vdd downscaling
Resilient design: typical-case signoff → Vdd downscaling → reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy
Energy = Power / Throughput
Throughput = 1 − ER T
+ 1 − ER r × T
r: # recovery cycles
T: clock period
ER: error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints → allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
[Figure: AVS classes and AVS approaches, ordered by power]
Open-loop AVS
• Freq & Vdd LUT: pre-characterize LUT [Martin02]
• Post-silicon characterization: process-aware AVS [Tschanz03]
Closed-loop AVS
• Generic monitor: process- and temperature-aware AVS with generic on-chip monitor [Burd00]
• Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
Error tolerance
• Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure: four panels contrasting ideal scaling with observed non-ideality. DENSITY: transistor count [M] vs. MPU release date, source [CPUDB]. POWER: dynamic power (W), 2009 to 2024, source [JeongK08]. DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, source [ITRS]. SUPPLY VOLTAGE: volt vs. MPU release date, source [CPUDB]]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU; the minimum achievable energy is marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design / overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone
[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within design cone → reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant # of runs is enough → scalable
[Flowchart: global optimization flow. Sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R) → refine power models]
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V). Design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
[Figure: closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor]
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
Generic monitor
• Does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high resistance or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at small, medium, and large margins; each bar is split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5/10/15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU; the minimum achievable energy is marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart]
Initial placement (all FFs = error-tolerant FFs)
SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
SkewOpt: activity-aware clock skew optimization
Energy < min energy? Save current solution
10ISVLSI-2014 invited talk 140710
Pessimism of Conventional BEOL Corners (CBC)
• Assumption: a max (setup) path pj is “safe” when delay evaluated at a given CBC is larger than nominal delay + 3σj
  $d_j(Y_{CBC}) \ge 3\sigma_j + d_j(Y_{typ})$
• For a given path, we can compare the statistical delay variation and the delay obtained from a given CBC:
  $\alpha_j = 3\sigma_j / \Delta d_j(Y_{CBC})$
  $\Delta d_j(Y_{CBC}) = d_j(Y_{CBC}) - d_j(Y_{typ})$, with $Y_{CBC} \in \{Y_{cw}, Y_{cb}, Y_{rcw}, Y_{rcb}\}$
• Small αj → large pessimism of CBC
[Figure: path delay distribution (-3σ to 3σ) comparing 3σj with dj(YCBC) - dj(Ytyp); the gap is marked as large pessimism]
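The safety condition and the scaling factor α defined on this slide translate directly into code. The sketch below uses invented delay values in picoseconds; the function names are illustrative and not part of any tool flow described in the talk.

```python
def delta_delay(delay_cbc, delay_typ):
    """Delta d_j(Y_CBC) = d_j(Y_CBC) - d_j(Y_typ)."""
    return delay_cbc - delay_typ


def alpha(sigma_j, delay_cbc, delay_typ):
    """alpha_j = 3 * sigma_j / Delta d_j(Y_CBC); small alpha means a pessimistic corner."""
    delta = delta_delay(delay_cbc, delay_typ)
    if delta <= 0:
        raise ValueError("corner delay must exceed typical delay for this ratio")
    return 3.0 * sigma_j / delta


def is_safe(sigma_j, delay_cbc, delay_typ):
    """A setup path is 'safe' when d_j(Y_CBC) >= d_j(Y_typ) + 3 * sigma_j."""
    return delay_cbc >= delay_typ + 3.0 * sigma_j


if __name__ == "__main__":
    # Toy path: 1000 ps typical delay, 1060 ps at the corner, sigma = 10 ps.
    print(alpha(10.0, 1060.0, 1000.0), is_safe(10.0, 1060.0, 1000.0))
```

In this toy case the corner adds 60 ps while 3σ is only 30 ps, so α = 0.5 and half of the corner margin is pessimism.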
11ISVLSI-2014 invited talk 140710
Intuition on Delay Variability Across Cw, RCw
[Figure: α vs. Δdelay (vs. typ) at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and α vs. Δdelay (vs. typ) at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0 here: delay variations covered by RC-worst corner
Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
12ISVLSI-2014 invited talk 140710
Intuition on Delay Variability Across Cw, RCw
[Figure: α vs. Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
In the following, α is defined at the dominant corner
13ISVLSI-2014 invited talk 140710
Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
• E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]
14ISVLSI-2014 invited talk 140710
Find Paths for Which TBCs Can Be Used
• Paths with small Δdrcw and Δdcw have large α
• E.g., there are αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw marked]
Gtbc = set of paths that can be safely signed off using TBC: ((path with Δdcw larger than Acw) OR (path with Δdrcw larger than Arcw))
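The membership rule for Gtbc is a simple threshold test on the two relative delay changes. The sketch below shows one way to split a path list into Gtbc and Gcbc; the path names, the percentages, and the threshold values are placeholders chosen for illustration.

```python
def split_paths(paths, a_cw, a_rcw):
    """Split paths into (g_tbc, g_cbc).

    Each path is (name, delta_cw_pct, delta_rcw_pct), where the deltas are
    delay changes relative to the typical corner, in percent. A path goes to
    g_tbc when either delta exceeds its threshold.
    """
    g_tbc, g_cbc = [], []
    for name, delta_cw, delta_rcw in paths:
        if delta_cw > a_cw or delta_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


if __name__ == "__main__":
    # Hypothetical paths and thresholds (percent of typical delay).
    paths = [("p0", 5.0, 2.0), ("p1", 1.0, 1.5), ("p2", 2.0, 6.5)]
    print(split_paths(paths, a_cw=3.3, a_rcw=5.0))
```

Paths with small deltas at both corners stay in Gcbc, since those are the paths for which α can be large and a tightened corner could underestimate variation.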
15ISVLSI-2014 invited talk 140710
Determining α, Arcw, and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay; find Arcw and Acw for a given α
• Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC
[Figure: α vs. Δd at C-worst corner (%) and Δd at RC-worst corner (%), with thresholds Acw and Arcw marked]
16ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
  • Tradeoff between reduced margin vs. # paths which use TBC
Correlation factor γ = 0.5
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE12, and NETCARD under CBC, TBC-0.5, TBC-0.6, and TBC-0.7]
17ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
18ISVLSI-2014 invited talk 140710
How to Minimize Cost of Resilience?
• Additional circuits → area and power penalties
• Recovery from errors → throughput degradation
• Large hold margin → short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                    Razor          Razor-Lite    TIMBER
Power penalty       30 [Das08]     ~0 [Kim13]    100 [Choudhury09]
Area penalty        182 [Kim13]    33 [Kim13]    255 [Chen13]
# recovery cycles   5 [Wan09]      11 [Kim13]    0 [Choudhury09]
19ISVLSI-2014 invited talk 140710
Tradeoff Resilience Cost vs Datapath Cost
[Figure: endpoint Razor FF with its fanin cone; optimizing the fanin cone w/ a tighter constraint lets the endpoint use a normal FF. Compared: area (power) of fanin cone vs. area (power) w/ Razor overhead]
Tradeoff: power/area of fanin circuits vs. Razor FFs (resilience cost)
[Plot: energy (mJ) vs. number of Razor FFs, showing total energy, energy of non-resilient part, and resilience cost]
We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)
20ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints
  • Allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions
  • Vary endpoints, compare area/power penalty

Candidate Sensitivity Functions
$SF_1 = |slack(p)|$
$SF_2 = |slack(p)| \times Num_{cri}(p)$
$SF_3 = |slack(p)| \times Num_{cri}(p) / Num_{total}(p)$
$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$

p: negative-slack endpoint; c: cells within fanin cone; Numcri: number of negative-slack cells
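As a rough illustration, the five candidate sensitivity functions could be evaluated per endpoint as below. This is a sketch under assumptions: the function name and data layout are hypothetical, slack terms are taken as absolute values, and Numtotal is assumed to be the total number of cells in the fanin cone (the slide does not define it).

```python
def sensitivity_functions(slack_p, fanin):
    """Evaluate candidate sensitivity functions SF1..SF5 for one endpoint p.

    slack_p: timing slack of endpoint p (negative for a violating endpoint)
    fanin:   list of (slack_c, power_c) tuples, one per cell c in the fanin cone
    """
    num_total = len(fanin)                         # assumed meaning of Num_total
    num_cri = sum(1 for s, _ in fanin if s < 0)    # negative-slack cells
    abs_slack = abs(slack_p)
    total_pwr = sum(pw for _, pw in fanin)
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": abs_slack * num_cri / num_total if num_total else 0.0,
        "SF4": abs_slack * total_pwr,
        "SF5": sum(abs(s) * pw for s, pw in fanin),
    }

# Hypothetical endpoint: slack in ns, cell power in arbitrary units
sf = sensitivity_functions(-0.2, [(-0.1, 2.0), (0.05, 1.0), (-0.2, 3.0)])
```

Endpoints could then be ranked by any one of these values to decide which fanin cones to optimize first; which function works best is an empirical question that the talk leaves to experiments.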
21ISVLSI-2014 invited talk 140710
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
1. Generate sequential graph
2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
3. Iterate Step 2 until all endpoints are optimized

$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p,q)}$
Slack_pq: setup slack of path p-q; β: weighting factor; TG(p,q): toggle rate of path p-q
W' = average weight on cycle
[Figure: sequential graph of FF1, FF2, FF3 (data paths and clock tree) with edge weights W12, W23, W31; after clock latency adjustment each edge on the cycle has weight W']
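A small sketch of the edge weight and the cycle average W' follows. The helper names and the numbers are hypothetical; the weight is assumed to be setup slack divided by (1 + β × toggle rate).

```python
def path_weight(slack, toggle_rate, beta):
    """Activity-aware weight of a path p-q: slack / (1 + beta * toggle rate)."""
    return slack / (1.0 + beta * toggle_rate)

def average_cycle_weight(cycle_edges, beta):
    """Average weight W' over the edges of one cycle in the sequential graph.

    cycle_edges: list of (setup_slack, toggle_rate) tuples, one per edge.
    """
    weights = [path_weight(s, tg, beta) for s, tg in cycle_edges]
    return sum(weights) / len(weights)

# Hypothetical 3-edge cycle FF1 -> FF2 -> FF3 -> FF1 (slack in ns)
w_avg = average_cycle_weight([(0.3, 0.5), (0.1, 0.0), (-0.2, 1.0)], beta=1.0)
```

With β > 0, a frequently toggling path with positive slack gets a smaller weight than an idle path with the same slack, which is one way to read "frequently-exercised paths" being prioritized.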
22ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart boxes: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; OR-tree insertion; if energy < min energy, save current solution]
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figures: energy (mJ) of PM, BF, and CO for MUL and EXU at small, medium, and large margin; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figures: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt designs, for MUL and EXU, with the minimum achievable energy marked. Technology: foundry 28nm]
25ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS, in which an on-chip aging monitor reports circuit performance to a voltage regulator that sets the circuit Vdd; plots of circuit frequency and Vdd vs. time, without AVS and with AVS, relative to target]
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
bull VBTI = Voltage for BTI aging estimation
bull Vlib = Voltage for circuit performance estimation (library characterization)
bull VBTI and Vlib are required in signoff
bull VBTI and Vlib selection should consider BTI + AVS interaction
bull Aging and Vfinal are unknowns before circuit implementation
[Figure: three-step loop (Step 1, Step 2, Step 3) linking derated library characterization (Vlib, VBTI, |Vt|), circuit implementation and signoff (circuit), and BTI degradation and AVS (Vfinal)]
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
bull VBTI = Voltage for BTI aging estimation
bull Vlib = Voltage for circuit performance estimation (library characterization)
bull VBTI and Vlib are required in signoff
bull VBTI and Vlib depend on aging during AVS
bull Aging and Vfinal are unknowns before circuit implementation
[Figure: three-step loop (Step 1, Step 2, Step 3) linking derated library characterization (Vlib, VBTI, |Vt|), circuit implementation and signoff (circuit), and BTI degradation and AVS (Vfinal)]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design, lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
"Knee" point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
VBTI = Vlib ≈ Vfinal
[Figure: Vdd vs. time, with NBTI and PBTI curves]
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
bull Heuristic obtain Vheur by averaging Vfinal of different cells
bull Heuristic use a ldquoflatrdquo Vheur to estimate BTI degradation
Obtain Vheur (average of standard cells)
Obtain derated library with VBTI = Vlib = Vheur
Signoff circuit with derated library
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
• 4 designs x {DC, AC} x derating methods
[Figure: power vs. area; legend: proposed method, circuit signed off using other derated libraries; "knee" point for balanced area and power tradeoff]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
• Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign
[Figure: Vdd vs. time schedules alternating NOM and OD modes over tnom and tOD]
[Figure: power of circuits w/ different overdrive modes (fnom = 800MHz, Vnom = 0.8V); different overdrive modes: 26% power range; fix fOD: still 14% power range]
37ISVLSI-2014 invited talk 140710
Also: Tunable Monitors, Less Margin
Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip
Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) across implementations 1-9; annotations: highest invested guardband, least invested guardband, 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2; per-layer C distributions (−3σ to 3σ) for layers M1 and M2, the 3σ contour in the (M1 C, M2 C) plane, the homogeneous Cw corner, and the resulting pessimism]
When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with the number of layers
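The growth of pessimism with layer count can be illustrated numerically. This is a sketch under simplifying assumptions that are not from the talk: each layer contributes an additive, normally distributed delay (or capacitance) term with standard deviation σi, every pair of layers shares one correlation γ, and the function names are hypothetical.

```python
import math

def homogeneous_3sigma(sigmas):
    """Homogeneous corner: every layer sits at its own +3 sigma at once."""
    return 3.0 * sum(sigmas)

def statistical_3sigma(sigmas, gamma):
    """3 sigma of the sum when each pair of layers has correlation gamma."""
    var = sum(s * s for s in sigmas)
    for i in range(len(sigmas)):
        for j in range(i + 1, len(sigmas)):
            var += 2.0 * gamma * sigmas[i] * sigmas[j]
    return 3.0 * math.sqrt(var)

def pessimism(sigmas, gamma):
    """Extra margin of the homogeneous corner over the statistical 3 sigma."""
    return homogeneous_3sigma(sigmas) - statistical_3sigma(sigmas, gamma)

two_layers = pessimism([1.0, 1.0], gamma=0.5)
four_layers = pessimism([1.0, 1.0, 1.0, 1.0], gamma=0.5)
```

Under these assumptions the pessimism is zero only when the layers are fully correlated (γ = 1), and it grows as more partially correlated layers are stacked.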
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
          M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW    0  0  0     0  0  0     0  0  0     1  0  0
    ΔT    0  0  0     0  0  0     0  0  0     0  1  0
    ΔH    0  0  0     0  0  0     0  0  0     0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
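The structure of this matrix can be generated programmatically. In this sketch the function name and the module labels ("A", "B") are hypothetical; M1-M3 are assumed to share one process module and M4 to sit in another, matching the pattern shown on the slide.

```python
def build_correlation_matrix(layers, module_of, gamma, types=("W", "T", "H")):
    """Build the correlation matrix for (layer, variation type) sources.

    Two different sources correlate with factor gamma only when they have
    the same variation type and their layers share a process module.
    """
    names = [(layer, t) for layer in layers for t in types]
    n = len(names)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (li, ti) in enumerate(names):
        for j, (lj, tj) in enumerate(names):
            if i == j:
                sigma[i][j] = 1.0
            elif ti == tj and module_of[li] == module_of[lj]:
                sigma[i][j] = gamma
    return names, sigma

# Assumed module assignment: M1-M3 in module "A", M4 in module "B"
names, sigma = build_correlation_matrix(
    ["M1", "M2", "M3", "M4"],
    {"M1": "A", "M2": "A", "M3": "A", "M4": "B"},
    gamma=0.5,
)
```

The first row of `sigma` then reads 1 0 0 γ 0 0 γ 0 0 0 0 0, the same as the M1 ΔW row above.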
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%)]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation
48ISVLSI-2014 invited talk 140710
Delay Variation
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
[Figures: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp). Paths dominated by C-worst have Δdelay at C-worst > Δdelay at RC-worst; paths dominated by RC-worst have Δdelay at RC-worst > Δdelay at C-worst. Annotations: where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner; where α < 1.0, delay variations are covered by the RC-worst corner]
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; 3σ contour in the (M1 C, M2 C) plane; non-homogeneous corner with M1 == Cw (3σ), M2 == Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: Δdj(Yrcw)/dj(Ytyp) x 100 vs. 3σj/d(Ytyp) x 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield?
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
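[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → paths split into GTBC and GCBC. GTBC: timing analysis using TBC; if violations remain, ECO using TBC. GCBC: timing analysis using CBC; if violations remain, ECO using CBC. When violation = 0 in both groups: done]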
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)
                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5
           α     Acw (%)   Arcw (%)
TBC-0.5    0.5   4.3       7.3
TBC-0.6    0.6   3.3       5.0
TBC-0.7    0.7   3.0       3.4

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path with small Δdelays is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  (terms, in order: nominal delay; delay sensitivity to unit change in M1 resistance; delay sensitivity to unit change in M1 capacitance)
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figures: scaling factor α for LEON3MP, SUPERBLUE12, and NETCARD, each with the region α > 0.5 marked]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figures: WNS (ns), TNS (ns), and timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = {DC, AC}

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs
and library
Vstep: AVS voltage step; Vfinal: converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2
63ISVLSI-2014 invited talk 140710
"Chicken and Egg" Loop

• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop between circuit (Vfinal) and derated libraries (Vlib, VBTI)]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time through ON, OFF, ON, OFF phases [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS'14]]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
• ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)
[Figure: Vdd over lifetime approaching Vfinal]
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Good corners
Setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.
            Vlib = Min Vdd                                          Vlib = Max Vdd
VBTI = Min Vdd   Slowest circuit, less aging                             Not applicable (optimistic)
VBTI = Max Vdd   Slowest circuit, worst-case aging (too pessimistic)     Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm2) for AES across signoff cases 1-8, ranging from optimistic about AVS to pessimistic about AVS; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks)]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) across implementations 1-9; annotations: 30% MTTF penalty, 200mV voltage compensation]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^2$
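The stepwise MTTF update can be sketched as below, assuming the update rule MTTF(i) = MTTF(i−1) × (VDD(i−1)/VDD(i))². The function name, the initial MTTF, and the voltage schedule are hypothetical.

```python
def update_mttf(mttf0, vdd_schedule):
    """Track MTTF as AVS steps the supply voltage through vdd_schedule.

    Each step scales MTTF by (V_prev / V_next) ** 2, so raising Vdd
    shortens the estimated electromigration lifetime.
    """
    mttf = mttf0
    history = [mttf]
    for v_prev, v_next in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (v_prev / v_next) ** 2
        history.append(mttf)
    return history

# Hypothetical schedule: Vdd raised from 0.90V to 1.00V in two AVS steps
history = update_mttf(10.0, [0.90, 0.95, 1.00])
```

Because the per-step factors telescope, the final value depends only on the first and last voltage: MTTF(n) = MTTF(0) × (VDD(0)/VDD(n))².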
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figures: VDD vs. year for AVS schedules S1-S5, and MTTF (year) for S1-S5 (DMA); annotation: 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is "Signoff"?

• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff on a voltage axis: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd; operating voltage marked]
Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
$[\Delta d'_{j1} \ldots \Delta d'_{j27}] = [\Delta d_{j1} \ldots \Delta d_{j27}] \lambda$
Standard deviation of path delay:
$\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
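The two statistical timing steps can be sketched in plain Python as follows. The function names and the two-source example are hypothetical; a real flow would take the sensitivities from RC extraction and STA runs. The Cholesky factor here is lower triangular, so Σ = λλ^T holds as on the slide.

```python
import math

def sensitivity(d_corner, d_typ):
    """Per-sigma delay sensitivity from a 3-sigma corner run vs. typical."""
    return (d_corner - d_typ) / 3.0

def cholesky(a):
    """Lower-triangular lam with a = lam * lam^T (a must be positive definite)."""
    n = len(a)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(a[i][i] - s)
            else:
                lam[i][j] = (a[i][j] - s) / lam[j][j]
    return lam

def path_sigma(delta_d, corr):
    """Standard deviation of path delay for correlated variation sources."""
    lam = cholesky(corr)
    n = len(delta_d)
    # Row vector times lam: d'_k = sum_i delta_d[i] * lam[i][k]
    d_prime = [sum(delta_d[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))

# Hypothetical example: two sources with correlation 0.5, delays in ps
deltas = [sensitivity(109.0, 100.0), sensitivity(112.0, 100.0)]  # [3.0, 4.0]
sigma_j = path_sigma(deltas, [[1.0, 0.5], [0.5, 1.0]])
```

The result equals sqrt(Δd Σ Δd^T), since Δd λ λ^T Δd^T = Δd Σ Δd^T; with an identity correlation matrix it reduces to the root-sum-square of the sensitivities.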
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy
$Energy = \frac{Power}{Throughput}$
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
r: recovery cycles; T: clock period; ER: error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor: process and temperature-aware AVS, generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measure actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
[Figure axis: power]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figures: ideal scaling vs. non-ideality for four metrics. DENSITY: transistor count [M] vs. MPU release date (source: [CPUDB]). POWER: dynamic power (W), 2009-2024 (source: [JeongK08]). DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling (source: [ITRS]). SUPPLY VOLTAGE: volt vs. MPU release date (source: [CPUDB])]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figures: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt designs, for MUL and EXU, with the minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design, overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: frequency vs. voltage with the design cone of mode A and modes A, B, C; negative slacks = failed design; positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough: scalable
Global optimization flow (boxes): sample (SP&R); construct power models; estimate optimal signoff modes; sample (SP&R); refine power models; adaptive search
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); design AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
Closed-loop AVS: design-dependent replica, in-situ monitor, generic monitor
• Does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low
[Figure: RO stages with 1-bit control pins selecting high resistance or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figures: energy (mJ) of PM, BF, and CO for MUL and EXU at small, medium, and large margin; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off between timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figures: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt designs, for MUL and EXU, with the minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Figure: flow with boxes: initial placement (all FFs = error-tolerant FFs); SkewOpt (activity-aware clock skew optimization); SEOpt (margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs); check "energy < min energy"; save current solution]
11ISVLSI-2014 invited talk 140710
Intuition on Delay Variability Across Cw, RCw
[Figure: α vs. Δdelay (vs. typ) at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay (vs. typ) at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
• Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner
• α < 1.0 here: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
12ISVLSI-2014 invited talk 140710
Intuition on Delay Variability Across Cw, RCw
[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
• Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
In the following, α is defined at the dominant corner
13ISVLSI-2014 invited talk 140710
Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
  • E.g., here we see αj > 0.6 when (Δdrcw < 3%) AND (Δdcw < 3%)
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]
14ISVLSI-2014 invited talk 140710
Find Paths for Which TBCs Can Be Used
• Paths with small Δdrcw and Δdcw have large α
  • E.g., there are paths with αj > 0.6 when (Δdrcw < 3%) AND (Δdcw < 3%)
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw]
Gtbc = set of paths that can be safely signed off using TBC: (paths with Δdcw larger than Acw) OR (paths with Δdrcw larger than Arcw)
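The grouping rule for Gtbc can be illustrated with a small sketch. It is not the authors' implementation: the `Path` record, the function names and the sample delays are hypothetical, and the thresholds are passed in as percentages of the typical-corner delay.

```python
from dataclasses import dataclass

@dataclass
class Path:
    name: str
    d_typ: float  # path delay at the typical BEOL corner (ns)
    d_cw: float   # path delay at the C-worst corner (ns)
    d_rcw: float  # path delay at the RC-worst corner (ns)

def delta_pct(d_corner, d_typ):
    """Delay change vs. the typical corner, in percent."""
    return 100.0 * (d_corner - d_typ) / d_typ

def split_paths(paths, a_cw, a_rcw):
    """Return (G_tbc, G_cbc): a path goes to G_tbc if either delta exceeds its threshold."""
    g_tbc, g_cbc = [], []
    for p in paths:
        if delta_pct(p.d_cw, p.d_typ) > a_cw or delta_pct(p.d_rcw, p.d_typ) > a_rcw:
            g_tbc.append(p.name)
        else:
            g_cbc.append(p.name)
    return g_tbc, g_cbc

# Hypothetical delays and thresholds, for illustration only.
paths = [
    Path("p1", 1.00, 1.05, 1.02),
    Path("p2", 1.00, 1.01, 1.02),
    Path("p3", 2.00, 2.02, 2.20),
]
g_tbc, g_cbc = split_paths(paths, a_cw=3.3, a_rcw=5.0)
```

Paths that land in `g_cbc` have small delay changes at both corners, which is where the slides report large α, so they stay on the conventional corners.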
15ISVLSI-2014 invited talk 140710
Determining α, Arcw and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
  • Plot α vs. Δdelay; find Arcw and Acw for a given α
  • Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC
[Figure: α vs. Δd at C-worst corner (%) and Δd at RC-worst corner (%), with thresholds Arcw and Acw marked]
16ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
  • Tradeoff between reduced margin vs. paths which use TBC
Correlation factor γ = 0.5
[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE12 and NETCARD under CBC, TBC-0.5, TBC-0.6 and TBC-0.7]
17ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Tolerance of IC Variabilitybull Margining of IC Variabilitybull Conclusions
18ISVLSI-2014 invited talk 140710
How to Minimize Cost of Resilience
• Additional circuits → area and power penalties
• Recovery from errors → throughput degradation
• Large hold margin → short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                  Razor         Razor-Lite    TIMBER
Power penalty     30 [Das08]    ~0 [Kim13]    100 [Choudhury09]
Area penalty      182 [Kim13]   33 [Kim13]    255 [Chen13]
Recovery cycles   5 [Wan09]     11 [Kim13]    0 [Choudhury09]
19ISVLSI-2014 invited talk 140710
Tradeoff: Resilience Cost vs. Datapath Cost
[Figure: an endpoint Razor FF can be replaced by a normal FF if its fanin cone is optimized w/ a tighter constraint; this trades Razor FF overhead (resilience cost) against power/area of the fanin circuits]
[Figure: total energy, energy of non-resilient part, and resilience cost (energy, mJ) vs. number of Razor FFs]
We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)
20ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints → allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions
  • Vary endpoints, compare area/power penalty

Candidate Sensitivity Functions
$SF_1 = |\mathit{slack}(p)|$
$SF_2 = |\mathit{slack}(p)| \times \mathit{Num}_{cri}(p)$
$SF_3 = |\mathit{slack}(p)| \times \mathit{Num}_{cri}(p) / \mathit{Num}_{total}(p)$
$SF_4 = |\mathit{slack}(p)| \times \sum_{c \in \mathit{fanin}(p)} \mathit{Pwr}(c)$
$SF_5 = \sum_{c \in \mathit{fanin}(p)} |\mathit{slack}(c)| \times \mathit{Pwr}(c)$

p: negative-slack endpoint; c: cells within fanin cone; Numcri: number of negative-slack cells
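As a rough illustration, the five candidate sensitivity functions can be evaluated for one endpoint as below. This is a sketch under assumptions: the `Cell` record and function name are hypothetical, absolute values of slack are assumed, and Numtotal is taken to be the number of cells in the fanin cone, which the slide does not define.

```python
from dataclasses import dataclass

@dataclass
class Cell:
    slack: float  # timing slack of the cell (ns); negative means critical
    power: float  # power of the cell (mW)

def sensitivity(endpoint_slack, fanin):
    """Evaluate SF1..SF5 for an endpoint p with the given fanin-cone cells."""
    s = abs(endpoint_slack)
    num_cri = sum(1 for c in fanin if c.slack < 0)  # negative-slack cells
    num_total = len(fanin)                          # assumed meaning of Numtotal
    total_pwr = sum(c.power for c in fanin)
    return {
        "SF1": s,
        "SF2": s * num_cri,
        "SF3": s * num_cri / num_total if num_total else 0.0,
        "SF4": s * total_pwr,
        "SF5": sum(abs(c.slack) * c.power for c in fanin),
    }

# Hypothetical fanin cone of one negative-slack endpoint.
cone = [Cell(-0.10, 2.0), Cell(-0.05, 1.0), Cell(0.20, 4.0), Cell(0.30, 1.0)]
sf = sensitivity(-0.10, cone)
```

How the values are used to rank endpoints is left to the flow; the sketch only evaluates the formulas.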
21ISVLSI-2014 invited talk 140710
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
1. Generate sequential graph
2. Find cycle of paths with minimum total weight → adjust clock latencies → contract the cycle into one vertex
3. Iterate Step 2 until all endpoints are optimized

$W_{pq} = \dfrac{\mathit{Slack}_{pq}}{1 + \beta \times TG(p,q)}$
Slack_pq: setup slack of path p-q; β: weighting factor; TG(p,q): toggle rate of path p-q

[Figure: sequential graph over FF1, FF2, FF3 with data-path edge weights W12, W23, W31 and a clock tree; after optimization each edge on the cycle has weight W' = average weight on cycle]
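A minimal sketch of the edge weighting, assuming the weight is the setup slack divided by (1 + β × toggle rate), which is one reading of the flattened slide formula. The function names, the slack and toggle values, and β are hypothetical; cycle search and contraction are not shown.

```python
def edge_weight(slack, toggle_rate, beta):
    """Weight of a sequential-graph edge p->q (assumed form of the slide formula)."""
    return slack / (1.0 + beta * toggle_rate)

def cycle_average(weights):
    """Average weight on a cycle, i.e. W' in the slide's figure."""
    return sum(weights) / len(weights)

# Hypothetical three-FF cycle: (setup slack in ns, toggle rate) per edge.
edges = {
    ("FF1", "FF2"): (0.30, 0.5),
    ("FF2", "FF3"): (-0.10, 0.1),
    ("FF3", "FF1"): (0.10, 0.0),
}
beta = 2.0
w = {e: edge_weight(s, tg, beta) for e, (s, tg) in edges.items()}
w_avg = cycle_average(list(w.values()))
```

Adjusting clock latencies so that every edge on the cycle ends up at `w_avg` corresponds to the W' labels in the figure.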
22ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Figure: flow with boxes: initial placement (all FFs = error-tolerant FFs); SkewOpt (activity-aware clock skew optimization); SEOpt (margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs); OR-tree insertion; check "energy < min energy"; save current solution]
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF and CO at small, medium and large margin for the MUL and EXU testcases; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt on MUL and EXU, with the minimum achievable energy marked]
Technology: foundry 28nm
25ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Tolerance of IC Variabilitybull Margining of IC Variabilitybull Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops → Less Margin
• Example: interaction between reliability margin and AVS designs
  • Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
  • AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS: an on-chip aging monitor senses circuit performance and drives the voltage regulator that sets the circuit's Vdd]
[Figure: circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, relative to the target]
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation
[Figure: Step 1: VBTI → |Vt| and Vlib → derated library; Step 2: circuit implementation and signoff → circuit; Step 3: BTI degradation and AVS → Vfinal]
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Figure: Step 1: VBTI → |Vt| and Vlib → derated library; Step 2: circuit implementation and signoff → circuit; Step 3: BTI degradation and AVS → Vfinal]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design, lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
"Knee" point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI aging]
VBTI = Vlib ≈ Vfinal
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
10mV
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation
Flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → sign off circuit with derated library
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
• (4 designs) × (DC, AC) × (derating methods)
[Figure: power vs. area; legend: proposed method, circuits signed off using other derated libraries]
"Knee" point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign
[Figure: Vdd vs. time alternating between NOM and OD modes over intervals tnom and tOD]
[Figure: power of circuits w/ different overdrive modes (fnom = 800MHz, Vnom = 0.8V); different overdrive modes: 26% power range; fix fOD: still 14% power range]
36ISVLSI-2014 invited talk 140710
Also: Tunable Monitors → Less Margin
Default config
• Low-resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail
37ISVLSI-2014 invited talk 140710
Also: Tunable Monitors → Less Margin
Default config
• Low-resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) across implementations 1-9, annotated with highest and least invested guardband; 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
43ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Layers M1 and M2 each have their own C distribution with ±3σ marked; for the interconnect stack with M1 and M2 (axes M1 C, M2 C), the homogeneous Cw corner lies beyond the stack's 3σ contour, and the gap is pessimism]
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Layers M1 and M2 each have their own C distribution with ±3σ marked; for the interconnect stack with M1 and M2 (axes M1 C, M2 C), the homogeneous Cw corner lies beyond the stack's 3σ contour, and the gap is pessimism]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
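A simplified worked example of this statement, under assumptions that are not from the slides: the stack capacitance deviation is a plain sum of n per-layer deviations, each layer has the same standard deviation σ, and every pair of layers has the same correlation γ.

```latex
\begin{align*}
\Delta C_{\text{hom}} &= \sum_{i=1}^{n} 3\sigma = 3n\sigma
  && \text{(every layer at its own } 3\sigma \text{ corner)} \\
\sigma_{\text{stack}} &= \sigma\sqrt{\,n + n(n-1)\gamma\,}
  && \text{(standard deviation of the sum)} \\
\frac{\Delta C_{\text{hom}}}{3\,\sigma_{\text{stack}}}
  &= \sqrt{\frac{n}{1 + (n-1)\gamma}}
  && \text{(pessimism ratio)}
\end{align*}
```

With γ = 1 the ratio is 1 for any n, so there is no pessimism. With γ = 0.5 the ratio is about 1.15 for n = 2, 1.26 for n = 4 and 1.34 for n = 9, and with γ = 0 it grows as √n.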
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); cumulative probability is 0.92 at a ratio of 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation
47ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
• Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
• Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2 (axes M1 C, M2 C, 3σ contour); non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: Δdj(Yrcw)/dj(Ytyp) × 100 vs. 3σj/dj(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield?
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Figure: flow. Routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC. GTBC: timing analysis using TBC, ECO using TBC until violation = 0. GCBC: timing analysis using CBC, ECO using CBC until violation = 0. Then done]
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm²)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5

          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   4.3       7.3
TBC-0.6   0.6   3.3       5.0
TBC-0.7   0.7   3.0       3.4

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays → the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • First coefficient: delay sensitivity to unit change in M1 resistance
  • Second coefficient: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large → delay variations of ΔdR-M1 and ΔdC-M1 cancel out → Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
LEON3MP
SUPERBLUE12NETCARD
α gt 05α gt 05
α gt 05
bull Similar trends in different designs
bull Large α when Δd(Yrcw)d(Ytyp) and Δd(Ycw)d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]
57ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI aging]
VBTI = Vlib ≈ Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, Vlib and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
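The reference flow can be sketched as a time-stepped loop. This is only an illustration: `aging` and `meets_timing` stand in for the BTI model and the STA run, and the toy models and numbers below are invented for the example rather than taken from the talk.

```python
def reference_signoff(aging, meets_timing, v_init, v_step, v_max, t_final, dt):
    """Step through lifetime; raise Vdd by v_step whenever aged timing fails.

    Returns the converged voltage (Vfinal), or None if Vdd would exceed v_max,
    in which case the circuit must be modified and the flow restarted.
    """
    vdd, dvth, t = v_init, 0.0, 0.0
    while t < t_final:
        t += dt
        dvth = aging(dvth, vdd, t)          # estimate aging at this time step
        while not meets_timing(vdd, dvth):  # stands in for an STA run
            vdd += v_step                   # AVS compensates by raising Vdd
            if vdd > v_max:
                return None
    return vdd

# Toy stand-ins (hypothetical): power-law aging and a fixed headroom requirement.
def toy_aging(dvth, vdd, t_years):
    return max(dvth, 0.03 * vdd * t_years ** 0.2)

def toy_meets_timing(vdd, dvth):
    return vdd - dvth >= 0.90

v_final = reference_signoff(toy_aging, toy_meets_timing,
                            v_init=0.90, v_step=0.01, v_max=1.05,
                            t_final=10.0, dt=1.0)
v_fail = reference_signoff(toy_aging, toy_meets_timing,
                           v_init=0.90, v_step=0.01, v_max=0.92,
                           t_final=10.0, dt=1.0)
```

Each call to `meets_timing` corresponds to one timing analysis, which is why the slides describe the reference flow as slow.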
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2
63ISVLSI-2014 invited talk 140710
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop connecting derated libraries (Vlib, VBTI), circuit, and Vfinal]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time alternating ON/OFF [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS'14]]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a "front-loaded" phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd vs. time approaching Vfinal; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)]
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Good corners
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α=0
7. Proposed method with α=0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

                  Vlib = Min Vdd                                       Vlib = Max Vdd
VBTI = Min Vdd    Slowest circuit, less aging                          Not applicable (optimistic)
VBTI = Max Vdd    Slowest circuit, worst-case aging (too pessimistic)  Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm²) for AES; points labeled by signoff setup 1-8, ranging from optimistic about AVS to pessimistic about AVS; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks)]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) across implementations 1-9; annotations: 30% MTTF penalty, 200mV voltage compensation]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
$MTTF(i) = MTTF(i-1) \times \left( \dfrac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
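The MTTF update can be applied step by step over an AVS voltage schedule, as in this small sketch. The function name and the example schedule are hypothetical; only the squared voltage-ratio update comes from the slide.

```python
def update_mttf(mttf_init, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule."""
    mttf = mttf_init
    for v_prev, v_curr in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (v_prev / v_curr) ** 2
    return mttf

# Hypothetical schedule: Vdd raised from 1.0V to 1.2V in two AVS steps.
mttf_end = update_mttf(10.0, [1.0, 1.1, 1.2])
```

Because the per-step ratios telescope, the result depends only on the first and last voltage in the schedule: here 10 × (1.0/1.2)², roughly 6.9 years.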
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year for AVS schedules S1-S5, and MTTF (year) of S1-S5, for design DMA; annotation: 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
• Foundation of contract between design house and foundry
• "Chip should work" → stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff: between nominal Vdd and signoff Vdd lie static IR drop, power grid IR gradient, dynamic IR, and HCI/NBTI margins; operating voltage shown against the voltage axis]
Problem: margins = pessimism → overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
$\Delta d_{jv} = [\, d_j(Y_v) - d_j(Y_{typ}) \,] / 3$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Figure: flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλᵀ (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
$[\Delta d'_{j1} \ldots \Delta d'_{j27}] = [\Delta d_{j1} \ldots \Delta d_{j27}] \, \lambda$
Standard deviation of path delay:
$\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
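The two formulas above can be checked numerically with a small pure-Python sketch. The function names and the two-source example are illustrative assumptions; a real flow would use the 27 extracted sensitivities and the full correlation matrix.

```python
import math

def cholesky(a):
    """Lower-triangular factor L of a symmetric positive-definite matrix, a = L * L^T."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(l[i][m] * l[k][m] for m in range(k))
            if i == k:
                l[i][k] = math.sqrt(a[i][i] - s)
            else:
                l[i][k] = (a[i][k] - s) / l[k][k]
    return l

def path_sigma(sens, corr):
    """Standard deviation of path delay from per-source sensitivities and correlation matrix."""
    lam = cholesky(corr)
    n = len(sens)
    # Row vector times lambda: d'_k = sum_i d_i * lam[i][k]
    d_prime = [sum(sens[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))

# Two hypothetical variation sources with unit sensitivities.
sigma_indep = path_sigma([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])  # independent sources
sigma_corr = path_sigma([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])   # correlation 0.5
```

Since the squared norm of d' equals d Σ dᵀ, the correlated case gives √3 and the independent case √2, which shows how positive correlation between sources increases path delay variation.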
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors → ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; 15% reduction]
Conventional design: worst-case signoff → no Vdd downscaling
Resilient design: typical-case signoff → Vdd downscaling → reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy
119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints → allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches
Open-loop AVS
• Freq & Vdd LUT: pre-characterize LUT [Martin02]
• Post-silicon characterization: process-aware AVS [Tschanz03]
Closed-loop AVS
• Generic monitor, design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00]; design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor; measure actual critical paths [Hartman06, Fick10]
Error tolerance
• Error detection system: error detection and correction system; Vdd scaling until error occurs [Das06, Tschanz10]
[Figure axis: power]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure: ideal scaling vs. non-ideality in four panels. DENSITY: transistor count [M] vs. MPU release date, source [CPUDB]. POWER: dynamic power (W), 2009-2024, source [JeongK08]. DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, source [ITRS]. SUPPLY VOLTAGE: volt vs. MPU release date, source [CPUDB]]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt on MUL and EXU, with the minimum achievable energy marked]
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
Global optimization flow: sample (SP&R), construct power models, estimate optimal signoff modes, sample (SP&R), refine power models (adaptive search)
[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V); curves 1st, 2nd, real]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
Design: AES, f = 700MHz
Classes of Closed-Loop AVS
• Generic monitor
  • Does not capture design-specific performance variation
• Design-dependent replica, in-situ monitor
  • Critical path may be difficult to identify (IP from 3rd party)
  • Calibrating monitors at multiple modes/voltages requires long test time
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: RO stages with 1-bit control pins per stage selecting high resistance or low resistance]
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF and CO at small, medium and large margin; panels MUL and EXU; bars split into energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution]
Intuition on Delay Variability Across Cw, RCw
[Figure: scatter of Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp), with α per path]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths often have smaller α values at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
In the following, α is defined at the dominant corner
Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
  • E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α plotted over Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]
Find Paths for Which TBCs Can Be Used
• Paths with small Δdrcw and Δdcw have large α
  • E.g., there are αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α over Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw marked]
Gtbc = set of paths that can be safely signed off using TBC: ((path with Δdcw larger than Acw) OR (path with Δdrcw larger than Arcw))
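The Gtbc membership rule above can be sketched in a few lines. This is a minimal illustration, not the authors' tool flow: the `PathDelta` record, the function names, the sample delays and the threshold values passed in are all hypothetical, and Δd is taken as the percentage delay change relative to the typical corner.

```python
from dataclasses import dataclass


@dataclass
class PathDelta:
    """Delays of one timing path at three BEOL corners (hypothetical record)."""
    name: str
    d_typ: float   # delay at the typical corner Ytyp
    d_cw: float    # delay at the C-worst corner Ycw
    d_rcw: float   # delay at the RC-worst corner Yrcw


def delta_pct(d_corner: float, d_typ: float) -> float:
    """Relative delay change vs. the typical corner, in percent."""
    return 100.0 * (d_corner - d_typ) / d_typ


def split_paths(paths, a_cw: float, a_rcw: float):
    """Return (g_tbc, g_cbc): a path joins g_tbc if its delta at C-worst
    exceeds a_cw OR its delta at RC-worst exceeds a_rcw."""
    g_tbc, g_cbc = [], []
    for p in paths:
        dd_cw = delta_pct(p.d_cw, p.d_typ)
        dd_rcw = delta_pct(p.d_rcw, p.d_typ)
        if dd_cw > a_cw or dd_rcw > a_rcw:
            g_tbc.append(p.name)
        else:
            g_cbc.append(p.name)
    return g_tbc, g_cbc


# Illustrative numbers only (thresholds are assumptions, not the talk's values).
paths = [
    PathDelta("p0", 1.00, 1.06, 1.02),  # about 6% at C-worst
    PathDelta("p1", 1.00, 1.01, 1.02),  # small deltas at both corners
    PathDelta("p2", 2.00, 2.02, 2.20),  # about 10% at RC-worst
]
g_tbc, g_cbc = split_paths(paths, a_cw=4.0, a_rcw=7.0)
```

Paths with small deltas at both corners stay in the conventional-corner group, which matches the observation that such paths tend to have large α.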
Determining α, Arcw and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
  • Plot α vs. Δdelay; find Arcw and Acw for a given α
  • Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α: larger thresholds (Arcw and Acw), fewer paths in GTBC
[Figure: α vs. Δd at C-worst corner (%) and α vs. Δd at RC-worst corner (%), with Arcw and Acw marked]
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
• Tradeoff between reduced margin vs. paths which use TBC
Correlation factor γ = 0.5
[Figure: WNS (ns), TNS (ns) and timing violations for LEON, SUPERBLUE12 and NETCARD under CBC, TBC-0.5, TBC-0.6 and TBC-0.7]
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
How to Minimize Cost of Resilience?
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                     Razor         Razor-Lite    TIMBER
Power penalty        30 [Das08]    ~0 [Kim13]    100 [Choudhury09]
Area penalty         182 [Kim13]   33 [Kim13]    255 [Chen13]
# recovery cycles    5 [Wan09]     11 [Kim13]    0 [Choudhury09]
Tradeoff: Resilience Cost vs. Datapath Cost
[Figure: an endpoint Razor FF with error output vs. a normal FF; optimizing the fanin cone w/ a tighter constraint allows the Razor FF to be replaced by a normal FF. Area (power) of fanin cone is traded off against area (power) w/ Razor overhead: power/area of fanin circuits vs. # Razor FFs (resilience cost)]
[Figure: energy (mJ) vs. # Razor FFs (300, 100, 50, 0): total energy, energy of non-resilient part, resilience cost]
We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)
Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions
  • Vary endpoints, compare area/power penalty
Candidate Sensitivity Functions
$SF_1 = |slack(p)|$
$SF_2 = |slack(p)| \times Num_{cri}(p)$
$SF_3 = |slack(p)| \times Num_{cri}(p) / Num_{total}(p)$
$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$
p: negative-slack endpoint; c: cells within fanin cone; Num_cri: number of negative-slack cells
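As a rough illustration of how the five candidate sensitivity functions could be evaluated for one endpoint, here is a small sketch. The function name and argument layout are hypothetical, and Num_total is assumed to mean the total number of cells in the fanin cone, which the slide does not define.

```python
def sensitivity_functions(slack_p, cell_slacks, cell_powers):
    """Evaluate SF1..SF5 for a negative-slack endpoint p.

    cell_slacks / cell_powers: per-cell slack and power of the cells in
    the fanin cone of p (same order). Num_total is ASSUMED to be the
    number of cells in the fanin cone.
    """
    abs_slack = abs(slack_p)
    num_total = len(cell_slacks)
    num_cri = sum(1 for s in cell_slacks if s < 0)  # negative-slack cells
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": abs_slack * num_cri / num_total if num_total else 0.0,
        "SF4": abs_slack * sum(cell_powers),
        "SF5": sum(abs(s) * w for s, w in zip(cell_slacks, cell_powers)),
    }


# Illustrative values only.
sf = sensitivity_functions(
    slack_p=-0.25,
    cell_slacks=[-0.25, -0.125, 0.5, 0.5],
    cell_powers=[1.0, 2.0, 1.0, 4.0],
)
```

Endpoints could then be ranked by any one of these values before deciding which fanin cones to optimize; which function ranks best is an empirical question that the talk leaves to experiments.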
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
1. Generate sequential graph
2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
3. Iterate Step 2 until all endpoints are optimized
$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p, q)}$
Slack_pq: setup slack of path p-q; β: weighting factor; TG(p, q): toggle rate of path p-q
[Figure: FF1, FF2, FF3 connected by data paths with weights W12, W23, W31 and driven by a clock tree; after adjustment each edge on the cycle has weight W' = average weight on cycle]
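The edge weight and the per-cycle average W' can be sketched as below. This only illustrates the weight formula and the averaging step; it does not implement the minimum-weight cycle search, the latency adjustment or the vertex contraction, and the function names and sample numbers are hypothetical.

```python
def path_weight(slack: float, toggle_rate: float, beta: float) -> float:
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)


def average_cycle_weight(cycle_edges, beta: float) -> float:
    """W' for one cycle: mean of the edge weights along the cycle.

    cycle_edges: list of (setup_slack, toggle_rate) tuples, one per
    path on the cycle, e.g. FF1->FF2, FF2->FF3, FF3->FF1.
    """
    weights = [path_weight(s, tg, beta) for s, tg in cycle_edges]
    return sum(weights) / len(weights)


# Three-flop cycle with made-up slacks (ns) and toggle rates.
cycle = [(0.5, 1.0), (0.25, 0.0), (1.0, 1.0)]
w_avg = average_cycle_weight(cycle, beta=1.0)
```

With β > 0, a frequently toggling path gets a smaller weight for the same slack, so it is treated as more critical.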
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; OR-tree insertion; if energy < min energy, save current solution]
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF and CO at small, medium and large margin; panels MUL and EXU; bars split into energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt; panels MUL and EXU; minimum achievable energy marked]
Technology: foundry 28nm
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
  • Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
  • AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS: an on-chip aging monitor observes circuit performance and drives the voltage regulator that supplies the circuit]
[Figure: circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, relative to the target]
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation
[Figure: Step 1: VBTI gives |Vt|, and with Vlib the derated library; Step 2: circuit implementation and signoff gives the circuit; Step 3: BTI degradation and AVS give Vfinal]
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Figure: Step 1: VBTI gives |Vt|, and with Vlib the derated library; Step 2: circuit implementation and signoff gives the circuit; Step 3: BTI degradation and AVS give Vfinal]
• No obvious guideline to define VBTI and Vlib
• Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design/lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
Power vs. Area Across Different Signoffs
[Figure: power vs. area for different signoff corners; "knee" point for balanced area and power tradeoff]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
  • Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic
VBTI = Vlib ≈ Vfinal
[Figure: Vdd vs. time, with NBTI and PBTI]
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation
Flow:
1. Obtain Vheur (average of standard cells)
2. Obtain derated library with VBTI = Vlib = Vheur
3. Sign off circuit with derated library
Power vs. Area for All Designs
• (4 designs) x (DC, AC) x (derating methods)
[Figure: power vs. area; proposed method vs. circuits signed off using other derated libraries; "knee" point for balanced area and power tradeoff]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign
[Figure: Vdd vs. time, switching between NOM and OD modes over tnom and tOD]
[Figure: power of circuits w/ different overdrive modes; different overdrive modes: 26% power range; fix fOD: still 14% power range; fnom = 800MHz, Vnom = 0.8V]
Also: Tunable Monitors, Less Margin
Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1-9, annotated with highest invested guardband, least invested guardband and a 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff exists between invested guardband in signoff and power penalty
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Layer M1 and layer M2 are each placed at their own 3σ C corner; for an interconnect stack with M1 and M2, the resulting homogeneous Cw corner lies outside the joint 3σ contour in the (M1 C, M2 C) plane, and the gap is pessimism]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
          M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW    0  0  0     0  0  0     0  0  0     1  0  0
    ΔT    0  0  0     0  0  0     0  0  0     0  1  0
    ΔH    0  0  0     0  0  0     0  0  0     0  0  1

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
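A matrix with this structure can be generated from a layer-to-module map. The sketch below is an illustration under assumptions: the module labels "A" and "B" are hypothetical, and the grouping of M1-M3 into one process module with M4 in another is inferred from the zero and γ pattern of the matrix shown.

```python
def build_correlation_matrix(layer_module, variation_types, gamma):
    """Build the correlation matrix for (layer, variation type) sources.

    Two different sources are correlated with factor gamma only if they
    have the same variation type and their layers share a process module;
    every source has correlation 1 with itself.
    """
    sources = [(layer, v) for layer in layer_module for v in variation_types]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (li, vi) in enumerate(sources):
        for j, (lj, vj) in enumerate(sources):
            if i == j:
                sigma[i][j] = 1.0
            elif vi == vj and layer_module[li] == layer_module[lj]:
                sigma[i][j] = gamma
    return sources, sigma


# Hypothetical module labels; M1-M3 share a module, M4 is separate.
layer_module = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}
sources, sigma = build_correlation_matrix(
    layer_module, ["dW", "dT", "dH"], gamma=0.5
)
```

Setting `gamma=0.0` gives the fully independent case.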
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); cumulative probability 0.92 at 60%]
Delay Variation
[Figure: scatter of Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp), with α per path]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner on the 3σ contour in the (M1 C, M2 C) plane: M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σj/d(Ytyp) x 100 vs. Δdj(Yrcw)/dj(Ytyp) x 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield?
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
  • Wires on critical paths are routed on many layers
  • Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; split paths into GTBC and GCBC; GTBC: timing analysis using TBC and ECO using TBC until violation = 0; GCBC: timing analysis using CBC and ECO using CBC until violation = 0; done]
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)
                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     18        20        31
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5
          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

• Implement another NETCARD (clock period = 23ns) to obtain α, Acw and Arcw
• Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
Scaling Factor Results
[Figure: α over Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) for LEON3MP, SUPERBLUE12 and NETCARD, with the α > 0.5 region marked in each panel]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns) and timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
• Vmax: 1.05V
• Vinit: 0.90V
• Vheur1 (DC): 0.97V
• Vheur1 (AC): 0.95V
• Vheur2 (DC): 0.95V
• Vheur2 (AC): 0.93V
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, Vlib and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA and library characterization runs
Vstep: AVS voltage step; Vfinal: converged voltage
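The loop structure of the reference flow can be sketched as follows. Everything numeric here is a toy assumption: `delay_fn` and `aging_fn` stand in for STA with a derated library and for a BTI model, and their forms and constants are hypothetical, chosen only so that the loop runs.

```python
def reference_avs_signoff(v_init, v_max, v_step, t_final, dt,
                          delay_fn, aging_fn, t_clk):
    """Step through lifetime; raise Vdd by v_step whenever aging makes the
    circuit miss t_clk. Returns Vfinal, or None if Vdd would exceed v_max
    (meaning the circuit must be modified and the flow restarted)."""
    vdd, dvth, t = v_init, 0.0, 0.0
    while t < t_final:
        dvth += aging_fn(vdd, t, dt)        # estimate aging for this step
        while delay_fn(vdd, dvth) > t_clk:  # update timing and Vdd
            vdd += v_step
            if vdd > v_max:
                return None
        t += dt
    return vdd


# Toy stand-ins (assumptions): delay grows as gate overdrive shrinks,
# and the Vth shift per year grows with the applied Vdd.
def toy_delay(vdd, dvth):
    return 1.0 / (vdd - 0.3 - dvth)


def toy_aging(vdd, t, dt):
    return 0.01 * vdd * dt


v_final = reference_avs_signoff(0.9, 1.05, 0.01, 10.0, 1.0,
                                toy_delay, toy_aging, t_clk=1.7)
v_fail = reference_avs_signoff(0.9, 0.95, 0.01, 10.0, 1.0,
                               toy_delay, toy_aging, t_clk=1.7)
```

In a real flow each inner-loop check would be a full timing analysis with a library characterized at the current voltage and aging state, which is why the reference flow is slow.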
Experiment Setup
• Characterize different derated libraries
  • Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop among circuit, Vfinal, (Vlib, VBTI) and derated libraries]
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time over ON, OFF, ON, OFF phases [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS'14]]
Observation 1
• BTI is a "front-loaded" phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
[Figure [Chan11]: Vdd vs. time approaching Vfinal; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)]
Results for DC Scenario
[Figure: power vs. area for setups 1-7, with good corners marked]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Setups
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

VBTI (aging estimation)   Vlib = Min Vdd (circuit performance estimation)       Vlib = Max Vdd
Min Vdd                   Slowest circuit, less aging                           Not applicable (optimistic)
Max Vdd                   Slowest circuit, worst-case aging: too pessimistic    Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm2) for AES, implementations labeled 1-8; series: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks); regions annotated "optimistic about AVS" and "pessimistic about AVS"]
AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^2$
[Figure: lifetime (year) and Vfinal (V) for implementations 1-9; annotations: 30% MTTF penalty, 200mV voltage compensation]
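The step-wise MTTF update can be sketched directly. The function name, the starting MTTF and the voltage schedule below are hypothetical; only the update rule itself comes from the slide.

```python
def mttf_trajectory(mttf0, vdd_steps):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a
    sequence of supply voltages; returns the MTTF after each step."""
    mttf = [mttf0]
    for prev, cur in zip(vdd_steps, vdd_steps[1:]):
        mttf.append(mttf[-1] * (prev / cur) ** 2)
    return mttf


# Hypothetical AVS schedule: 1.0 V raised to 1.2 V over lifetime.
traj = mttf_trajectory(10.0, [1.0, 1.05, 1.1, 1.2])
penalty = 1.0 - traj[-1] / traj[0]
```

Because the ratios telescope, the final value depends only on the first and last voltages: MTTF(n) = MTTF(0) × (VDD(0)/VDD(n))². For this made-up schedule the penalty is about 31%.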
EM Impact on AVS Scheduling
[Figure: VDD vs. year for schedules S1-S5, and MTTF (year) for S1-S5 (design: DMA); annotation: 12 years MTTF penalty]
What is "Signoff"?
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff: from nominal Vdd, margins for static IR drop, power grid IR gradient, dynamic IR, HCI and NBTI lead down to the signoff Vdd; operating voltage shown on the voltage axis]
Problem: margins = pessimism: overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
$\Delta d_{jv} = [d_j(Y_v) - d_j(Y_{typ})] / 3$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist; RC extraction; STA; Δdjv]
Note: path delay includes gate and wire delays
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
$[\Delta d'_{j1} \; \ldots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \ldots \; \Delta d_{j27}] \, \lambda$
Standard deviation of path delay:
$\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
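The two statistical-analysis slides can be condensed into a short sketch: per-source sensitivities from 3σ corner delays, decorrelation through the Cholesky factor, then a root-sum-square. The helper names and the two-source example are hypothetical, and a small pure-Python Cholesky routine is used only to keep the sketch self-contained; it assumes a positive-definite correlation matrix.

```python
import math


def sensitivity(d_corner: float, d_typ: float) -> float:
    """Per-sigma delay sensitivity from a 3-sigma corner run."""
    return (d_corner - d_typ) / 3.0


def cholesky(a):
    """Lower-triangular lam with a = lam * lam^T (a must be positive definite)."""
    n = len(a)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(a[i][i] - s)
            else:
                lam[i][j] = (a[i][j] - s) / lam[j][j]
    return lam


def path_sigma(sens, corr) -> float:
    """sigma_j = || sens * lam ||_2, with corr = lam * lam^T."""
    lam = cholesky(corr)
    n = len(sens)
    # Row vector times matrix: d'_k = sum_i d_i * lam[i][k]
    corr_sens = [sum(sens[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in corr_sens))


# Two hypothetical variation sources with sensitivities 3 and 4 (ps).
sigma_indep = path_sigma([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])
sigma_corr = path_sigma([3.0, 4.0], [[1.0, 0.5], [0.5, 1.0]])
```

Since (dλ)(dλ)^T = dΣd^T, the result equals the square root of the quadratic form of the sensitivities with Σ; positive correlation between sources that push delay in the same direction increases σj.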
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; 15% reduction]
• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling, reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy: $Energy = Power / Throughput$, where throughput is estimated from the error rate ER [Kahng10], the clock period T, and the number of recovery cycles r
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches:
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS: process- and temperature-aware AVS
  • Generic monitor: generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until an error occurs [Das06, Tschanz10]
[Figure: AVS classes ordered by power]
78ISVLSI-2014 invited talk 140710
Challenge: Variability
[Figure: four panels, each comparing ideal scaling with non-ideality]
• DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: volt vs. MPU release date. Source: [CPUDB]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, for MUL and EXU; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cones
[Figure: design cone of mode A in the voltage-frequency plane, with modes A, B and C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small, constant number of runs is enough: scalable
[Figure: global optimization flow. Sample (SP&R), construct power models, estimate optimal signoff modes; then adaptive search: sample (SP&R), refine power models]
[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V); design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
• Classes: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF and CO at small, medium and large margin, for MUL and EXU; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, for MUL and EXU; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Figure: optimization flow]
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• If energy < min energy, save current solution
13ISVLSI-2014 invited talk 140710
Scaling Factor α and Delay Variation
• Paths with small Δdrcw and Δdcw have large α
• E.g., here we see αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp)]
14ISVLSI-2014 invited talk 140710
Find Paths for Which TBCs Can Be Used
• Paths with small Δdrcw and Δdcw have large α
• E.g., there are αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw]
Gtbc = set of paths that can be safely signed off using TBC: ((path with Δdcw larger than Acw) OR (path with Δdrcw larger than Arcw))
15ISVLSI-2014 invited talk 140710
Determining α, Arcw and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
• Plot α vs. Δdelay; find Arcw and Acw for a given α
• Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α: larger thresholds (Arcw and Acw), fewer paths in GTBC
[Figure: α vs. Δd at C-worst corner (%) and Δd at RC-worst corner (%), with Arcw and Acw marked]
16ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-06: more benefits
  • Tradeoff between reduced margin vs. # paths which use TBC
Correlation factor γ = 0.5
[Figure: WNS (ns), TNS (ns) and # timing violations for LEON, SUPERBLUE12 and NETCARD under CBC, TBC-05, TBC-06 and TBC-07]
17ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
18ISVLSI-2014 invited talk 140710
How to Minimize Cost of Resilience?
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                    Razor          Razor-Lite    TIMBER
Power penalty       30 [Das08]     ~0 [Kim13]    100 [Choudhury09]
Area penalty        182 [Kim13]    33 [Kim13]    255 [Chen13]
# recovery cycles   5 [Wan09]      11 [Kim13]    0 [Choudhury09]
19ISVLSI-2014 invited talk 140710
Tradeoff: Resilience Cost vs. Datapath Cost
[Figure: an endpoint Razor FF is replaced by a normal FF by optimizing its fanin cone w/ a tighter constraint; area (power) of fanin cone vs. area (power) w/ Razor overhead]
[Figure: energy (mJ) vs. # Razor FFs, showing total energy, energy of non-resilient part, and resilience cost; tradeoff between Razor FFs (resilience cost) and power/area of fanin circuits]
We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)
20ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions
• Vary # endpoints, compare area/power penalty
Candidate Sensitivity Functions
$SF_1 = |slack(p)|$
$SF_2 = |slack(p)| \times num_{cri}(p)$
$SF_3 = |slack(p)| \times \frac{num_{cri}(p)}{num_{total}(p)}$
$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$
p: negative-slack endpoint; c: cells within fanin cone; num_cri: number of negative-slack cells
21ISVLSI-2014 invited talk 140710
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
1. Generate sequential graph
2. Find cycle of paths with minimum total weight, adjust clock latencies, contract the cycle into one vertex
3. Iterate Step 2 until all endpoints are optimized
$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p,q)}$
where Slack_pq = setup slack of path p-q, β = weighting factor, TG(p,q) = toggle rate of path p-q
[Figure: sequential graph of FF1, FF2 and FF3 with data-path edges W12, W23, W31 and clock tree; after adjustment every edge on the cycle has weight W′]
W′ = average weight on cycle
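The edge weight and the cycle average W′ above can be sketched as follows. This only evaluates the weights for a given cycle; it does not search for the minimum-weight cycle or adjust clock latencies. The names `path_weight` and `cycle_mean_weight` and the dictionary-based graph encoding are assumptions made for illustration.

```python
def path_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)

def cycle_mean_weight(cycle, slack, toggle, beta):
    """Average edge weight W' over a cycle of flip-flops.

    cycle  -- list of FF names in order, e.g. ["FF1", "FF2", "FF3"]
    slack  -- dict mapping (p, q) to the setup slack of path p-q
    toggle -- dict mapping (p, q) to the toggle rate of path p-q
    """
    edges = list(zip(cycle, cycle[1:] + cycle[:1]))  # close the cycle
    weights = [path_weight(slack[e], toggle[e], beta) for e in edges]
    return sum(weights) / len(weights)
```

A larger β shrinks the weight of frequently toggling paths, so for positive slacks those paths look more critical to the cycle search.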
22ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Figure: optimization flow]
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• OR-tree insertion
• If energy < min energy, save current solution
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF and CO at small, medium and large margin, for MUL and EXU; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt, for MUL and EXU; minimum achievable energy marked]
Technology: foundry 28nm
25ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS, in which an on-chip aging monitor observes circuit performance and drives the voltage regulator that supplies the circuit]
[Figure: circuit frequency and Vdd vs. time, without AVS and with AVS, relative to the target]
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
[Figure: Step 1: derated library (VBTI, Vlib, |Vt|); Step 2: circuit implementation and signoff (circuit); Step 3: BTI degradation and AVS (Vfinal)]
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
[Figure: Step 1: derated library (VBTI, Vlib, |Vt|); Step 2: circuit implementation and signoff (circuit); Step 3: BTI degradation and AVS (Vfinal)]
• No obvious guideline to define VBTI and Vlib
• Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs. Area Across Different Signoffs
"Knee" point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI]
VBTI = Vlib ≈ Vfinal
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging), which depends on
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation
Flow:
1. Obtain Vheur (average of standard cells)
2. Obtain derated library with VBTI = Vlib = Vheur
3. Sign off circuit with derated library
34ISVLSI-2014 invited talk 140710
Power vs. Area for All Designs
• (4 designs) × {DC, AC} × (derating methods)
[Figure: power vs. area; legend: proposed method, circuit signed off using other derated libraries]
"Knee" point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time, alternating nominal (NOM) and overdrive (OD) modes with durations tnom and tOD]
[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V]
• Different overdrive modes: 26% power range
• Fix fOD: still 14% power range
36ISVLSI-2014 invited talk 140710
Also: Tunable Monitors, Less Margin
Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase # high-resistance passgates
• Vmin_est ≈ Vmin_chip
Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail
37ISVLSI-2014 invited talk 140710
Also: Tunable Monitors, Less Margin
Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase # high-resistance passgates
• Vmin_est ≈ Vmin_chip
Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "202020" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1 to 9; highest and least invested guardband marked; 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
43ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
Example: worst-case capacitance corner
[Figure: capacitance distributions of layers M1 and M2, each with a 3σ corner; the homogeneous Cw corner for the interconnect stack with M1 and M2 lies beyond the 3σ point of the combined M1 C + M2 C distribution (pessimism)]
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
Example: worst-case capacitance corner
[Figure: capacitance distributions of layers M1 and M2, each with a 3σ corner; the homogeneous Cw corner for the interconnect stack with M1 and M2 lies beyond the 3σ point of the combined M1 C + M2 C distribution (pessimism)]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
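The structure of this matrix can be generated programmatically. The sketch below is illustrative only: the function name `build_correlation` and the module assignment (M1 to M3 sharing one process module, M4 in another) are read off the table above, not taken from any tool.

```python
def build_correlation(layers, module_of, kinds=("W", "T", "H"), gamma=0.5):
    """Build the correlation matrix over (layer, variation type) sources.

    Two distinct sources are correlated by gamma only if they have the same
    variation type and their layers belong to the same process module.
    """
    names = [(layer, kind) for layer in layers for kind in kinds]
    n = len(names)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (layer_i, kind_i) in enumerate(names):
        for j, (layer_j, kind_j) in enumerate(names):
            if i == j:
                sigma[i][j] = 1.0
            elif kind_i == kind_j and module_of[layer_i] == module_of[layer_j]:
                sigma[i][j] = gamma
    return names, sigma

# Hypothetical module assignment matching the 12 x 12 table
names, sigma = build_correlation(
    ["M1", "M2", "M3", "M4"],
    {"M1": 0, "M2": 0, "M3": 0, "M4": 1},
)
```

Setting `gamma=0.0` yields the identity matrix, i.e. the fully independent statistical model.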
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); cumulative probability is 0.92 at 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation
47ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
48ISVLSI-2014 invited talk 140710
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σj / d(Ytyp) × 100% vs. Δdj(Yrcw) / dj(Ytyp) × 100%]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%) per layer]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Figure: signoff flow]
• Routed design
• Timing analysis at BEOL corners Ytyp, Ycw, Yrcw; split paths into GTBC and GCBC
• GTBC: timing analysis using TBC; ECO using TBC until # violation = 0
• GCBC: timing analysis using CBC; ECO using CBC until # violation = 0
• Done
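The path-grouping step of this flow can be sketched as below. It is a minimal illustration under stated assumptions: the function name `split_paths`, the tuple layout of each path, and the threshold values in the usage example are hypothetical, and Δdelay is computed as the percentage increase over the typical-corner delay.

```python
def split_paths(paths, a_cw, a_rcw):
    """Split paths into G_TBC (tightened corners) and G_CBC (conventional corners).

    paths -- iterable of (name, d_typ, d_cw, d_rcw): path delays at the typical,
             C-worst and RC-worst corners
    a_cw, a_rcw -- thresholds on delta-delay, in percent of the typical delay
    """
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        dd_cw = 100.0 * (d_cw - d_typ) / d_typ
        dd_rcw = 100.0 * (d_rcw - d_typ) / d_typ
        # A path qualifies for TBC signoff if either delta exceeds its threshold
        if dd_cw > a_cw or dd_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc

# Hypothetical delays (ns) and thresholds (%)
example = [("p1", 1.0, 1.06, 1.02), ("p2", 1.0, 1.01, 1.02), ("p3", 1.0, 1.01, 1.08)]
g_tbc, g_cbc = split_paths(example, a_cw=3.3, a_rcw=5.0)
```

Paths with small Δdelay at both corners stay in G_CBC, since those are the paths for which a scaled-down corner could underestimate the true variation.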
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity):

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Tightened corners (correlation factor = 0.5):

          α     Acw (%)   Arcw (%)
TBC-05    0.5   4.3       7.3
TBC-06    0.6   3.3       5.0
TBC-07    0.7   3.0       3.4

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: α for LEON3MP, SUPERBLUE12 and NETCARD, with the region α > 0.5 marked in each]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns) and # timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging), which depends on
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = {DC, AC}

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages:
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing, library and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs
Vstep: AVS voltage step; Vfinal: converged voltage
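The loop structure of such a reference flow can be sketched as follows. This is an assumption-laden outline, not the authors' implementation: `aging_step` and `meets_timing` are hypothetical caller-supplied models standing in for BTI estimation and for library re-characterization plus STA, and returning `None` stands in for "modify circuit and start over".

```python
def reference_signoff(v_init, v_max, v_step, t_final, n_steps,
                      aging_step, meets_timing):
    """Step through the lifetime, raising Vdd whenever aged timing fails.

    aging_step(vdd, dvth, dt) -> new accumulated |dVth| after dt at voltage vdd
    meets_timing(vdd, dvth)   -> True if the aged circuit meets timing at vdd
    Returns Vfinal, or None if the required voltage exceeds v_max.
    """
    vdd, dvth = v_init, 0.0
    dt = t_final / n_steps
    for _ in range(n_steps):
        dvth = aging_step(vdd, dvth, dt)       # estimate aging at this time step
        while not meets_timing(vdd, dvth):     # update timing, then Vdd
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None                    # circuit must be modified
    return vdd

# Toy models, for illustration only
toy_aging = lambda vdd, dvth, dt: dvth + 0.01 * vdd * dt
toy_timing = lambda vdd, dvth: vdd - dvth >= 0.85
v_final = reference_signoff(0.90, 1.05, 0.01, 10.0, 10, toy_aging, toy_timing)
```

Each inner-loop check corresponds to a full timing run in a real flow, which is why the slide calls the approach very slow.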
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3       4        5     6        7
Vlib (V)   Vinit   Vinit   Vmax    Vinit    N/A   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax    Vfinal   N/A   Vheur1   Vheur2
63ISVLSI-2014 invited talk 140710
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop among circuit, Vfinal, (Vlib, VBTI) and derated libraries]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time over alternating ON/OFF intervals [VattikondaWC06]]
Device aging (|ΔVth|) accumulates over time [TCAS'14]
65ISVLSI-2014 invited talk 140710
Observation 1
• BTI is a "front-loaded" phenomenon [Chan11]
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd approaching Vfinal over lifetime; ≈70% of Vdd increment in 1 year, remaining 30% over 9 years]
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Good corners
Setups:
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Vlib = voltage for circuit performance estimation; VBTI = voltage for aging estimation
                   Vlib = Min Vdd                                     Vlib = Max Vdd
VBTI = Min Vdd     Slowest circuit, less aging                        Not applicable (optimistic)
VBTI = Max Vdd     Slowest circuit, worst-case aging: too pessimistic Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm2) for AES, with points labeled 1-8 ranging from “optimistic about AVS” to “pessimistic about AVS”; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks)]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step, and MTTF is updated as

$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$

[Figure: lifetime (years) and Vfinal (V) for implementations 1-9; annotations: 30% MTTF penalty, 200mV voltage compensation]
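The MTTF update rule on this slide can be sketched in a few lines. This is only an illustration: the function names, the initial MTTF, and the voltage schedule are hypothetical and are not taken from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_trajectory(mttf_initial, vdd_schedule):
    """Apply the update at every AVS voltage step in the schedule."""
    mttf = [mttf_initial]
    for v_prev, v_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf.append(update_mttf(mttf[-1], v_prev, v_new))
    return mttf


# Hypothetical AVS schedule (volts) and initial MTTF (years).
schedule = [0.90, 0.95, 1.00]
trajectory = mttf_trajectory(10.0, schedule)
print(trajectory)
```

Because the per-step ratios telescope, the final MTTF depends only on the first and last voltages of the schedule under this rule.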
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year for AVS schedules S1-S5 (DMA), and MTTF (year) for S1-S5; annotation: 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is “Signoff”?
• Foundation of the contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff on a voltage axis; components: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd; operating voltage marked]
Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv: Δdjv
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
  • 28 itf files (27 variation sources + Ytyp)
  • Flow: routed netlist → RC extraction → STA → Δdjv

$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$

Note: path delay includes gate and wire delays
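As a small worked example of the sensitivity formula, the sketch below divides the delay shift at each corner by 3. The delay values and the function name are hypothetical; in the flow on the slide the delays would come from the 28 RC-extraction and STA runs.

```python
def delay_sensitivities(d_typ, d_corners):
    """Per-source sensitivity: (d_j(Y_v) - d_j(Y_typ)) / 3 for each corner Y_v."""
    return [(d_v - d_typ) / 3.0 for d_v in d_corners]


# Hypothetical path delays (ns): typical corner and three variation corners.
d_typ = 1.00
d_corners = [1.03, 0.97, 1.06]
sens = delay_sensitivities(d_typ, d_corners)
print(sens)
```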
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:
$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}] \, \lambda$

Standard deviation of path delay:
$\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$

Note: we use the delay variation from the statistical analysis as a reference
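The two formulas above can be checked numerically with a small pure-Python sketch. It assumes Σ is symmetric positive definite so that the Cholesky factor exists; the helper names and the 2 x 2 example are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular factor lam with a = lam * lam^T (a must be SPD)."""
    n = len(a)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(a[i][i] - s)
            else:
                lam[i][j] = (a[i][j] - s) / lam[j][j]
    return lam


def path_sigma(delta_d, corr):
    """sigma_j = norm of (delta_d * lam), i.e. sqrt(delta_d * corr * delta_d^T)."""
    lam = cholesky(corr)
    n = len(delta_d)
    d_prime = [sum(delta_d[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


# Two variation sources with unit sensitivities.
sigma_independent = path_sigma([1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
sigma_correlated = path_sigma([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])
print(sigma_independent, sigma_correlated)
```

With positive correlation between the two sources the path sigma grows, which is the effect the correlation matrix is meant to capture.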
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation under dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement the design to minimize energy
• Estimation of design energy:

$\mathrm{Energy} = \mathrm{Power} / \mathrm{Throughput}$
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints: allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS: process- and temperature-aware AVS
  • Generic monitor: generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure: four panels comparing ideal scaling with non-ideality]
• DENSITY: transistor count [M] vs. MPU release date, 1998-2011. Source: [CPUDB]
• POWER: dynamic power (W), 2009-2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, 2006-2016. Source: [ITRS]
• SUPPLY VOLTAGE: supply voltage (V) vs. MPU release date, 1995-2016. Source: [CPUDB]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• The proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• The proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cones
[Figure: voltage vs. frequency plane showing the design cone of mode A and modes A, B, C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoids circuit implementation at each mode
  • A small, constant number of runs is enough: scalable
Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R), refine power models
[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V), with curves 1st, 2nd, and real; design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iterations
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at large, medium, and small margins, broken down into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (adaptive voltage scaling) allows a lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty against reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow steps:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs
• SkewOpt: activity-aware clock skew optimization
• If energy < min energy, save current solution
14ISVLSI-2014 invited talk 140710
Find Paths for Which TBCs Can Be Used
• Paths with small Δdrcw and Δdcw have large α
  • E.g., there are αj > 0.6 when ((Δdrcw < 3%) AND (Δdcw < 3%))
• Identify paths for tightened BEOL corners based on Δdrcw and Δdcw
[Figure: α vs. Δd(Ycw)/d(Ytyp) and Δd(Yrcw)/d(Ytyp), with thresholds Acw and Arcw]
Gtbc = set of paths that can be safely signed off using TBC: (paths with Δdcw larger than Acw) OR (paths with Δdrcw larger than Arcw)
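The Gtbc membership rule stated on this slide amounts to a simple threshold test per path. The sketch below uses hypothetical path names, Δdelay percentages, and threshold values purely for illustration.

```python
def split_paths(paths, a_cw, a_rcw):
    """Split paths into Gtbc (tightened corners) and Gcbc (conventional corners).

    paths: iterable of (name, dd_cw, dd_rcw), with delay changes in percent.
    A path joins Gtbc if dd_cw > a_cw OR dd_rcw > a_rcw.
    """
    g_tbc, g_cbc = [], []
    for name, dd_cw, dd_rcw in paths:
        if dd_cw > a_cw or dd_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


# Hypothetical paths and thresholds (percent of typical delay).
paths = [("p0", 5.1, 2.0), ("p1", 1.2, 1.5), ("p2", 2.0, 8.4)]
g_tbc, g_cbc = split_paths(paths, a_cw=4.0, a_rcw=7.0)
print(g_tbc, g_cbc)
```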
15ISVLSI-2014 invited talk 140710
Determining α Arcw and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
  • Plot α vs. Δdelay; find Arcw and Acw for a given α
  • Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α: larger thresholds (Arcw and Acw), fewer paths in GTBC
[Figure: α vs. Δd at C-worst corner (%) and α vs. Δd at RC-worst corner (%), with thresholds Acw and Arcw marked]
16ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100 ps and 53 ns
• Timing violations are reduced by 24% to 100%
• TBC-0.6: more benefits
  • Tradeoff between reduced margin vs. paths which use TBC
Correlation factor γ = 0.5
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE12, and NETCARD under CBC, TBC-0.5, TBC-0.6, and TBC-0.7]
17ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
18ISVLSI-2014 invited talk 140710
How to Minimize Cost of Resilience?
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                   Razor          Razor-Lite     TIMBER
Power penalty      30 [Das08]     ~0 [Kim13]     100 [Choudhury09]
Area penalty       182 [Kim13]    33 [Kim13]     255 [Chen13]
Recovery cycles    5 [Wan09]      11 [Kim13]     0 [Choudhury09]
19ISVLSI-2014 invited talk 140710
Tradeoff Resilience Cost vs Datapath Cost
[Figure: endpoint Razor FF vs. normal FF. Optimizing the fanin cone with a tighter constraint allows a Razor FF (with error output) to be replaced by a normal FF; area (power) of the fanin cone is traded against area (power) with Razor overhead]
[Figure: tradeoff between Razor FFs (resilience cost) and power/area of fanin circuits; energy (mJ) at 300, 100, 50, and 0 Razor FFs, showing total energy, energy of the non-resilient part, and resilience cost]
We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)
20ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint with tighter constraints: allows replacement of Razor FF with normal FF
• Pick endpoints based on heuristic sensitivity functions: vary endpoints, compare area/power penalty

Candidate sensitivity functions:
$SF_1 = |slack(p)|$
$SF_2 = |slack(p)| \times Num_{cri}(p)$
$SF_3 = |slack(p)| \times \frac{Num_{cri}(p)}{Num_{total}(p)}$
$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$

p: negative-slack endpoint; c: cells within fanin cone; Numcri: number of negative-slack cells
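To make the five candidate sensitivity functions concrete, here is a small sketch that evaluates them for one endpoint. It assumes Numtotal is the total number of cells in the fanin cone, which the slide does not define; the slack and power numbers are hypothetical.

```python
def sensitivity_functions(slack_p, fanin):
    """Evaluate SF1..SF5 for one negative-slack endpoint p.

    fanin: list of (slack_c, power_c) for the cells in the fanin cone of p.
    Assumption: Num_total is the number of cells in the fanin cone.
    """
    num_total = len(fanin)
    num_cri = sum(1 for slack_c, _ in fanin if slack_c < 0)
    abs_slack = abs(slack_p)
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": abs_slack * num_cri / num_total,
        "SF4": abs_slack * sum(pwr for _, pwr in fanin),
        "SF5": sum(abs(slack_c) * pwr for slack_c, pwr in fanin),
    }


# Hypothetical endpoint: slack in ns, cell power in arbitrary units.
fanin = [(-0.1, 2.0), (0.05, 1.0), (-0.2, 3.0), (0.3, 4.0)]
sf = sensitivity_functions(-0.2, fanin)
print(sf)
```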
21ISVLSI-2014 invited talk 140710
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently exercised paths
1. Generate sequential graph
2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
3. Iterate Step 2 until all endpoints are optimized

$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p,q)}$

where Slack_pq is the setup slack of path p-q, β is a weighting factor, and TG(p,q) is the toggle rate of path p-q
[Figure: sequential graph FF1, FF2, FF3 with data-path weights W12, W23, W31 and the clock tree; after adjustment each edge on the cycle has W' = average weight on cycle]
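The edge weight and the cycle average W' can be illustrated as follows. This sketch covers only the weighting formula and the averaging over one given cycle; it does not implement the cycle search or the contraction step, and all numeric values are hypothetical.

```python
def edge_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)


def cycle_average_weight(cycle_edges, beta):
    """Average weight W' over the edges of one cycle in the sequential graph."""
    weights = [edge_weight(slack, tg, beta) for slack, tg in cycle_edges]
    return sum(weights) / len(weights)


# Hypothetical cycle FF1 -> FF2 -> FF3 -> FF1: (setup slack in ns, toggle rate).
cycle = [(0.3, 0.0), (0.1, 1.0), (0.2, 3.0)]
w_avg = cycle_average_weight(cycle, beta=1.0)
print(w_avg)
```

A higher toggle rate shrinks the weight of a path, so frequently exercised paths look more critical to the optimizer than their raw slack suggests.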
22ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow steps:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs
• SkewOpt: activity-aware clock skew optimization
• If energy < min energy, save current solution
• OR-tree insertion
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method with only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at large, medium, and small margins, broken down into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• The proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• The proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt on MUL and EXU; minimum achievable energy marked]
Technology: foundry 28nm
25ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
  • Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
  • AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS in which an on-chip aging monitor reports circuit performance to a voltage regulator that sets Vdd for the circuit; plots of circuit frequency and Vdd vs. time, with and without AVS, against the target]
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation
[Figure: Step 1: VBTI (|Vt|) and Vlib give the derated library; Step 2: circuit implementation and signoff give the circuit; Step 3: BTI degradation and AVS give Vfinal]
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Figure: Step 1: VBTI (|Vt|) and Vlib give the derated library; Step 2: circuit implementation and signoff give the circuit; Step 3: BTI degradation and AVS give Vfinal]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
[Figure: power vs. area across signoff corners, with a “knee” point for balanced area and power tradeoff]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Large lifetime energy overhead
• May fail to meet timing if the desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI degradation]
VBTI = Vlib ≈ Vfinal
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation
Flow:
1. Obtain Vheur (average of standard cells)
2. Obtain derated library with VBTI = Vlib = Vheur
3. Sign off circuit with derated library
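The first two steps of the flow reduce to an average followed by a single assignment. The sketch below assumes per-cell Vfinal values are already available; the cell names and voltages are hypothetical.

```python
def heuristic_voltage(vfinal_by_cell):
    """Vheur = average of per-cell Vfinal values."""
    return sum(vfinal_by_cell.values()) / len(vfinal_by_cell)


def characterization_voltages(vfinal_by_cell):
    """Use the same flat voltage for aging (VBTI) and timing (Vlib)."""
    v_heur = heuristic_voltage(vfinal_by_cell)
    return {"VBTI": v_heur, "Vlib": v_heur}


# Hypothetical per-cell Vfinal estimates in volts.
vfinal_by_cell = {"INV": 0.96, "NAND2": 0.97, "NOR2": 0.98}
corner = characterization_voltages(vfinal_by_cell)
print(corner)
```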
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
• (4 designs) x (DC, AC) x (derating methods)
[Figure: power vs. area for all designs; legend: proposed method, circuits signed off using other derated libraries; “knee” point for balanced area and power tradeoff]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if the desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time alternating between nominal (NOM) and overdrive (OD) modes with durations tnom and tOD]
[Figure: power of circuits with different overdrive modes; different overdrive modes: 26% power range; fixed fOD: still 14% power range; fnom = 800MHz, Vnom = 0.8V]
37ISVLSI-2014 invited talk 140710
Also: Tunable Monitors, Less Margin
Aggressive config: Vmin_est < Vmin_chip; some chips will fail
Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability
• Compensate for difference between model and silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore's Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1-9; annotations: highest invested guardband, least invested guardband, 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• The power penalty trades off against the guardband invested in signoff
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Layers M1 and M2 each have a C distribution with −3σ and 3σ points; for an interconnect stack with M1 and M2, the homogeneous Cw corner sets both M1 C and M2 C to 3σ, which introduces pessimism]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
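A back-of-the-envelope sketch shows why the pessimism grows with the number of layers. It assumes the total capacitance is a plain sum of per-layer contributions and that layer variations are independent; both are simplifications introduced here for illustration, and the sigma values are hypothetical.

```python
import math


def homogeneous_vs_statistical(layer_sigmas, k=3.0):
    """Compare a homogeneous k-sigma corner with the statistical k-sigma of the sum.

    Homogeneous corner: every layer sits at its own k-sigma point simultaneously.
    Statistical corner: k-sigma of the sum of independent layer variations.
    """
    homogeneous = k * sum(layer_sigmas)
    statistical = k * math.sqrt(sum(s * s for s in layer_sigmas))
    return homogeneous, statistical


# Four layers with equal (hypothetical) sigma of 1 unit each.
homogeneous, statistical = homogeneous_vs_statistical([1.0, 1.0, 1.0, 1.0])
print(homogeneous, statistical, homogeneous / statistical)
```

For n equal, independent layers the ratio of the two corners is the square root of n under these assumptions.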
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
          M1           M2           M3           M4
          ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH
M1  ΔW    1  0  0      γ  0  0      γ  0  0      0  0  0
    ΔT    0  1  0      0  γ  0      0  γ  0      0  0  0
    ΔH    0  0  1      0  0  γ      0  0  γ      0  0  0
M2  ΔW    γ  0  0      1  0  0      γ  0  0      0  0  0
    ΔT    0  γ  0      0  1  0      0  γ  0      0  0  0
    ΔH    0  0  γ      0  0  1      0  0  γ      0  0  0
M3  ΔW    γ  0  0      γ  0  0      1  0  0      0  0  0
    ΔT    0  γ  0      0  γ  0      0  1  0      0  0  0
    ΔH    0  0  γ      0  0  γ      0  0  1      0  0  0
M4  ΔW    0  0  0      0  0  0      0  0  0      1  0  0
    ΔT    0  0  0      0  0  0      0  0  0      0  1  0
    ΔH    0  0  0      0  0  0      0  0  0      0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
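The block structure of Σ can be generated programmatically. In this sketch the grouping of M1-M3 into one process module and M4 into another is read off the matrix above; the module labels "A" and "B" and the function name are hypothetical.

```python
def build_correlation(layer_module, params, gamma):
    """Build the correlation matrix for (layer, parameter) variation sources.

    Entries are 1 on the diagonal, gamma for the same parameter type on
    different layers of the same process module, and 0 otherwise.
    """
    sources = [(layer, p) for layer in layer_module for p in params]
    n = len(sources)
    corr = [[0.0] * n for _ in range(n)]
    for i, (layer_i, param_i) in enumerate(sources):
        for j, (layer_j, param_j) in enumerate(sources):
            if i == j:
                corr[i][j] = 1.0
            elif param_i == param_j and layer_module[layer_i] == layer_module[layer_j]:
                corr[i][j] = gamma
    return sources, corr


# M1-M3 share a process module; M4 is in a separate one (labels are illustrative).
layer_module = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}
sources, corr = build_correlation(layer_module, ("W", "T", "H"), gamma=0.5)
print(corr[0])
```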
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); 0.92 at 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σj/d(Ytyp) x 100 vs. Δdj(Yrcw)/dj(Ytyp) x 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield?
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
  • Wires on critical paths are routed on many layers
  • Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Figure: flow. Routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → paths split into GTBC and GCBC. GTBC: timing analysis using TBC, ECO using TBC until violation = 0. GCBC: timing analysis using CBC, ECO using CBC until violation = 0. Then done]
53ISVLSI-2014 invited talk 140710
Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)
                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5
           α      Acw (%)   Arcw (%)
TBC-0.5    0.5    4.3       7.3
TBC-0.6    0.6    3.3       5.0
TBC-0.7    0.7    3.0       3.4

• Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
• Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays: the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • Terms: nominal delay; delay sensitivity to unit change in M1 resistance; delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: α vs. Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) for LEON3MP, SUPERBLUE12, and NETCARD, with regions of α > 0.5 marked]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120 ps and 61 ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library
Vstep: AVS voltage step; Vfinal: converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2
63ISVLSI-2014 invited talk 140710
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop among circuit, derated libraries (Vlib, VBTI), and Vfinal]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON/OFF phases [VattikondaWC06]]
Device aging (|ΔVth|) accumulates over time [TCAS'14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a "front-loaded" phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd approaching Vfinal over time; ≈70% of the Vdd increment occurs in 1 year (remaining 30% over 9 years)]
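A short numeric sketch of what "front-loaded" means. It assumes, purely for illustration, that |ΔVth| grows as a power law t^n; this model form and the helper name `aging_fraction` are assumptions, not something stated on the slide. Under that assumption, the exponent for which half of the 10-year aging occurs in year 1 is about 0.3.

```python
import math


def aging_fraction(t, t_life, n):
    """Fraction of end-of-life |dVth| reached at time t, assuming |dVth| ~ t**n."""
    return (t / t_life) ** n


# Exponent for which 50% of the aging happens in year 1 of a 10-year lifetime.
n_half = math.log(0.5) / math.log(1.0 / 10.0)

first_year = aging_fraction(1.0, 10.0, n_half)
```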
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
[Figure annotation: good corners]
Signoff setups
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

VBTI \ Vlib    Min Vdd                                                Max Vdd
Min Vdd        Slowest circuit, less aging                            Not applicable (optimistic)
Max Vdd        Slowest circuit, worst-case aging (too pessimistic)    Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm²) for AES at signoff corners 1-8; series: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); annotations: optimistic about AVS, pessimistic about AVS]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Figure: lifetime (years) and Vfinal (V) for implementations 1-9; annotations: 30% MTTF penalty, 200mV voltage compensation]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
$$ \mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2} $$
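A minimal sketch of the per-step update MTTF(i) = MTTF(i-1) × (VDD(i-1) / VDD(i))^2 applied along an AVS voltage schedule. The schedule and the initial MTTF of 10 years are hypothetical values chosen for illustration; the function name `update_mttf` is likewise an assumption.

```python
def update_mttf(mttf_initial, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule."""
    mttf = mttf_initial
    trace = [mttf]
    for v_prev, v_next in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (v_prev / v_next) ** 2
        trace.append(mttf)
    return trace


# Hypothetical schedule: Vdd rises by 200mV in 50mV steps over the lifetime.
trace = update_mttf(10.0, [0.90, 0.95, 1.00, 1.05, 1.10])
penalty = 1.0 - trace[-1] / trace[0]
```

Because the product telescopes, only the first and last voltages matter: the final MTTF is the initial MTTF times (0.90 / 1.10)^2 for this schedule.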
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year for AVS schedules S1-S5 (design: DMA), and MTTF (years) for S1-S5; annotation: 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is "Signoff"?
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Diagram: "margin stack" for voltage signoff, from nominal Vdd through static IR drop, power grid IR gradient, dynamic IR, HCI and NBTI down to signoff Vdd, compared with operating voltage]
Problem: margins = pessimism → overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
$$ \Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3} $$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
$$ [\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}]\,\lambda $$
Standard deviation of path delay:
$$ \sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5} $$
Note: we use the delay variation from the statistical analysis as a reference
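The computation of σj from per-source sensitivities and a correlation matrix can be sketched in plain Python as follows. The three-source example, its numbers, and the function names are illustrative assumptions; the talk uses 27 variation sources obtained from RC extraction and STA.

```python
import math


def cholesky(sigma):
    """Return lower-triangular L with sigma = L * L^T (sigma symmetric positive definite)."""
    n = len(sigma)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                low[i][j] = (sigma[i][j] - s) / low[j][j]
    return low


def path_sigma(delta_d, sigma):
    """sigma_j = || delta_d * L ||_2, which equals sqrt(delta_d * Sigma * delta_d^T)."""
    low = cholesky(sigma)
    n = len(delta_d)
    # Row vector of sensitivities times the Cholesky factor.
    d_corr = [sum(delta_d[i] * low[i][v] for i in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in d_corr))


# Hypothetical example: sources 0 and 1 are correlated with factor 0.5.
corr = [[1.0, 0.5, 0.0],
        [0.5, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
sigma_j = path_sigma([3.0, 4.0, 0.0], corr)
```

With an identity correlation matrix this reduces to the root-sum-square of the individual sensitivities.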
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors → ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality
  • E.g., enable margin reduction
  • Improve performance (i.e., timing speculation)
• Conventional design: worst-case signoff → no Vdd downscaling
• Resilient design: typical-case signoff → Vdd downscaling → reduced energy
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy [Kahng10]:
$$ \mathit{Energy} = \frac{\mathit{Power}}{\mathit{Throughput}} $$
  where Throughput is a function of the error rate (ER), the number of recovery cycles (r), and the clock period (T)
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints → allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS: process- and temperature-aware AVS
  • Generic monitor: generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
[Diagram annotation: power]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure, four panels, each contrasting ideal scaling with non-ideality:
 DENSITY: transistor count [M] vs. MPU release date (source: [CPUDB])
 POWER: dynamic power (W) vs. year, 2009-2024 (source: [JeongK08])
 DRIVE CURRENT: drive current (μA/μm) for extended planar bulk, UTB FD, DG, and ideal scaling (source: [ITRS])
 SUPPLY VOLTAGE: supply voltage (V) vs. MPU release date (source: [CPUDB])]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs → resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU; annotation: minimum achievable energy]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design / overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cones
[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within design cone → reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough → scalable
[Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → sample (SP&R) → refine power models (adaptive search)]
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); series: 1st, 2nd, real]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
Design: AES, f = 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
Closed-loop AVS monitor classes: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Diagram: RO stages with 1-bit control pins selecting high resistance or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, CO for MUL and EXU at small, medium, and large margin; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs → resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU; annotation: minimum achievable energy]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow boxes:
 Initial placement (all FFs = error-tolerant FFs)
 SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
 SkewOpt: activity-aware clock skew optimization
 Decision: energy < min energy? If so, save current solution]
15ISVLSI-2014 invited talk 140710
Determining α, Arcw and Acw
• Assumption: critical paths in different designs have similar trends
• Extract Arcw and Acw from a set of representative paths
  • Plot α vs. Δdelay; find Arcw and Acw for a given α
  • Add +1% margin on Arcw and Acw to account for sampling error
• Smaller α → larger thresholds (Arcw and Acw) → fewer paths in GTBC
[Figure: α vs. Δd at C-worst corner (%) and α vs. Δd at RC-worst corner (%), with thresholds Acw and Arcw marked]
16ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations reduced by 24% to 100%
• TBC-0.6: more benefits
  • Tradeoff between reduced margin vs. paths which use TBC
Correlation factor γ = 0.5
[Figure: WNS (ns), TNS (ns), and timing violations for LEON, SUPERBLUE12, and NETCARD under CBC, TBC-0.5, TBC-0.6, TBC-0.7]
17ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
18ISVLSI-2014 invited talk 140710
How to Minimize Cost of Resilience?
• Additional circuits → area and power penalties
• Recovery from errors → throughput degradation
• Large hold margin → short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                   Razor           Razor-Lite     TIMBER
Power penalty      30% [Das08]     ~0% [Kim13]    100% [Choudhury09]
Area penalty       182% [Kim13]    33% [Kim13]    255% [Chen13]
Recovery cycles    5 [Wan09]       11 [Kim13]     0 [Choudhury09]
19ISVLSI-2014 invited talk 140710
Tradeoff Resilience Cost vs Datapath Cost
[Diagram: an endpoint Razor FF (with error output) and its fanin cone vs. a normal FF after optimizing the fanin cone w/ tighter constraint; area (power) of fanin cone vs. area (power) w/ Razor overhead]
[Figure: tradeoff between Razor FFs (resilience cost) and power/area of fanin circuits; energy (mJ) vs. Razor FFs (300, 100, 50, 0); series: total energy, energy of non-resilient part, resilience cost]
We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)
20ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints → allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions → vary endpoints, compare area/power penalty
Candidate sensitivity functions
$$ SF_1 = |\mathit{slack}(p)| $$
$$ SF_2 = |\mathit{slack}(p)| \times \mathit{Num}_{cri}(p) $$
$$ SF_3 = |\mathit{slack}(p)| \times \frac{\mathit{Num}_{cri}(p)}{\mathit{Num}_{total}(p)} $$
$$ SF_4 = |\mathit{slack}(p)| \times \sum_{c \in \mathit{fanin}(p)} \mathit{Pwr}(c) $$
$$ SF_5 = \sum_{c \in \mathit{fanin}(p)} |\mathit{slack}(c)| \times \mathit{Pwr}(c) $$
p: negative-slack endpoint; c: cells within fanin cone; Numcri: number of negative-slack cells
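A minimal sketch of the five candidate sensitivity functions, read as SF1 = |slack(p)|, SF2 = |slack(p)| × Numcri(p), SF3 = |slack(p)| × Numcri(p) / Numtotal(p), SF4 = |slack(p)| × Σ Pwr(c), and SF5 = Σ |slack(c)| × Pwr(c) over cells c in the fanin cone of p. The `Cell` and `Endpoint` classes are hypothetical containers, and Numtotal(p) is assumed here to mean the total number of cells in the fanin cone, which the slide does not define.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Cell:
    slack: float  # timing slack of the cell (ns); negative means violating
    power: float  # cell power (mW)


@dataclass
class Endpoint:
    slack: float       # slack at endpoint p
    fanin: List[Cell]  # cells within the fanin cone of p


def sensitivity_functions(p: Endpoint) -> Dict[str, float]:
    num_cri = sum(1 for c in p.fanin if c.slack < 0)  # negative-slack cells
    num_total = len(p.fanin)                          # assumed meaning of Numtotal(p)
    total_pwr = sum(c.power for c in p.fanin)
    s = abs(p.slack)
    return {
        "SF1": s,
        "SF2": s * num_cri,
        "SF3": s * num_cri / num_total,
        "SF4": s * total_pwr,
        "SF5": sum(abs(c.slack) * c.power for c in p.fanin),
    }


# Hypothetical endpoint with four fanin cells, two of them violating.
example = Endpoint(-0.2, [Cell(-0.1, 2.0), Cell(0.05, 1.0), Cell(-0.2, 4.0), Cell(0.3, 1.0)])
sf = sensitivity_functions(example)
```

Endpoints would then be ranked by whichever function is chosen, and the ranking compared by the resulting area/power penalty.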
21ISVLSI-2014 invited talk 140710
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight → adjust clock latencies → contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized
$$ W_{pq} = \frac{\mathit{Slack}_{p,q}}{1 + \beta \times TG(p, q)} $$
Slack_{p,q}: setup slack of path p-q; β: weighting factor; TG(p, q): toggle rate of path p-q
[Diagram: FF1 → FF2 → FF3 → FF1 with data-path weights W12, W23, W31 and the clock tree; after optimization each edge on the cycle has weight W']
W' = average weight on cycle
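A small sketch of the edge weight W_pq = Slack_pq / (1 + β × TG(p, q)) and of W', the average weight on one cycle of the sequential graph. The three-flop cycle, its slacks and toggle rates, and β = 2 are made-up values; cycle search and clock-latency adjustment are not shown.

```python
def edge_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)


def cycle_average_weight(cycle_edges, beta):
    """W' = average of W_pq over the (slack, toggle_rate) edges of one cycle."""
    weights = [edge_weight(s, tg, beta) for s, tg in cycle_edges]
    return sum(weights) / len(weights)


# Hypothetical cycle FF1 -> FF2 -> FF3 -> FF1: (setup slack in ns, toggle rate).
cycle = [(0.10, 0.5), (-0.05, 0.2), (0.20, 0.0)]
w_prime = cycle_average_weight(cycle, beta=2.0)
```

For a positive slack, a higher toggle rate lowers the weight, so frequently exercised paths are pulled toward the front of a minimum-weight search.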
22ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow boxes:
 Initial placement (all FFs = error-tolerant FFs)
 SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
 SkewOpt: activity-aware clock skew optimization
 Decision: energy < min energy? If so, save current solution
 OR-tree insertion]
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF, CO for MUL and EXU at small, medium, and large margin; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs → resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt on MUL and EXU; annotation: minimum achievable energy]
Technology: foundry 28nm
25ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
  • Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
  • AVS can be used to compensate for performance degradation
[Diagram: closed-loop AVS with circuit, on-chip aging monitor (circuit performance), and voltage regulator (Vdd)]
[Figure: circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, relative to target]
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
[Flow diagram, Steps 1-3: VBTI, |Vt|, Vlib → derated library → circuit implementation and signoff → circuit → BTI degradation and AVS → Vfinal]
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
[Flow diagram, Steps 1-3: VBTI, |Vt|, Vlib → derated library → circuit implementation and signoff → circuit → BTI degradation and AVS → Vfinal]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
[Figure annotation: "knee" point for balanced area and power tradeoff]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI curves; VBTI = Vlib ≈ Vfinal]
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging), which depends on
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation
[Flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → sign off circuit with derated library]
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
• (4 designs) × (DC, AC) × (derating methods)
[Figure: power vs. area for all designs; markers: proposed method, circuits signed off using other derated libraries; annotation: "knee" point for balanced area and power tradeoff]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time alternating between NOM and OD modes (durations tnom, tOD)]
[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V; different overdrive modes: 26% power range; fixed fOD: still 14% power range]
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config: Vmin_est < Vmin_chip → some chips will fail
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config: Vmin_est < Vmin_chip → some chips will fail
Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "202020" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1-9; annotations: highest invested guardband, least invested guardband, 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
43ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner; per-layer capacitance distributions for M1 and M2 with their 3σ corners; for the interconnect stack with M1 and M2 (M1 C, M2 C), the homogeneous Cw corner lies beyond the stack's 3σ point; annotation: pessimism]
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner; per-layer capacitance distributions for M1 and M2 with their 3σ corners; for the interconnect stack with M1 and M2 (M1 C, M2 C), the homogeneous Cw corner lies beyond the stack's 3σ point; annotation: pessimism]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
           M1            M2            M3            M4
           ΔW  ΔT  ΔH    ΔW  ΔT  ΔH    ΔW  ΔT  ΔH    ΔW  ΔT  ΔH
M1  ΔW     1   0   0     γ   0   0     γ   0   0     0   0   0
    ΔT     0   1   0     0   γ   0     0   γ   0     0   0   0
    ΔH     0   0   1     0   0   γ     0   0   γ     0   0   0
M2  ΔW     γ   0   0     1   0   0     γ   0   0     0   0   0
    ΔT     0   γ   0     0   1   0     0   γ   0     0   0   0
    ΔH     0   0   γ     0   0   1     0   0   γ     0   0   0
M3  ΔW     γ   0   0     γ   0   0     1   0   0     0   0   0
    ΔT     0   γ   0     0   γ   0     0   1   0     0   0   0
    ΔH     0   0   γ     0   0   γ     0   0   1     0   0   0
M4  ΔW     0   0   0     0   0   0     0   0   0     1   0   0
    ΔT     0   0   0     0   0   0     0   0   0     0   1   0
    ΔH     0   0   0     0   0   0     0   0   0     0   0   1

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
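A sketch of how such a correlation matrix could be generated: 1 on the diagonal, γ between sources of the same variation type on different layers of the same process module, and 0 elsewhere. The module labels "A" and "B" are hypothetical names; the grouping of M1-M3 in one module and M4 in another is read off the matrix on this slide.

```python
def build_correlation_matrix(layers, types, module_of, gamma):
    """Return (sources, sigma) for all (layer, variation type) pairs."""
    sources = [(layer, t) for layer in layers for t in types]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for a, (la, ta) in enumerate(sources):
        for b, (lb, tb) in enumerate(sources):
            if a == b:
                sigma[a][b] = 1.0
            elif ta == tb and module_of[la] == module_of[lb]:
                sigma[a][b] = gamma  # same type, same process module
    return sources, sigma


layers = ["M1", "M2", "M3", "M4"]
types = ["dW", "dT", "dH"]
module_of = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}  # hypothetical module names
sources, sigma = build_correlation_matrix(layers, types, module_of, gamma=0.5)
```

The same helper extends to a 27 × 27 matrix by listing nine layers with their module assignments.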
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point: 0.92 at 60%]
47ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
  • But these paths have larger delays at the other corner
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
  • But these paths have larger delays at the other corner
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2 (M1 C, M2 C) and its 3σ point; non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σj / d(Ytyp) × 100 vs. Δdj(Yrcw) / dj(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group GTBC if Δdelay is larger than a threshold
• Fix only the paths in GTBC using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flow: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC → GTBC: timing analysis using TBC, ECO using TBC until violation = 0; GCBC: timing analysis using CBC, ECO using CBC until violation = 0 → done]
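The path-splitting step can be sketched as below. The rule used here (a path goes to GTBC if its relative delay change exceeds Acw at the C-worst corner or Arcw at the RC-worst corner) is an assumed reading of "Δdelay is larger than a threshold", and the threshold values and path delays are hypothetical.

```python
def split_paths(paths, a_cw, a_rcw):
    """paths: dict name -> (d_typ, d_cw, d_rcw), path delays at the typical,
    C-worst and RC-worst BEOL corners. Thresholds a_cw, a_rcw are in percent."""
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        dd_cw = 100.0 * (d_cw - d_typ) / d_typ    # relative delay change at C-worst (%)
        dd_rcw = 100.0 * (d_rcw - d_typ) / d_typ  # relative delay change at RC-worst (%)
        if dd_cw > a_cw or dd_rcw > a_rcw:        # assumed grouping rule
            g_tbc.append(name)                    # sign off with tightened corners
        else:
            g_cbc.append(name)                    # keep conventional corners
    return g_tbc, g_cbc


# Hypothetical path delays in ns and thresholds in percent.
paths = {
    "p1": (1.00, 1.06, 1.02),
    "p2": (1.00, 1.01, 1.02),
    "p3": (2.00, 2.02, 2.20),
}
g_tbc, g_cbc = split_paths(paths, a_cw=4.0, a_rcw=7.0)
```

Paths left in GCBC keep the conventional corners, so the split only relaxes margin where the thresholds indicate it is safe.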
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)

                       LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)      1.8       2.0       3.1
Gate count             232K      575K      1031K
Utilization (%)        84        79        82
Core area (mm²)        0.45      1.04      1.91
Max transition (ps)    330       330       330

Correlation factor = 0.5

           α      Acw (%)   Arcw (%)
TBC-0.5    0.5    43        73
TBC-0.6    0.6    33        50
TBC-0.7    0.7    30        34

Implement another NETCARD (clock period = 23ns) to obtain α, Acw and Arcw
Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large → the delay variations of ΔdR-M1 and ΔdC-M1 cancel out → Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: scaling factor α vs. Δdelay for LEON3MP, SUPERBLUE12, and NETCARD, with the α > 0.5 region marked in each]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns), and timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, TBC-2]
57ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI curves; VBTI = Vlib ≈ Vfinal]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging), which depends on
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step; Vfinal: converged voltage
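The reference flow above (estimate aging, update timing and Vdd, repeat until tfinal, restart if Vfinal exceeds the maximum voltage) can be sketched as a small loop. This is only an illustration: the alpha-power delay model, the power-law BTI model, and every coefficient and function name below are assumptions made for the sketch, not values or code from the talk. A real flow would re-characterize the derated library and rerun STA at each step.

```python
def path_delay(vdd, vth, alpha=1.3):
    """Toy alpha-power-law delay in arbitrary units (assumed model)."""
    return vdd / (vdd - vth) ** alpha


def bti_shift(vdd, t_years, coeff=0.02, v_exp=3.0, t_exp=1.0 / 6.0):
    """Toy BTI model (assumed): |dVth| grows with stress voltage and as t^n."""
    return coeff * vdd ** v_exp * t_years ** t_exp


def reference_signoff(v_init=0.90, v_max=1.05, v_step=0.01,
                      vth0=0.30, t_final=10.0, n_steps=20):
    """Step through lifetime, raising Vdd whenever aging breaks timing.

    Returns (v_final, history), or (None, history) when the required Vdd
    exceeds v_max, i.e. the circuit must be modified and the flow restarted.
    """
    target = path_delay(v_init, vth0)  # zero timing slack at t = 0
    vdd = v_init
    history = []
    for i in range(1, n_steps + 1):
        t = t_final * i / n_steps
        # Aging is estimated at the current Vdd, so VBTI, Vlib and Vdd agree.
        vth = vth0 + bti_shift(vdd, t)
        while path_delay(vdd, vth) > target:
            vdd = round(vdd + v_step, 6)  # one AVS voltage step (Vstep)
            if vdd > v_max:
                return None, history
            vth = vth0 + bti_shift(vdd, t)
        history.append((t, vdd))
    return vdd, history


v_final, history = reference_signoff()
```

With these toy numbers most of the Vdd increase lands in the first time steps, which is the qualitative "front-loaded" behavior the deck describes; the absolute voltages carry no meaning.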
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5    6        7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   NA   Vheur1   Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  NA   Vheur1   Vheur2
63ISVLSI-2014 invited talk 140710
“Chicken and Egg” Loop

• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop between Circuit, Vfinal, Vlib / VBTI, and Derated Libraries]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON and OFF phases [VattikondaWC06]]
Device aging (|ΔVth|) accumulates over time [TCAS'14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
[Figure annotation: ≈70% of Vdd increment in 1 year (remaining 30% over 9 years); Vfinal marked]
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

VBTI (aging estimation)   Vlib (circuit performance estimation)
                          Min Vdd                                               Max Vdd
Min Vdd                   Slowest circuit, less aging                           Not applicable (optimistic)
Max Vdd                   Slowest circuit, worst-case aging (too pessimistic)   Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm2) for AES, with points labeled by signoff case 1 to 8; regions annotated “Optimistic about AVS” and “Pessimistic about AVS”; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks)]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$
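As a small illustration of the per-step update MTTF(i) = MTTF(i−1) × (VDD(i−1) / VDD(i))^2, the sketch below applies it along an AVS voltage schedule. The function names, the starting MTTF, and the schedule are hypothetical and chosen only for the example.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_now):
    """One step of the update rule: MTTF scales with (Vdd_prev / Vdd_now)^2."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2


def mttf_over_schedule(mttf0, vdd_schedule):
    """Apply the update at every AVS step; returns the MTTF after each step."""
    trace = [mttf0]
    for vdd_prev, vdd_now in zip(vdd_schedule, vdd_schedule[1:]):
        trace.append(update_mttf(trace[-1], vdd_prev, vdd_now))
    return trace


# Hypothetical schedule: Vdd raised from 0.90V to 1.10V in two steps.
trace = mttf_over_schedule(10.0, [0.90, 1.00, 1.10])
```

Because the ratios telescope, the final value depends only on the first and last voltages: MTTF_final = MTTF_0 × (V_first / V_last)^2.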
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year for AVS schedules S1 to S5 (panel label: DMA), and MTTF (year) for S1 to S5; annotation: 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is “Signoff”?

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff on a voltage axis: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd; operating voltage marked]
Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: routed netlist + 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$$[\Delta d'_{j1} \; \dots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \dots \; \Delta d_{j27}] \, \lambda$$

Standard deviation of path delay:

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
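The two statistical timing slides reduce to a short computation: per-source sensitivities Δdjv = [dj(Yv) − dj(Ytyp)] / 3, a Cholesky factor λ of the correlation matrix (Σ = λλ^T), and σj as the norm of the row vector Δdj·λ. The sketch below uses plain Python and a hand-written Cholesky routine so that it stays self-contained. The function names and the two-source example are assumptions for illustration; the talk uses 27 variation sources.

```python
import math


def sensitivity(d_corner, d_typ):
    """Per-sigma delay sensitivity from a 3-sigma corner run."""
    return (d_corner - d_typ) / 3.0


def cholesky(sigma):
    """Lower-triangular lam with sigma = lam * lam^T (sigma must be SPD)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def path_sigma(sens, sigma):
    """sigma_j = || sens * lam ||, i.e. sqrt(sens * Sigma * sens^T)."""
    lam = cholesky(sigma)
    n = len(sens)
    # Row vector times matrix: correlated sensitivities d'_jv.
    corr = [sum(sens[i] * lam[i][v] for i in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in corr))


# Two hypothetical sources with equal sensitivity and correlation 0.5.
sigma_j = path_sigma([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])
```

For independent sources (Σ = identity) this collapses to the usual root-sum-square of the sensitivities.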
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; 15% reduction annotated]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

$$\mathit{Energy} = \frac{\mathit{Power}}{\mathit{Throughput}}$$

Throughput = 1 − ER T + 1 − ER r × T (symbols as extracted; the fraction layout of this equation is not recoverable)
r: # recovery cycles; T: clock period; ER: error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches
• Open-loop AVS
  • Freq & Vdd LUT. AVS: pre-characterize LUT [Martin02]
  • Post-silicon characterization. Process-aware AVS: post-silicon characterization [Tschanz03]
• Closed-loop AVS
  • Generic monitor, design-dependent replica. Process- and temperature-aware AVS: generic on-chip monitor [Burd00]; design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor. In-situ performance monitor: measure actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system. Error detection and correction system: Vdd scaling until error occurs [Das06, Tschanz10]
[Figure axis: Power]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure, four panels comparing ideal scaling against non-ideality:
  DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
  POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
  DRIVE CURRENT: Extended Planar Bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
  SUPPLY VOLTAGE: volts vs. MPU release date. Source: [CPUDB]]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
[Flow (global optimization): sample (SP&R) → construct power models → estimate optimal signoff modes → sample (SP&R) → refine power models (adaptive search)]
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V) for 1st, 2nd, and real. Design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring-oscillator stages with 1-bit control pins selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at small, medium, and large margin; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution]
16ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners
• WNS and TNS are reduced by up to 100ps and 53ns
• Timing violations are reduced by 24% to 100%
• TBC-0.6: more benefits
• Tradeoff between reduced margin vs. paths which use TBC
Correlation factor γ = 0.5
[Figure: three bar charts comparing CBC, TBC-0.5, TBC-0.6, and TBC-0.7 on LEON, SUPERBLUE12, and NETCARD: WNS (ns), TNS (ns), and number of timing violations]
17ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
18ISVLSI-2014 invited talk 140710
How to Minimize Cost of Resilience?
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                    Razor          Razor-Lite     TIMBER
Power penalty       30% [Das08]    ~0% [Kim13]    100% [Choudhury09]
Area penalty        182% [Kim13]   33% [Kim13]    255% [Chen13]
# recovery cycles   5 [Wan09]      11 [Kim13]     0 [Choudhury09]
19ISVLSI-2014 invited talk 140710
Tradeoff Resilience Cost vs Datapath Cost
[Figure: endpoints with Razor FFs vs. normal FFs; optimizing the fanin cone of an endpoint w/ a tighter constraint allows its Razor FF to be replaced by a normal FF. Tradeoff: area (power) of fanin cone vs. area (power) w/ Razor overhead, i.e. Razor FFs (resilience cost) vs. power/area of fanin circuits]
[Figure: energy (mJ) vs. number of Razor FFs: total energy, energy of non-resilient part, and resilience cost]
We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)
20ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions
• Vary endpoints, compare area/power penalty

Candidate Sensitivity Functions

$$SF_1 = |slack(p)|$$
$$SF_2 = |slack(p)| \times Num_{cri}(p)$$
$$SF_3 = |slack(p)| \times \frac{Num_{cri}(p)}{Num_{total}(p)}$$
$$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$$
$$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$$

p: negative-slack endpoint; c: cells within fanin cone; Numcri: number of negative-slack cells
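To make the five candidate sensitivity functions concrete, the sketch below evaluates them for one endpoint. The `Cell` and `Endpoint` containers and the example numbers are hypothetical. Numtotal(p) is assumed here to be the total number of cells in the fanin cone of p, which the slide does not define explicitly.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    slack: float   # timing slack of the cell (ns)
    power: float   # power of the cell (arbitrary units)


@dataclass
class Endpoint:
    name: str
    slack: float        # slack at the endpoint p (negative for candidates)
    fanin: List[Cell]   # cells within the fanin cone of p


def num_cri(p):
    """Number of negative-slack cells in the fanin cone."""
    return sum(1 for c in p.fanin if c.slack < 0)


def sf1(p):
    return abs(p.slack)


def sf2(p):
    return abs(p.slack) * num_cri(p)


def sf3(p):
    # Assumption: Num_total(p) = number of cells in the fanin cone.
    return abs(p.slack) * num_cri(p) / len(p.fanin)


def sf4(p):
    return abs(p.slack) * sum(c.power for c in p.fanin)


def sf5(p):
    return sum(abs(c.slack) * c.power for c in p.fanin)


# Hypothetical endpoint with two fanin cells, one of them timing-critical.
p = Endpoint("ep0", -0.10, [Cell(-0.05, 2.0), Cell(0.02, 1.0)])
scores = [f(p) for f in (sf1, sf2, sf3, sf4, sf5)]
```

How the scores are used to order endpoints (ascending or descending) is a flow decision that the slide leaves open, so the sketch stops at computing them.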
21ISVLSI-2014 invited talk 140710
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
1. Generate sequential graph
2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
3. Iterate Step 2 until all endpoints are optimized

$$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p,q)}$$

Slack_pq: setup slack of path p-q; β: weighting factor; TG(p,q): toggle rate of path p-q
[Figure: sequential graph with FF1, FF2, FF3 and edge weights W12, W23, W31 (data paths and clock tree shown); after adjustment each edge on the cycle has weight W', where W' = average weight on cycle]
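A minimal sketch of Step 2, assuming the edge weight is W_pq = Slack_pq / (1 + β × TG(p,q)) and that the cycle of interest is the one with the smallest average weight W' (the usual minimum-mean-cycle view of skew scheduling; the slide words it as "minimum total weight", so treat this choice as an assumption). Cycles are enumerated by brute force, which is only practical for tiny graphs. All names and numbers are illustrative.

```python
def edge_weight(slack, toggle_rate, beta=1.0):
    """Activity-aware weight of a path p-q."""
    return slack / (1.0 + beta * toggle_rate)


def simple_cycles(graph):
    """Yield every simple cycle once, as a vertex list starting at its smallest vertex."""
    for start in sorted(graph):
        stack = [(start, [start])]
        while stack:
            node, path = stack.pop()
            for nxt in graph.get(node, {}):
                if nxt == start:
                    yield path
                elif nxt > start and nxt not in path:
                    stack.append((nxt, path + [nxt]))


def min_mean_cycle(graph):
    """Return (cycle, average weight W') for the cycle with the smallest mean weight."""
    best = None
    for cyc in simple_cycles(graph):
        edges = list(zip(cyc, cyc[1:] + cyc[:1]))
        mean = sum(graph[u][v] for u, v in edges) / len(edges)
        if best is None or mean < best[1]:
            best = (cyc, mean)
    return best


# Three-flop example mirroring the figure: FF1 -> FF2 -> FF3 -> FF1.
graph = {
    "FF1": {"FF2": edge_weight(0.10, 0.5)},    # W12
    "FF2": {"FF3": edge_weight(-0.05, 0.2)},   # W23
    "FF3": {"FF1": edge_weight(0.20, 0.1)},    # W31
}
cycle, w_avg = min_mean_cycle(graph)
```

After the clock latencies on the selected cycle are adjusted, each of its edges would carry the average weight `w_avg`, and the cycle would be contracted into one vertex before the next iteration; those two steps are not implemented here.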
22ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution; OR-tree insertion]
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at small, medium, and large margin; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt on MUL and EXU; minimum achievable energy marked. Technology: foundry 28nm]
25ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops → Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS: an on-chip aging monitor senses circuit performance and drives the voltage regulator that supplies the circuit]
[Figure: circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, against the target]
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation
[Flow: Step 1: BTI degradation and AVS (Vfinal, VBTI, |Vt|); Step 2: derated library (Vlib); Step 3: circuit implementation and signoff (circuit)]
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Flow: Step 1: BTI degradation and AVS (Vfinal, VBTI, |Vt|); Step 2: derated library (Vlib); Step 3: circuit implementation and signoff (circuit)]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
“Knee” point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI stress; VBTI = Vlib ≈ Vfinal]
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
    • BTI aging depends on circuit activity
    • Assume DC or AC stress in derated library characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation
[Flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → sign off circuit with derated library]
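The first two steps of this flow amount to an average and an assignment. The sketch below shows them under the assumption that a per-cell Vfinal estimate is already available; the cell names and voltages are invented for the example, and the derated library itself is represented only by the pair of voltages used to characterize it.

```python
from statistics import mean


def flat_signoff_corner(vfinal_by_cell):
    """Average per-cell Vfinal into Vheur and use it for both VBTI and Vlib."""
    v_heur = mean(vfinal_by_cell.values())
    return {"v_bti": v_heur, "v_lib": v_heur}


# Hypothetical per-cell Vfinal estimates (V) at zero timing slack.
corner = flat_signoff_corner({"INV": 0.96, "NAND2": 0.97, "NOR2": 0.98})
```

Using one flat voltage for both aging and performance estimation is what removes the dependence on the not-yet-implemented circuit.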
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
• (4 designs × {DC, AC} × derating methods)
[Figure legend: proposed method; circuits signed off using other derated libraries]
“Knee” point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
• Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time alternating between NOM and OD modes (durations tnom, tOD)]
[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V. Different overdrive modes: 26% power range. Fix fOD: still 14% power range]
36ISVLSI-2014 invited talk 140710
Also: Tunable Monitors → Less Margin

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip
Aggressive config
• Vmin_est < Vmin_chip: some chips will fail
37ISVLSI-2014 invited talk 140710
Also: Tunable Monitors → Less Margin

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip
Aggressive config
• Vmin_est < Vmin_chip: some chips will fail
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore's Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1 to 9, ranging from highest to least invested guardband; 14% power penalty annotated]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
43ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Per-layer capacitance distributions (−3σ to 3σ) for layers M1 and M2; joint M1 C vs. M2 C distribution of the interconnect stack with its 3σ contour, the homogeneous Cw corner, and the pessimism between them]
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Per-layer capacitance distributions (−3σ to 3σ) for layers M1 and M2; joint M1 C vs. M2 C distribution of the interconnect stack with its 3σ contour, the homogeneous Cw corner, and the pessimism between them]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
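A back-of-the-envelope sketch of why the pessimism grows with the number of layers. It assumes, purely for illustration, that the capacitance seen by a path is a sum of per-layer contributions with standard deviations σ_i and a common pairwise correlation γ. The homogeneous corner then adds 3σ_i for every layer at once, while the statistical 3σ of the sum is smaller unless γ = 1. Function names and values are hypothetical.

```python
import math


def homogeneous_corner_delta(sigmas, k=3.0):
    """Every layer pushed to its own k-sigma corner simultaneously."""
    return k * sum(sigmas)


def statistical_delta(sigmas, gamma=0.0, k=3.0):
    """k-sigma of the summed variation with pairwise correlation gamma."""
    n = len(sigmas)
    var = sum(s * s for s in sigmas)
    var += 2.0 * gamma * sum(sigmas[i] * sigmas[j]
                             for i in range(n) for j in range(i + 1, n))
    return k * math.sqrt(var)


def pessimism_ratio(sigmas, gamma=0.0):
    """How much larger the homogeneous corner is than the statistical one."""
    return homogeneous_corner_delta(sigmas) / statistical_delta(sigmas, gamma)


two_layers = pessimism_ratio([1.0, 1.0])              # sqrt(2) for independent layers
four_layers = pessimism_ratio([1.0, 1.0, 1.0, 1.0])   # 2 for independent layers
```

With equal, independent per-layer variations the ratio is the square root of the number of layers, and it collapses to 1 when the layers are fully correlated.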
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the same process module: γ = 0.5

Variation sources in different process modules are independent
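The matrix on this slide follows a simple rule: 1 on the diagonal, γ between sources of the same variation type (ΔW, ΔT, ΔH) on different layers of the same process module, and 0 otherwise. A minimal sketch of that rule is below. The module labels "A" and "B" are placeholders; the slide only implies that M1 to M3 share a module and M4 does not.

```python
def build_correlation(layer_modules, types=("W", "T", "H"), gamma=0.5):
    """Return (sources, sigma) where sources[i] = (layer, variation type)."""
    sources = [(layer, t) for layer in layer_modules for t in types]
    sigma = []
    for layer_a, type_a in sources:
        row = []
        for layer_b, type_b in sources:
            if (layer_a, type_a) == (layer_b, type_b):
                row.append(1.0)    # a source with itself
            elif type_a == type_b and layer_modules[layer_a] == layer_modules[layer_b]:
                row.append(gamma)  # same type, same process module
            else:
                row.append(0.0)    # independent sources
        sigma.append(row)
    return sources, sigma


# Placeholder module assignment matching the slide's 12 x 12 example.
sources, sigma = build_correlation({"M1": "A", "M2": "A", "M3": "A", "M4": "B"})
```

The first row of `sigma` reproduces the slide's M1 ΔW row: 1 0 0 γ 0 0 γ 0 0 0 0 0.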
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the point at 60% and 0.92 is marked]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp). Paths dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst. Paths dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σj/d(Ytyp) × 100% vs. Δdj(Yrcw)/dj(Ytyp) × 100%]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield?
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure axis: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flow: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC. GTBC: timing analysis using TBC, with ECO using TBC until violation = 0. GCBC: timing analysis using CBC, with ECO using CBC until violation = 0. Then done]
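The path-grouping step of this flow can be sketched as a threshold test on the relative Δdelay at the C-worst and RC-worst corners. The thresholds, the path delays, and the choice to place a path in GTBC when either corner exceeds its threshold are assumptions for illustration. The deck only states that a path goes to Gtbc when its Δdelay is larger than a threshold.

```python
def delta_delay_pct(d_corner, d_typ):
    """Relative delay increase at a corner, in percent of typical delay."""
    return 100.0 * (d_corner - d_typ) / d_typ


def split_paths(paths, a_cw, a_rcw):
    """Split paths into (g_tbc, g_cbc) using thresholds in percent.

    paths: iterable of (name, d_typ, d_cw, d_rcw) delay tuples.
    """
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        large_at_cw = delta_delay_pct(d_cw, d_typ) > a_cw
        large_at_rcw = delta_delay_pct(d_rcw, d_typ) > a_rcw
        # Assumption: a large delta at either corner is enough for TBC signoff.
        (g_tbc if large_at_cw or large_at_rcw else g_cbc).append(name)
    return g_tbc, g_cbc


# Hypothetical delays (ns) and thresholds (%).
paths = [("p1", 1.00, 1.05, 1.10), ("p2", 1.00, 1.01, 1.02)]
g_tbc, g_cbc = split_paths(paths, a_cw=4.0, a_rcw=7.0)
```

Paths left in `g_cbc` are the ones with small Δdelay at both corners, where the conventional corner may already underestimate the statistical variation, so they keep the conventional signoff.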
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Tightened BEOL corners (correlation factor = 0.5)

          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

• Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
• Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
bull Paths with small Δd(Yrcw) and Δd(Ycw) have large α
bull A path has small Δdelays the path is equally sensitive to R and C
bull Example dj = dj(Ytyp) + 05 ΔdR-M1 + 05 ΔdC-M1
bull For a given CBC = Ycw ΔdR-M1 is small but ΔdC-M1 is large delay variation of ΔdR-M1 and ΔdC-M1 are cancelled out Δd(Ycw) 0 lt σj
Nominal delay
Delay sensitivity to unit change in M1 resistance
Delay sensitivity to unit change in M1 capacitance
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
LEON3MP
SUPERBLUE12NETCARD
α gt 05α gt 05
α gt 05
bull Similar trends in different designs
bull Large α when Δd(Yrcw)d(Ytyp) and Δd(Ycw)d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: three bar charts comparing CBC, TBC-1, and TBC-2 on LEON, SUPERBLUE, and NETCARD: WNS (ns), TNS (ns), and number of timing violations]
57ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging under an adaptive Vdd
• But slightly pessimistic
VBTI = Vlib ≈ Vfinal
[Figure: Vdd vs. time, with NBTI and PBTI curves]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step; Vfinal: converged voltage
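The reference flow is essentially a loop over lifetime steps. The sketch below shows only that control structure; the aging and delay models are toy functions invented for illustration (they are not from the talk and are not calibrated to any technology), and every name and number is an assumption.

```python
def reference_avs_signoff(v_init, v_max, v_step, n_steps, target_delay,
                          aging_per_step, delay_model):
    """Walk through the lifetime step by step, raising Vdd when timing fails.

    Returns the converged Vfinal, or None if Vdd would exceed v_max
    (the "modify circuit and start over" case).
    """
    vdd = v_init
    dvth = 0.0
    for _ in range(n_steps):
        dvth += aging_per_step(vdd)          # estimate aging over this step
        while delay_model(vdd, dvth) > target_delay:
            vdd += v_step                    # AVS raises Vdd by one step
            if vdd > v_max:
                return None
    return vdd


# Toy models, purely illustrative.
def toy_aging(vdd):
    return 0.002 * vdd                       # |dVth| added per step (V)


def toy_delay(vdd, dvth):
    return 1.0 / (vdd - 0.3 - dvth)          # delay grows as |dVth| grows


v_final = reference_avs_signoff(0.90, 1.05, 0.01, 10, 1.70, toy_aging, toy_delay)
v_fail = reference_avs_signoff(0.90, 0.905, 0.01, 10, 1.70, toy_aging, toy_delay)
```

In a real flow each call to the delay model corresponds to an STA run with a re-characterized library, which is why the slide calls this approach very slow.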
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3     4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax  Vinit   N/A  Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax  Vfinal  N/A  Vheur1  Vheur2
63ISVLSI-2014 invited talk 140710
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Diagram: loop linking circuit, Vfinal, Vlib and VBTI, and derated libraries]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
• Device aging (|ΔVth|) accumulates over time
[Figure: |Vgs| vs. time alternating ON and OFF, and the resulting |ΔVth|; sources: [VattikondaWC06], [TCAS'14]]
65ISVLSI-2014 invited talk 140710
Observation 1
• BTI is a "front-loaded" phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
  • ≈70% of the Vdd increment occurs in 1 year (remaining 30% over 9 years)
[Figure: Vdd over lifetime approaching Vfinal] [Chan11]
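A short worked example of what "front-loaded" means, assuming a simple power-law aging model |ΔVth| ∝ t^n. The talk does not state which model it uses, so the power law and the function names here are assumptions for illustration only.

```python
import math


def aging_fraction(t, lifetime, n):
    """Fraction of end-of-life |dVth| reached at time t, assuming |dVth| ~ t**n."""
    return (t / lifetime) ** n


def exponent_for_fraction(fraction, t, lifetime):
    """Exponent n for which aging_fraction(t, lifetime, n) equals `fraction`."""
    return math.log(fraction) / math.log(t / lifetime)


# Exponent that puts half of a 10-year aging budget into the first year.
n_half = exponent_for_fraction(0.5, 1.0, 10.0)
first_year = aging_fraction(1.0, 10.0, n_half)
```

Under this assumed model, any exponent well below 1 concentrates most of the degradation early in the lifetime.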
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
Setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.
                   Vlib = Min Vdd                      Vlib = Max Vdd
VBTI = Min Vdd     Slowest circuit, less aging         Not applicable (optimistic)
VBTI = Max Vdd     Slowest circuit, worst-case aging   Faster circuit, worst-case aging
                   (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm2) for AES, with points labeled by signoff setup (1-8); legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); regions marked "Optimistic about AVS" and "Pessimistic about AVS"]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
[Figure: EM lifetime (year) and Vfinal (V) for implementations 1-9; annotations: 30% MTTF penalty, 200mV voltage compensation]
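The MTTF update rule can be applied step by step along an AVS voltage schedule. A minimal sketch follows; the starting MTTF and the voltage schedule are made-up numbers, and the function names are hypothetical.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_schedule(mttf0, vdd_steps):
    """MTTF after each AVS voltage step, starting from mttf0 at vdd_steps[0]."""
    mttf = mttf0
    history = [mttf]
    for v_prev, v_new in zip(vdd_steps, vdd_steps[1:]):
        mttf = update_mttf(mttf, v_prev, v_new)
        history.append(mttf)
    return history


# Hypothetical schedule: Vdd raised from 0.90V to 1.00V in two steps.
history = mttf_schedule(10.0, [0.90, 0.95, 1.00])
```

Because the ratios telescope, the final MTTF depends only on the first and last voltage: here 10 × (0.90 / 1.00)^2 = 8.1 years.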
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year for AVS schedules S1-S5, and MTTF (year) for S1-S5 (design: DMA); annotation: 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is "Signoff"
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
• Problem: margins = pessimism, overdesign, schedule delay
[Figure: "margin stack" for voltage signoff, from nominal Vdd down to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI; operating voltage marked]
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: routed netlist + 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
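As a small illustration of the sensitivity extraction, the sketch below divides the delay shift at each source's corner by 3, reading each corner Yv as a 3σ skew of one source (which is how the divisor 3 is interpreted here). The source names and delay values are hypothetical.

```python
def delay_sensitivities(d_typ, d_corner):
    """Per-sigma delay sensitivity for each variation source.

    d_corner maps a variation source name to the path delay obtained from the
    STA run in which only that source is skewed to its corner.
    """
    return {v: (d - d_typ) / 3.0 for v, d in d_corner.items()}


# Hypothetical path delays (ns) at two of the 27 single-source corners.
sens = delay_sensitivities(1.00, {"M1_W": 1.03, "M1_T": 0.97})
```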
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^{T}$ (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}]\,\lambda$
Standard deviation of path delay:
$\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
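The two formulas above can be checked on a tiny example. The sketch below uses a hand-written Cholesky factorization so that it needs only the standard library; the two-source sensitivities and correlation values are made-up numbers, and the function names are hypothetical.

```python
import math


def cholesky(sigma):
    """Lower-triangular lam with sigma = lam * lam^T (sigma must be positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][j] = (sigma[i][j] - s) / lam[j][j]
    return lam


def path_sigma(sens, sigma):
    """Standard deviation of path delay from per-source sensitivities."""
    lam = cholesky(sigma)
    n = len(sens)
    # Row vector times lam: correlated sensitivity k is sum_i sens[i] * lam[i][k].
    corr = [sum(sens[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in corr))


sens = [3.0, 4.0]
sigma_indep = path_sigma(sens, [[1.0, 0.0], [0.0, 1.0]])
sigma_corr = path_sigma(sens, [[1.0, 0.5], [0.5, 1.0]])
```

With independent sources the result is the plain root-sum-square (5.0); with correlation 0.5 the variance becomes 9 + 16 + 2 × 0.5 × 12 = 37, which matches the quadratic form of the sensitivities with Σ.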
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation under dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy: Energy = Power / Throughput, where Throughput is a function of the error rate ER [Kahng10], the number of recovery cycles r, and the clock period T
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints: allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor: process- and temperature-aware AVS, generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction, Vdd scaling until error occurs [Das06, Tschanz10]
[Diagram: AVS classes arranged along a power axis]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure: four panels comparing ideal scaling with observed non-ideality. DENSITY: transistor count [M] vs. MPU release date (source: [CPUDB]). POWER: dynamic power (W) vs. year, 2009-2024 (source: [JeongK08]). DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, 2006-2016 (source: [ITRS]). SUPPLY VOLTAGE: volts vs. MPU release date (source: [CPUDB])]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design / overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: frequency vs. voltage with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
[Flowchart: global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R), refine power models]
[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V); design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Diagram: ring-oscillator stages, each with a 1-bit control pin selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: stacked energy (mJ) bars for PM, BF, and CO at small, medium, and large margin on MUL and EXU; components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs, thus reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs) → SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs with normal FFs → SkewOpt: activity-aware clock skew optimization → if energy < min energy, save current solution → repeat]
17ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
18ISVLSI-2014 invited talk 140710
How to Minimize Cost of Resilience
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                  Razor         Razor-Lite    TIMBER
Power penalty     30 [Das08]    ~0 [Kim13]    100 [Choudhury09]
Area penalty      182 [Kim13]   33 [Kim13]    255 [Chen13]
Recovery cycles   5 [Wan09]     11 [Kim13]    0 [Choudhury09]
19ISVLSI-2014 invited talk 140710
Tradeoff Resilience Cost vs Datapath Cost
[Figure: datapath with Razor FFs (each with an error output) at endpoints; optimizing the fanin cone of an endpoint with a tighter constraint lets its Razor FF be replaced by a normal FF. Tradeoff: area (power) of fanin cone vs. area (power) with Razor overhead]
[Figure: energy (mJ) vs. Razor FFs (300, 100, 50, 0): total energy, energy of non-resilient part, and resilience cost]
We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)
20ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint with tighter constraints: allows replacement of Razor FF with normal FF
• Pick endpoints based on heuristic sensitivity functions
• Vary endpoints, compare area/power penalty
Candidate sensitivity functions:
$SF_1 = |slack(p)|$
$SF_2 = |slack(p)| \times num_{cri}(p)$
$SF_3 = |slack(p)| \times \frac{num_{cri}(p)}{num_{total}(p)}$
$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$
p: negative-slack endpoint; c: cells within fanin cone; num_cri: number of negative-slack cells
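The five candidate functions are straightforward to evaluate once slack and power are known per cell. The sketch below is illustrative only: the slide does not define num_total, so it is assumed here to be the number of cells in the fanin cone, and the `Cell` type, slack values, and power values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cell:
    slack: float  # timing slack of the cell (ns), negative if violating
    power: float  # power of the cell (mW)


def sensitivity_functions(endpoint_slack, fanin):
    """Evaluate SF1..SF5 for one negative-slack endpoint and its fanin-cone cells."""
    s = abs(endpoint_slack)
    num_cri = sum(1 for c in fanin if c.slack < 0)
    num_total = len(fanin)  # assumption: total number of cells in the fanin cone
    total_pwr = sum(c.power for c in fanin)
    return {
        "SF1": s,
        "SF2": s * num_cri,
        "SF3": s * num_cri / num_total,
        "SF4": s * total_pwr,
        "SF5": sum(abs(c.slack) * c.power for c in fanin),
    }


# Hypothetical fanin cone with two negative-slack cells out of four.
fanin = [Cell(-0.2, 1.0), Cell(-0.1, 2.0), Cell(0.3, 4.0), Cell(0.5, 1.0)]
sf = sensitivity_functions(-0.2, fanin)
```

Endpoints would then be ranked by the chosen function to decide where margin insertion is tried first.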
21ISVLSI-2014 invited talk 140710
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight, adjust clock latencies, contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized
Weight of path p-q:
$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p,q)}$
Slack_pq: setup slack of path p-q; β: weighting factor; TG(p,q): toggle rate of path p-q
[Figure: sequential graph FF1 → FF2 → FF3 → FF1 with data-path weights W12, W23, W31 and the clock tree; after clock latency adjustment each path on the cycle has weight W' = average weight on cycle]
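A minimal sketch of the weighting and of the averaging over one cycle, using the three-flip-flop example from the figure. The slack values, toggle rates, and β are made-up numbers, and the function names are hypothetical.

```python
def edge_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)


def cycle_average(weights):
    """W': average weight over the paths of one cycle."""
    return sum(weights) / len(weights)


# Hypothetical cycle FF1 -> FF2 -> FF3 -> FF1.
beta = 1.0
w12 = edge_weight(0.30, 0.5, beta)
w23 = edge_weight(0.10, 0.0, beta)
w31 = edge_weight(0.60, 1.0, beta)
w_avg = cycle_average([w12, w23, w31])
```

A frequently toggling path gets a smaller weight for the same slack, so it looks more critical to the optimizer.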
22ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs) → SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs with normal FFs → SkewOpt: activity-aware clock skew optimization → if energy < min energy, save current solution → repeat; finally OR-tree insertion]
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method with only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: stacked energy (mJ) bars for PM, BF, and CO at small, medium, and large margin on MUL and EXU; components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt on MUL and EXU; minimum achievable energy marked]
Technology: foundry 28nm
25ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
  • Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
  • AVS can be used to compensate for performance degradation
[Diagram: closed-loop AVS: circuit, on-chip aging monitor (circuit performance), voltage regulator (Vdd)]
[Figure: circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, relative to target]
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider the BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
[Diagram: three-step loop: Vlib and VBTI (|Vt|) → derated library → circuit implementation and signoff → circuit → BTI degradation and AVS → Vfinal]
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
[Diagram: three-step loop: Vlib and VBTI (|Vt|) → derated library → circuit implementation and signoff → circuit → BTI degradation and AVS → Vfinal]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design / lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
"Knee" point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging under an adaptive Vdd
• But slightly pessimistic
VBTI = Vlib ≈ Vfinal
[Figure: Vdd vs. time, with NBTI and PBTI curves]
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation
[Flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → sign off circuit with derated library]
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
• (4 designs) x (DC, AC) x (derating methods)
[Figure: power vs. area; markers: proposed method, circuits signed off using other derated libraries; "knee" point for balanced area and power tradeoff]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time alternating between NOM and OD modes (durations tnom, tOD)]
[Figure: power of circuits with different overdrive modes; fnom = 800MHz, Vnom = 0.8V; different overdrive modes: 26% power range; with fOD fixed, still 14% power range]
36ISVLSI-2014 invited talk 140710
Also: Tunable Monitors, Less Margin
Aggressive config: Vmin_est < Vmin_chip, some chips will fail
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
Benefits of tunability
• Compensate for difference between model and silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1-9, from highest invested guardband to least invested guardband; annotation: 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
43ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Layers M1 and M2 each have a capacitance distribution with a 3σ corner; the homogeneous Cw corner for the interconnect stack with M1 and M2 takes both layers at 3σ, which is pessimistic relative to the 3σ point of the stack]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
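A back-of-the-envelope sketch of why the pessimism grows with the layer count. It assumes the stack variation is the sum of per-layer contributions, that the layers are fully independent, and that all layers have the same σ; none of these assumptions come from the talk, and the function names are hypothetical.

```python
import math


def homogeneous_corner(sigmas, k=3.0):
    """All layers simultaneously at their k-sigma corner."""
    return k * sum(sigmas)


def statistical_corner(sigmas, k=3.0):
    """k-sigma point of the sum when layer variations are independent (root-sum-square)."""
    return k * math.sqrt(sum(s * s for s in sigmas))


def pessimism_ratio(num_layers, sigma=1.0):
    """Homogeneous corner divided by the statistical corner for equal-sigma layers."""
    sigmas = [sigma] * num_layers
    return homogeneous_corner(sigmas) / statistical_corner(sigmas)
```

Under these assumptions the ratio is √n for n layers, so a two-layer stack is about 1.41x pessimistic and a nine-layer stack 3x. Partial correlation between layers would reduce the gap.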
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
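The matrix above follows a simple rule, which can be sketched as code. The layer-to-module assignment (M1 to M3 in one module, M4 in another) is read off the example matrix; the module labels "A" and "B", the source names, and the function name are hypothetical.

```python
def correlation_matrix(layers, types, module_of, gamma):
    """Build the correlation matrix over (layer, variation type) sources.

    Two different sources are correlated with factor gamma only if they have
    the same variation type and their layers are in the same process module.
    """
    sources = [(layer, t) for layer in layers for t in types]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (li, ti) in enumerate(sources):
        for j, (lj, tj) in enumerate(sources):
            if i == j:
                sigma[i][j] = 1.0
            elif ti == tj and module_of[li] == module_of[lj]:
                sigma[i][j] = gamma
    return sources, sigma


layers = ["M1", "M2", "M3", "M4"]
types = ["dW", "dT", "dH"]
module_of = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}
sources, sigma = correlation_matrix(layers, types, module_of, gamma=0.5)
```

The same construction extends to the 27 x 27 case (9 layers x 3 variation types) once the module assignment of each layer is known.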
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); 0.92 at 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation gives smaller RC variation
47ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • The C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σj/d(Ytyp) x 100 vs. Δdj(Yrcw)/dj(Ytyp) x 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
  • Wires on critical paths are routed on many layers
  • Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
Gate count 232K 575K 1031KUtilization () 84 79 82
Core area (mm2) 045 104 191Max transition (ps) 330 330 330
Testcases for validation (45nm library with 8X wire resistivity)
αCorrelation factor = 05
Acw () Arcw ()
TBC-05 05 43 73
TBC-06 06 33 50
TBC-07 07 30 34
Implement another NETCARD (clock period = 23ns) to obtain α Acw and Arcw
Statistical models (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 05
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations from ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
Callouts: nominal delay; delay sensitivity to unit change in M1 resistance; delay sensitivity to unit change in M1 capacitance
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: α versus relative Δdelay for LEON3MP, NETCARD, and SUPERBLUE12, with the α > 0.5 region marked in each panel]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: bar charts of WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]
57ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging at an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd versus time, with NBTI and PBTI]
VBTI = Vlib ≈ Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
• Gates along critical path
• Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
• Assume timing slack = 0
[Figure annotation: 10mV]
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library
Vstep: AVS voltage step
Vfinal: converged voltage
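The loop described above can be sketched as follows. This is a minimal illustration, not the flow used in the talk: the alpha-power-law delay model, the power-law aging increment, and every constant and function name below are assumptions chosen only to make the loop runnable; a real flow would call STA and a calibrated BTI model instead.

```python
def path_delay(vdd, dvth, vth0=0.30, k=1.0, alpha=1.3):
    """Toy delay model (assumption): delay rises as the threshold shift dvth grows."""
    return k * vdd / (vdd - vth0 - dvth) ** alpha

def added_vth_shift(vdd, t_prev, t_now, a=0.02, n=0.2):
    """Toy BTI increment (assumption): power law in time, scaled by stress voltage."""
    return a * vdd * (t_now ** n - t_prev ** n)

def reference_signoff(v_init, v_max, v_step, t_final, n_steps, target_delay):
    """Step through the lifetime, raising Vdd by v_step whenever timing is missed.

    Returns Vfinal, or None if the required voltage exceeds v_max
    (the caller would then modify the circuit and start over).
    """
    vdd, dvth, t_prev = v_init, 0.0, 0.0
    for i in range(1, n_steps + 1):
        t_now = t_final * i / n_steps
        dvth += added_vth_shift(vdd, t_prev, t_now)   # estimate aging at this step
        while path_delay(vdd, dvth) > target_delay:   # update timing and Vdd
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None
        t_prev = t_now
    return vdd

target = path_delay(0.90, 0.0)  # fresh-circuit delay at Vinit serves as the target
v_final = reference_signoff(0.90, 1.05, 0.01, 10.0, 10, target)
```

Each pass of the outer loop stands in for one STA run with an updated library, which is why the reference flow is slow in practice.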
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2
63ISVLSI-2014 invited talk 140710
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop linking circuit, Vfinal, Vlib and VBTI, and derated libraries]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when the device is on (stressed)
|ΔVth| is partially recovered when the device is off (relaxed)
NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| versus time over alternating ON and OFF phases [VattikondaWC06]]
Device aging (|ΔVth|) accumulates over time [TCAS'14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
[Figure annotation: ≈70% of the Vdd increment occurs in 1 year (remaining 30% over 9 years); Vfinal marked]
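To see what "front-loaded" means numerically, the sketch below assumes a simple power-law time dependence, |ΔVth| proportional to t^n. That model and the function names are assumptions for illustration; the slide does not state which aging model produced the 50% figure.

```python
import math

def aging_fraction(t_years, lifetime_years, n):
    """Fraction of end-of-life |dVth| reached by t_years, assuming |dVth| ~ t**n."""
    return (t_years / lifetime_years) ** n

def exponent_for_fraction(fraction, t_years, lifetime_years):
    """Exponent n for which the assumed power law reaches `fraction` at t_years."""
    return math.log(fraction) / math.log(t_years / lifetime_years)

# Under the assumed model, 50% of the aging in year 1 of 10 implies this exponent.
n = exponent_for_fraction(0.5, 1.0, 10.0)
first_year_share = aging_fraction(1.0, 10.0, n)
```

Any exponent well below 1 gives the same qualitative picture: most of the shift accumulates early, so most of the AVS voltage increase is also spent early.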
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
Setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Corner choices (Vlib for circuit performance estimation, VBTI for aging estimation)
  Vlib = Min Vdd, VBTI = Min Vdd: slowest circuit, less aging
  Vlib = Max Vdd, VBTI = Min Vdd: not applicable (optimistic)
  Vlib = Max Vdd, VBTI = Max Vdd: faster circuit, worst-case aging
  Vlib = Min Vdd, VBTI = Max Vdd: slowest circuit, worst-case aging (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) versus area (μm2) for AES, with points labeled by signoff case 1 to 8; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks); annotations: optimistic about AVS, pessimistic about AVS]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
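The MTTF update rule is a one-line recurrence, sketched below. The function names and the example voltage schedule are hypothetical; only the squared voltage ratio comes from the slide.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_now):
    """One step of MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2

def mttf_after_schedule(mttf0, vdd_steps):
    """Apply the update along a sequence of AVS voltage steps."""
    mttf = mttf0
    for v_prev, v_now in zip(vdd_steps, vdd_steps[1:]):
        mttf = update_mttf(mttf, v_prev, v_now)
    return mttf

# Hypothetical schedule: 0.90V -> 0.95V -> 1.00V starting from a 10-year MTTF.
final_mttf = mttf_after_schedule(10.0, [0.90, 0.95, 1.00])
```

Because the ratios telescope, the result depends only on the first and last voltage: here 10 x (0.90 / 1.00)^2 = 8.1 years.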
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD versus year for schedules S1 to S5 (DMA 3), and MTTF (year) for S1 to S5; annotation: 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is "Signoff"?
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff on a voltage axis: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd; operating voltage marked]
Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv: Δdjv
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources follow normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]
$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation
$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}]\,\lambda$
Standard deviation of path delay
$\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
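The two statistical-analysis steps (per-source sensitivities, then a correlated standard deviation) can be sketched in plain Python. The function names and the two-source example are illustrative assumptions; a real run would use the 27 sensitivities obtained from extraction and STA.

```python
import math

def sensitivities(d_corner, d_typ):
    """Per-sigma delay sensitivity for each source: (d_j(Y_v) - d_j(Y_typ)) / 3."""
    return [(d - d_typ) / 3.0 for d in d_corner]

def cholesky(sigma):
    """Lower-triangular lam with sigma = lam * lam^T (sigma must be positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam

def path_sigma(delta_d, sigma):
    """sigma_j = norm of (delta_d * lam), which equals sqrt(delta_d * sigma * delta_d^T)."""
    lam = cholesky(sigma)
    n = len(delta_d)
    d_prime = [sum(delta_d[r] * lam[r][c] for r in range(n)) for c in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))

# Two hypothetical variation sources with corner delays 1.03 and 1.06 (typical = 1.00).
dd = sensitivities([1.03, 1.06], 1.00)
sigma_independent = path_sigma(dd, [[1.0, 0.0], [0.0, 1.0]])
sigma_correlated = path_sigma(dd, [[1.0, 0.5], [0.5, 1.0]])
```

With positive correlation between sources whose sensitivities have the same sign, the path standard deviation grows, as the second call shows.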
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation under dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) versus supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy [Kahng10]
$Energy = \frac{Power}{Throughput}$
where Throughput is estimated from the error rate ER, the clock period T, and the number of recovery cycles r
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS approaches and AVS classes
Open-loop AVS
• Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
• Post-silicon characterization: process-aware AVS [Tschanz03]
Closed-loop AVS: process and temperature-aware AVS
• Generic monitor: generic on-chip monitor [Burd00]
• Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
Error tolerance
• Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
[Figure axis: power]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure: four panels, each comparing ideal scaling with non-ideality]
DENSITY: transistor count [M] versus MPU release date. Source: [CPUDB]
POWER: dynamic power (W) versus year, 2009 to 2024. Source: [JeongK08]
DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) versus ideal scaling, 2006 to 2016. Source: [ITRS]
SUPPLY VOLTAGE: volt versus MPU release date. Source: [CPUDB]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) versus supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage versus frequency, showing the design cone of mode A with modes A, B, and C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
[Flowchart (global optimization flow): sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R), refine power models]
[Figure (power estimation of adaptive search): power (mW) versus signoff voltage (V) for series 1st, 2nd, and real; design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: RO stages with 1-bit control pins selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO at small, medium, and large margin for MUL and EXU, broken down into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs, thus reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) versus supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, then replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution]
18ISVLSI-2014 invited talk 140710
How to Minimize Cost of Resilience?
• Additional circuits: area and power penalties
• Recovery from errors: throughput degradation
• Large hold margin: short-path padding cost
• Want benefits (e.g., energy) to maximally outweigh costs

                  Razor          Razor-Lite    TIMBER
Power penalty     30 [Das08]     ~0 [Kim13]    100 [Choudhury09]
Area penalty      182 [Kim13]    33 [Kim13]    255 [Chen13]
Recovery cycles   5 [Wan09]      11 [Kim13]    0 [Choudhury09]
19ISVLSI-2014 invited talk 140710
Tradeoff Resilience Cost vs Datapath Cost
[Figure: endpoint Razor FFs and normal FFs with their fanin cones; optimizing a fanin cone w/ tighter constraint lets the endpoint Razor FF be replaced by a normal FF. Tradeoff: Razor FFs (resilience cost) versus power/area of fanin circuits]
[Figure: energy (mJ) versus Razor FFs, showing total energy, energy of non-resilient part, and resilience cost]
We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)
20ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Pick endpoints based on heuristic sensitivity functions: vary endpoints, compare area/power penalty
Candidate sensitivity functions
$SF_1 = |slack(p)|$
$SF_2 = |slack(p)| \times Num_{cri}(p)$
$SF_3 = |slack(p)| \times \frac{Num_{cri}(p)}{Num_{total}(p)}$
$SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
$SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$
p: negative slack endpoint
c: cells within fanin cone
Num_cri: number of negative slack cells
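The five candidate sensitivity functions can be evaluated side by side for one endpoint, as in the sketch below. The data layout (a list of per-cell slack and power pairs) is hypothetical, and Num_total is assumed here to be the total number of cells in the fanin cone, which the slide does not define.

```python
def sensitivity_functions(endpoint_slack, fanin_cells):
    """Evaluate SF1..SF5 for one negative-slack endpoint p.

    fanin_cells: list of (slack, power) tuples, one per cell c in the fanin cone.
    Assumption: Num_total(p) is the number of cells in the fanin cone.
    """
    abs_slack = abs(endpoint_slack)
    num_total = len(fanin_cells)
    num_cri = sum(1 for slack, _ in fanin_cells if slack < 0)
    total_power = sum(power for _, power in fanin_cells)
    return {
        "SF1": abs_slack,
        "SF2": abs_slack * num_cri,
        "SF3": abs_slack * num_cri / num_total,
        "SF4": abs_slack * total_power,
        "SF5": sum(abs(slack) * power for slack, power in fanin_cells),
    }

# Hypothetical endpoint with four fanin cells: (slack in ns, power in arbitrary units).
cells = [(-0.10, 2.0), (0.05, 1.0), (-0.02, 3.0), (0.20, 4.0)]
scores = sensitivity_functions(-0.10, cells)
```

Ranking endpoints by one of these scores and comparing the resulting area and power penalties is how the candidate functions would be told apart.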
21ISVLSI-2014 invited talk 140710
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
1. Generate sequential graph
2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
3. Iterate Step 2 until all endpoints are optimized
$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p, q)}$
where Slack_pq is the setup slack of path p-q, β is a weighting factor, and TG(p, q) is the toggle rate of path p-q
[Figure: sequential graph of FF1, FF2, FF3 with data-path edge weights W12, W23, W31 and the clock tree; after adjustment each edge on the cycle has weight W', where W' = average weight on cycle]
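The edge weight and the per-cycle average can be computed as below. This is a sketch of the formula only: the function names, β value, slacks, and toggle rates are hypothetical, and the cycle search and clock latency adjustment themselves are not shown.

```python
def edge_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q)): frequently toggled paths get lower weight."""
    return slack / (1.0 + beta * toggle_rate)

def cycle_average_weight(weights):
    """W' = average weight on a cycle, the value each edge takes after balancing."""
    return sum(weights) / len(weights)

# Hypothetical three-flop cycle FF1 -> FF2 -> FF3 -> FF1 with (slack, toggle rate) pairs.
beta = 2.0
paths = [(0.30, 0.50), (0.30, 0.00), (0.90, 0.50)]
weights = [edge_weight(s, tg, beta) for s, tg in paths]
w_prime = cycle_average_weight(weights)
```

With β = 0 the weights reduce to plain setup slacks, so β controls how strongly path activity biases the skew assignment.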
22ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, then replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution; OR-tree insertion]
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF, and CO at small, medium, and large margin for MUL and EXU, broken down into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: 1σ, 2σ, 3σ for SS corner
Technology: foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) versus supply voltage (V) for pure-margin, brute-force, and CombOpt on MUL and EXU; minimum achievable energy marked]
Technology: foundry 28nm
25ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS with circuit, on-chip aging monitor, circuit performance, and voltage regulator]
[Figure: circuit frequency versus time and Vdd versus time, without AVS and with AVS, against the target]
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
[Figure: three steps (Step 1, Step 2, Step 3) linking BTI degradation and AVS (Vfinal), the derated library (VBTI, |Vt|, Vlib), and circuit implementation and signoff (circuit)]
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
[Figure: three steps (Step 1, Step 2, Step 3) linking the derated library (Vlib, VBTI, |Vt|), circuit implementation and signoff (circuit), and BTI degradation and AVS (Vfinal)]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
[Figure: power versus area across signoff corners, with a "knee" point for a balanced area and power tradeoff]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging at an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd versus time, with NBTI and PBTI]
VBTI = Vlib ≈ Vfinal
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
• Gates along critical path
• Timing slack at t = 0
• Circuit activity (BTI aging)
  • BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
• Assume timing slack = 0
[Figure annotation: 10mV]
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation
[Flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → sign off circuit with derated library]
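The first two steps of this flow amount to averaging per-cell Vfinal values and using the result as a single flat voltage. The sketch below shows that arithmetic only; the cell names, voltage values, and dictionary layout are hypothetical, and library characterization itself is outside its scope.

```python
def v_heur(v_final_by_cell):
    """Average the per-cell Vfinal estimates into one flat voltage Vheur."""
    return sum(v_final_by_cell.values()) / len(v_final_by_cell)

def derating_corner(v_final_by_cell):
    """Use the same flat Vheur for aging estimation (VBTI) and characterization (Vlib)."""
    v = v_heur(v_final_by_cell)
    return {"VBTI": v, "Vlib": v}

# Hypothetical per-cell Vfinal estimates in volts, all at zero timing slack.
corner = derating_corner({"INV": 0.95, "NAND2": 0.96, "NOR2": 0.94})
```

Averaging is reasonable only because, per the earlier observation, Vfinal varies little across gate types.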
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
• 4 designs x {DC, AC} x derating methods
[Figure: power versus area; legend: proposed method, circuits signed off using other derated libraries; "knee" point for a balanced area and power tradeoff]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
• Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd versus time alternating between NOM and OD modes (tnom, tOD)]
[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V; different overdrive modes: 26% power range; fixed fOD: still 14% power range]
37ISVLSI-2014 invited talk 140710
Also: Tunable Monitors, Less Margin
Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip
Aggressive config: Vmin_est < Vmin_chip, so some chips will fail
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1 to 9, from highest invested guardband to least invested guardband; annotation: 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example of a worst-case capacitance corner for an interconnect stack with M1 and M2; each layer is placed at its own 3σ capacitance corner, and the resulting homogeneous Cw corner is pessimistic relative to the 3σ contour in the (M1 C, M2 C) plane]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
          M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW    0  0  0     0  0  0     0  0  0     1  0  0
    ΔT    0  0  0     0  0  0     0  0  0     0  1  0
    ΔH    0  0  0     0  0  0     0  0  0     0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
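A matrix with this block structure can be generated from a layer-to-module map, as sketched below. The module labels "A" and "B", the function name, and the argument layout are hypothetical; the grouping of M1 to M3 in one module and M4 in another is read off the matrix above.

```python
def correlation_matrix(layers, module_of, kinds=("W", "T", "H"), gamma=0.5):
    """Build the correlation matrix over (layer, variation kind) sources.

    Two distinct sources are correlated with factor gamma only if they have the
    same variation kind and their layers belong to the same process module.
    """
    sources = [(layer, kind) for layer in layers for kind in kinds]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for a, (layer_a, kind_a) in enumerate(sources):
        for b, (layer_b, kind_b) in enumerate(sources):
            if a == b:
                sigma[a][b] = 1.0
            elif kind_a == kind_b and module_of[layer_a] == module_of[layer_b]:
                sigma[a][b] = gamma
    return sources, sigma

# Hypothetical module assignment matching the 12 x 12 example above.
modules = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}
sources, sigma = correlation_matrix(["M1", "M2", "M3", "M4"], modules)
```

The same construction extends to 27 sources by listing nine layers with their module assignments.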
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability versus max wirelength ratio across all layers (%); the curve reaches 0.92 at 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α versus Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and α versus Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2 in the (M1 C, M2 C) plane with the 3σ contour; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in the itf with α = 0.5
[Figure: 3σj/d(Ytyp) x 100 versus Δdj(Yrcw)/dj(Ytyp) x 100]
Challenge: how to avoid underestimating delay variation, so as to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure axis: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart boxes: routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; path groups GTBC and GCBC; timing analysis using TBC, with ECO using TBC until violation = 0; timing analysis using CBC, with ECO using CBC until violation = 0; done]
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)
                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5
          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw, and Arcw
Statistical models: (1) no correlation, and (2) variation sources of the same kind in the same process module have correlation factor = 0.5
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
Scaling Factor Results
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
[Figure: α plots for LEON3MP, SUPERBLUE12 and NETCARD, with the α > 0.5 region marked in each]
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: bar charts of WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]
Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging under an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI aging; VBTI = Vlib ≈ Vfinal]
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging); it depends on
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure: Vfinal for different gate types, annotated 10mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05
Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
A Reference Signoff Flow
• Basic idea: keep VBTI, VLIB and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA and library runs
Vstep: AVS voltage step; Vfinal: converged voltage
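The loop structure of the reference flow can be sketched as below. The delay and aging models here are toy stand-ins (an alpha-power-law delay and a t^(1/6) threshold shift with made-up constants), chosen only so the loop runs; they are assumptions for illustration and not the models used in the talk. Vinit = 0.90V and Vmax = 1.05V are taken from the benchmark setup, and the 10mV Vstep is hypothetical.

```python
def delay(vdd, vth):
    """Toy alpha-power-law delay in arbitrary units (hypothetical model)."""
    return vdd / (vdd - vth) ** 1.3

def aged_vth(vth0, vdd, years):
    """Toy BTI threshold shift growing with Vdd and t^(1/6) (hypothetical constants)."""
    return vth0 + 0.03 * vdd * years ** (1.0 / 6.0)

def reference_flow(v_init, v_max, v_step, t_final, vth0=0.30, n_steps=10):
    """Step through lifetime, raising Vdd by v_step whenever aged timing misses target.

    Returns the converged Vfinal, or None if Vdd would exceed v_max, in which
    case the circuit has to be modified and the flow restarted.
    """
    target = delay(v_init, vth0)              # fresh-circuit delay is the target
    vdd = v_init
    for k in range(1, n_steps + 1):
        t = t_final * k / n_steps
        vth = aged_vth(vth0, vdd, t)          # estimate aging at this time step
        while delay(vdd, vth) > target:       # update timing and Vdd (AVS)
            vdd += v_step
            if vdd > v_max:
                return None
            vth = aged_vth(vth0, vdd, t)      # aging depends on the new Vdd
    return vdd

v_final = reference_flow(v_init=0.90, v_max=1.05, v_step=0.01, t_final=10.0)
print(v_final)
```

In a real flow each pass through the inner loop corresponds to a library re-characterization and an STA run, which is why the slide calls the reference flow very slow.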
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
Case       1       2       3      4        5     6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    N/A   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   N/A   Vheur1   Vheur2
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop among circuit, Vfinal, Vlib / VBTI and derated libraries]
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
• Device aging (|ΔVth|) accumulates over time
[Figure: |Vgs| vs. time over ON, OFF, ON, OFF phases [VattikondaWC06] [TCAS'14]]
Observation 1
[Chan11]
• BTI is a "front-loaded" phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd approaching Vfinal over time; ≈70% of the Vdd increment in 1 year (remaining 30% over 9 years)]
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Cases
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
[Figure: results for cases 1 to 7, with the "good corners" region marked]
Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?
Candidate corners (Vlib for circuit performance estimation, VBTI for aging estimation)
• Vlib = min Vdd, VBTI = min Vdd: slowest circuit, less aging
• Vlib = max Vdd, VBTI = min Vdd: not applicable (optimistic)
• Vlib = max Vdd, VBTI = max Vdd: faster circuit, worst-case aging
• Vlib = min Vdd, VBTI = max Vdd: slowest circuit, worst-case aging (too pessimistic)
With BTI aging and AVS, the worst-case voltage corner is not obvious
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm²) for AES, implementations 1 to 8; series: non-EM aware, after fixing (Mishra), after fixing (Black's); regions marked "optimistic about AVS" and "pessimistic about AVS"]
AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
  $MTTF(i) = MTTF(i-1) \times \left( V_{DD}(i-1) / V_{DD}(i) \right)^2$
[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]
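The update rule MTTF(i) = MTTF(i-1) × (VDD(i-1) / VDD(i))² can be applied step by step along a Vdd schedule. The sketch below does only that; the initial MTTF of 10 years and the three-step voltage schedule are made-up inputs for illustration.

```python
def mttf_schedule(mttf_initial, vdd_steps):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule."""
    mttf = [mttf_initial]
    for v_prev, v_now in zip(vdd_steps, vdd_steps[1:]):
        mttf.append(mttf[-1] * (v_prev / v_now) ** 2)
    return mttf

# Hypothetical schedule: AVS raises Vdd from 0.90V to 1.00V in two steps.
history = mttf_schedule(10.0, [0.90, 0.95, 1.00])
print(history)
```

Because the ratios telescope, the final MTTF depends only on the first and last voltages of the schedule under this rule.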
EM Impact on AVS Scheduling
[Figure: VDD vs. year (0 to 12) for schedules S1 to S5, and MTTF (year) for S1 to S5, design DMA; annotation: 12 years MTTF penalty]
What is "Signoff"?
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff on a voltage axis: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd; operating voltage marked]
Problem: margins = pessimism, overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = \left[ d_j(Y_v) - d_j(Y_{typ}) \right] / 3$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources follow normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 ITF files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• $\Sigma = \lambda \lambda^T$ (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
  $[\Delta d'_{j1} \ldots \Delta d'_{j27}] = [\Delta d_{j1} \ldots \Delta d_{j27}] \, \lambda$
Standard deviation of path delay:
  $\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
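The two statistical-timing slides combine into a short calculation: per-source sensitivities from corner delays, a Cholesky factor of the correlation matrix, and a root-sum-square. The sketch below uses plain Python with a hand-written Cholesky routine; the function names and the two-source example data are assumptions made for illustration.

```python
import math

def cholesky(sigma):
    """Lower-triangular lam with sigma = lam * lam^T (sigma must be positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam

def path_sigma(d_corner, d_typ, sigma):
    """Standard deviation of one path delay from its per-source 3-sigma corner delays."""
    sens = [(d - d_typ) / 3.0 for d in d_corner]   # delta d_jv
    lam = cholesky(sigma)
    n = len(sens)
    # Row vector times lam: delta d'_jv = sum over u of delta d_ju * lam[u][v]
    sens_corr = [sum(sens[u] * lam[u][v] for u in range(n)) for v in range(n)]
    return math.sqrt(sum(s * s for s in sens_corr))

# Hypothetical path with two variation sources and correlation 0.5 between them.
sigma_j = path_sigma([1.03, 1.06], 1.0, [[1.0, 0.5], [0.5, 1.0]])
print(sigma_j)
```

Since Σ = λλᵀ, the result equals the square root of Δd Σ Δdᵀ, so correlated sources widen σj relative to the independent case.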
Resilient Designs
• Detect and recover from timing errors: ensure correct operation under dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling, reduced energy
[Figure: energy (mJ) vs. supply voltage (V) for conventional and resilient designs; annotation: 15% reduction]
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:
  $Energy = Power / Throughput$
  Throughput is estimated from the error rate, the clock period and the number of recovery cycles [Kahng10]
Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints; this allows replacement of a Razor FF with a normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor: process- and temperature-aware AVS with a generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor that measures actual critical paths [Hartman06, Fick10]
• Error detection system (error tolerance): error detection and correction system; Vdd scaling until error occurs [Das06, Tschanz10]
[Figure: the AVS classes arranged along a power axis]
Challenge: Variability
[Figure: four panels contrasting ideal scaling with non-ideality. DENSITY: transistor count [M] vs. MPU release date (source [CPUDB]). POWER: dynamic power (W), 2009 to 2024 (source [JeongK08]). DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006 to 2016 (source [ITRS]). SUPPLY VOLTAGE: volt vs. MPU release date (source [CPUDB])]
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs: resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt on MUL and EXU; minimum achievable energy marked]
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: design cone of mode A in the voltage-frequency plane with modes A, B and C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
[Flowchart: global optimization flow: sample (SP&R); construct power models; estimate optimal signoff modes; adaptive search: sample (SP&R), refine power models]
[Figure: power estimation of adaptive search: power (mW) vs. signoff voltage (V); design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
[Figure: closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor]
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring oscillator stages with 1-bit control pins selecting high or low resistance]
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF and CO for MUL and EXU at large, medium and small margin, broken down into energy without resilience, energy penalty of additional circuits and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows a lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs: resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt on MUL and EXU; minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution]
Tradeoff: Resilience Cost vs. Datapath Cost
[Figure: fanin cones driving Razor FFs (with error outputs) and normal FFs; optimizing the fanin cone of an endpoint Razor FF with a tighter constraint allows a normal FF, trading area (power) of the fanin cone against area (power) with Razor overhead]
[Figure: energy (mJ) vs. number of Razor FFs (300, 100, 50, 0): total energy, energy of non-resilient part and resilience cost; tradeoff between Razor FFs (resilience cost) and power/area of fanin circuits]
We seek to minimize total energy via this tradeoff (joint work with Seokhyeong Kang and Jiajia Li; extensions ongoing in collaboration with NXP)
Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint with tighter constraints; this allows replacement of a Razor FF with a normal FF
• Pick endpoints based on heuristic sensitivity functions
  • Vary endpoints, compare area/power penalty
Candidate sensitivity functions
  $SF_1 = |slack(p)|$
  $SF_2 = |slack(p)| \times Num_{cri}(p)$
  $SF_3 = |slack(p)| \times Num_{cri}(p) / Num_{total}(p)$
  $SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
  $SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$
p: negative-slack endpoint; c: cells within fanin cone; Numcri: number of negative-slack cells
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight; adjust clock latencies; contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized
Path weight:
  $W_{pq} = Slack_{pq} / (1 + \beta \times TG(p, q))$
  Slackpq: setup slack of path p-q; β: weighting factor; TG(p, q): toggle rate of path p-q
[Figure: FF1, FF2, FF3 connected by data paths with weights W12, W23, W31 and a clock tree; after adjustment every path on the cycle has weight W' = average weight on cycle]
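The path weight and the balanced weight W' on a cycle can be computed as below. This sketch covers only the weight arithmetic, not the cycle search or the contraction step; the three-flip-flop cycle, its slacks and toggle rates, and β = 2 are made-up values for illustration.

```python
def path_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)

def balanced_cycle_weight(weights):
    """W': the average weight over the paths of one cycle."""
    return sum(weights) / len(weights)

# Hypothetical cycle FF1 -> FF2 -> FF3 -> FF1: (setup slack in ns, toggle rate).
cycle = [(0.30, 0.1), (-0.10, 0.5), (0.20, 0.0)]
weights = [path_weight(s, tg, beta=2.0) for s, tg in cycle]
w_prime = balanced_cycle_weight(weights)
print(weights, w_prime)
```

For positive slack, a larger toggle rate shrinks the weight, so frequently exercised paths look more critical and attract more of the redistributed slack.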
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; if energy < min energy, save current solution; OR-tree insertion]
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method with only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF and CO for MUL and EXU at large, medium and small margin, broken down into energy without resilience, energy penalty of additional circuits and energy penalty of throughput degradation]
Small/medium/large margin: 1σ, 2σ, 3σ for SS corner. Technology: foundry 28nm
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs: resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt on MUL and EXU; minimum achievable energy marked. Technology: foundry 28nm]
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
  • Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
  • AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS with circuit, on-chip aging monitor, circuit performance and voltage regulator; circuit frequency vs. time relative to target, and Vdd vs. time, without AVS and with AVS]
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation
[Figure: Step 1: VBTI (|Vt|) and Vlib give the derated library; Step 2: circuit implementation and signoff gives the circuit; Step 3: BTI degradation and AVS gives Vfinal]
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Figure: Step 1: VBTI (|Vt|) and Vlib give the derated library; Step 2: circuit implementation and signoff gives the circuit; Step 3: BTI degradation and AVS gives Vfinal]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads? Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
Power vs. Area Across Different Signoffs
[Figure: power vs. area across signoff corners, with the "knee" point for a balanced area and power tradeoff]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging under an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI aging; VBTI = Vlib ≈ Vfinal]
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging); it depends on
  • Gates along the critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure: Vfinal for different gate types, annotated 10mV]
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation
[Flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → sign off circuit with derated library]
Power vs. Area for All Designs
• (4 designs) × (DC, AC) × (derating methods)
[Figure: power vs. area; markers distinguish the proposed method from circuits signed off using other derated libraries; "knee" point for a balanced area and power tradeoff]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign
[Figure: Vdd vs. time alternating between nominal (NOM, duration tnom) and overdrive (OD, duration tOD) modes]
[Figure: power of circuits with different overdrive modes, fnom = 800MHz, Vnom = 0.8V; different overdrive modes: 26% power range; fixed fOD: still 14% power range]
Also: Tunable Monitors, Less Margin
Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip
Aggressive config
• Vmin_est < Vmin_chip: some chips will fail
Benefits of tunability
• Compensate for difference between model and silicon
• Recover margin when variation is reduced due to improved process
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
Thank You
Backup
Power Penalty to Fix EM with AVS
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
[Figure: core power (mW) and PG power (mW) for implementations 1 to 9, from highest invested guardband to least invested guardband; annotation: 14% power penalty]
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Homogeneous Cw corner
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Pessimism
Example worst-case capacitance corner
When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers
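A small calculation shows why the pessimism grows with the number of layers. The sketch assumes a deliberately simplified model that is not taken from the talk: every layer contributes equally to the path (unit sigma per layer) and every pair of layers has the same correlation `gamma`. The homogeneous corner then puts all layers at 3σ at once, while the statistical 3σ of the sum follows from the variance of correlated terms.

```python
import math

def homogeneous_pessimism(n_layers, gamma):
    """Ratio of the homogeneous-corner excursion to the statistical 3-sigma excursion.

    Simplified model: equal unit-sigma contribution per layer and a single
    pairwise correlation gamma between layers (both are assumptions).
    """
    corner = 3.0 * n_layers                                   # all layers at 3 sigma
    variance = n_layers + n_layers * (n_layers - 1) * gamma   # variance of the sum
    return corner / (3.0 * math.sqrt(variance))

for n in (1, 2, 4, 9):
    print(n, round(homogeneous_pessimism(n, 0.0), 3),
          round(homogeneous_pessimism(n, 0.5), 3))
```

With fully correlated layers (gamma = 1) the ratio is exactly 1, so the homogeneous corner is pessimistic only to the extent that the layers vary independently.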
Correlation Matrix
• Let Σ be the correlation matrix for variation sources
Σ =
           M1            M2            M3            M4
           ΔW  ΔT  ΔH    ΔW  ΔT  ΔH    ΔW  ΔT  ΔH    ΔW  ΔT  ΔH
M1  ΔW     1   0   0     γ   0   0     γ   0   0     0   0   0
    ΔT     0   1   0     0   γ   0     0   γ   0     0   0   0
    ΔH     0   0   1     0   0   γ     0   0   γ     0   0   0
M2  ΔW     γ   0   0     1   0   0     γ   0   0     0   0   0
    ΔT     0   γ   0     0   1   0     0   γ   0     0   0   0
    ΔH     0   0   γ     0   0   1     0   0   γ     0   0   0
M3  ΔW     γ   0   0     γ   0   0     1   0   0     0   0   0
    ΔT     0   γ   0     0   γ   0     0   1   0     0   0   0
    ΔH     0   0   γ     0   0   γ     0   0   1     0   0   0
M4  ΔW     0   0   0     0   0   0     0   0   0     1   0   0
    ΔT     0   0   0     0   0   0     0   0   0     0   1   0
    ΔH     0   0   0     0   0   0     0   0   0     0   0   1
• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
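The structure of this matrix follows from two rules: sources of the same variation type in the same process module are correlated with factor γ, and everything else is independent. The sketch below builds such a matrix in plain Python. The grouping of M1 to M3 into one module and M4 into another is inferred from the matrix shown on the slide, and the module labels "A" and "B" are hypothetical.

```python
def build_correlation(layers, kinds, module_of, gamma):
    """Correlation matrix over (layer, kind) sources, ordered layer by layer."""
    sources = [(layer, kind) for layer in layers for kind in kinds]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (layer_i, kind_i) in enumerate(sources):
        for j, (layer_j, kind_j) in enumerate(sources):
            if i == j:
                sigma[i][j] = 1.0
            elif kind_i == kind_j and module_of[layer_i] == module_of[layer_j]:
                sigma[i][j] = gamma     # same type, same process module
    return sources, sigma

layers = ["M1", "M2", "M3", "M4"]
kinds = ["dW", "dT", "dH"]                                   # width, thickness, height
module_of = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}     # hypothetical labels
sources, sigma = build_correlation(layers, kinds, module_of, gamma=0.5)
for row in sigma:
    print(row)
```

The same function extends to more layers or variation types by changing the input lists, for example to reach the 27 sources used in the statistical analysis.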
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point at cumulative probability 0.92 and ratio 60%]
Delay Variation
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp); paths dominated by C-worst have Δdelay at C-worst > Δdelay at RC-worst, and paths dominated by RC-worst have Δdelay at RC-worst > Δdelay at C-worst]
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
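Picking the dominant corner and evaluating α there can be sketched as follows. The definition used here, α = 3σj / Δdj at the dominant corner, is an assumption inferred from the statement that α > 1.0 means the corner underestimates the delay variation; the slides in this section do not spell the formula out. The function name and example values are hypothetical.

```python
def alpha_at_dominant_corner(sigma_j, d_typ, d_cw, d_rcw):
    """Return (dominant corner, alpha), with alpha = 3 * sigma_j / delta at that corner.

    Assumes the larger of the two corner excursions is positive.
    """
    delta_cw = d_cw - d_typ       # delta delay at C-worst
    delta_rcw = d_rcw - d_typ     # delta delay at RC-worst
    if delta_rcw >= delta_cw:
        corner, delta = "RC-worst", delta_rcw
    else:
        corner, delta = "C-worst", delta_cw
    return corner, 3.0 * sigma_j / delta

# Hypothetical path: sigma_j = 10ps on a 1ns typical delay.
corner, alpha = alpha_at_dominant_corner(0.01, d_typ=1.00, d_cw=1.02, d_rcw=1.08)
print(corner, alpha)
```

Under this reading, a path with α well below 1 at its dominant corner is a candidate for signoff at a tightened corner.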
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm²)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5

           α     Acw (%)   Arcw (%)
TBC-0.5    0.5   43        73
TBC-0.6    0.6   33        50
TBC-0.7    0.7   30        34

Implement another NETCARD (clock period = 2.3 ns) to obtain α, Acw, and Arcw
Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  (callouts: nominal delay; delay sensitivity to unit change in M1 resistance; delay sensitivity to unit change in M1 capacitance)
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
Scaling Factor Results
[Figure: scaling factor α for LEON3MP, NETCARD, and SUPERBLUE12; the region with α > 0.5 is marked in each panel]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120 ps and 61 ns
• The number of timing violations is reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: three bar charts for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2: WNS (ns), TNS (ns), and number of timing violations]
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd versus time, with NBTI and PBTI curves]
VBTI = Vlib ≈ Vfinal
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure: Vfinal across gate types; annotation: 10 mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = {DC, AC}

Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05 V
Vinit         0.90 V
Vheur1 (DC)   0.97 V
Vheur1 (AC)   0.95 V
Vheur2 (DC)   0.95 V
Vheur2 (AC)   0.93 V
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library
Vstep: AVS voltage step. Vfinal: converged voltage.
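The reference flow above can be illustrated with a toy loop. Everything numeric here is an assumption for the sketch: the alpha-power-law delay stands in for an STA run, the power-law `bti_shift` stands in for a BTI aging model, and the coefficients are invented, not taken from the talk.

```python
def path_delay(vdd, dvth, vth0=0.30, k=1.0, a=1.3):
    """Alpha-power-law delay in arbitrary units; a toy stand-in for an STA run."""
    return k * vdd / (vdd - vth0 - dvth) ** a


def bti_shift(vdd, t_years, b=0.03, g=3.0, n=0.16):
    """Toy power-law BTI model: |dVth| grows with stress voltage and with time."""
    return b * (vdd ** g) * (t_years ** n)


def reference_signoff(v_init=0.90, v_max=1.05, v_step=0.01, t_final=10.0, steps=20):
    """Step through the lifetime, raising Vdd by v_step whenever timing is violated."""
    target = path_delay(v_init, 0.0)  # zero timing slack at t = 0
    vdd = v_init
    history = []
    for i in range(1, steps + 1):
        t = t_final * i / steps
        dvth = bti_shift(vdd, t)                 # estimate aging at this time step
        while path_delay(vdd, dvth) > target:    # update timing, raise Vdd if needed
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None, history             # caller must modify the circuit and start over
            dvth = bti_shift(vdd, t)
        history.append((t, vdd))
    return vdd, history


v_final, history = reference_signoff()
print(v_final, history[0], history[-1])
```

Each loop iteration corresponds to one timing update, which is why the real flow needs many STA runs.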
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Diagram: loop among circuit, derated libraries (Vlib, VBTI), and Vfinal]
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)
|ΔVth| is partially recovered when device is off (relaxed)
NBTI: PMOS. PBTI: NMOS.
[Figure: |Vgs| versus time with alternating ON and OFF phases] [VattikondaWC06]
Device aging (|ΔVth|) accumulates over time [TCAS’14]
Observation 1
[Chan11]
• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd rising toward Vfinal over time; ≈70% of the Vdd increment occurs in 1 year, the remaining 30% over 9 years]
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
Legend:
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations and aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                                   Vlib (circuit performance estimation)
VBTI (aging estimation)   Min Vdd                                Max Vdd
Min Vdd                   Slowest circuit, less aging            Not applicable (optimistic)
Max Vdd                   Slowest circuit, worst-case aging:     Faster circuit, worst-case aging
                          too pessimistic

With BTI aging and AVS, the worst-case voltage corner is not obvious
AVS Signoff Corner Selection
[Figure: power (mW) versus area (μm²) for AES, with data points labeled by signoff case 1 to 8; regions annotated “Optimistic about AVS” and “Pessimistic about AVS”; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's)]
AVS Impact on EM Lifetime
[Figure: lifetime (years) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200 mV voltage compensation]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
  $\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
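The MTTF update rule above is a simple recurrence over the AVS voltage schedule. A minimal sketch follows, with a hypothetical initial MTTF and voltage schedule chosen only for illustration.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_now):
    """One step of MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2


def mttf_trace(mttf0, vdd_schedule):
    """Apply the update along a sequence of supply voltages, one per AVS step."""
    trace = [mttf0]
    for v_prev, v_now in zip(vdd_schedule, vdd_schedule[1:]):
        trace.append(update_mttf(trace[-1], v_prev, v_now))
    return trace


# Hypothetical schedule: Vdd raised from 0.90 V to 1.05 V in 50 mV steps.
trace = mttf_trace(10.0, [0.90, 0.95, 1.00, 1.05])
print(trace)
```

Because the ratios telescope, the final value depends only on the first and last voltage in the schedule.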
EM Impact on AVS Scheduling
[Figure: VDD versus year (0 to 12) for schedules S1 to S5, and MTTF (years) for S1 to S5 on the DMA design; annotation: “12 years MTTF penalty”]
What is “Signoff”?
• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff on a voltage axis, from nominal Vdd through static IR drop, power grid IR gradient, dynamic IR, and HCI/NBTI down to signoff Vdd; operating voltage marked]
Problem: margins = pessimism → overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = \left[ d_j(Y_v) - d_j(Y_{typ}) \right] / 3$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
  $[\Delta d'_{j1} \;\ldots\; \Delta d'_{j27}] = [\Delta d_{j1} \;\ldots\; \Delta d_{j27}]\,\lambda$
Standard deviation of path delay:
  $\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
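The two statistical timing slides can be tied together with a small worked example. This sketch uses three variation sources instead of 27 and made-up corner delays; the Cholesky routine is written out in plain Python so that the example has no dependencies.

```python
import math


def cholesky(sigma):
    """Return lower-triangular lam with sigma = lam * lam^T (sigma must be positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][j] = (sigma[i][j] - s) / lam[j][j]
    return lam


def sensitivities(d_typ, d_corners):
    """delta_d_jv = [d_j(Y_v) - d_j(Y_typ)] / 3 for each 3-sigma corner delay."""
    return [(d_v - d_typ) / 3.0 for d_v in d_corners]


def path_sigma(delta_d, sigma):
    """sigma_j = norm of the row vector delta_d multiplied by lam."""
    lam = cholesky(sigma)
    n = len(delta_d)
    corr = [sum(delta_d[v] * lam[v][k] for v in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in corr))


gamma = 0.5  # sources 0 and 1 are correlated, source 2 is independent
sigma = [[1.0, gamma, 0.0], [gamma, 1.0, 0.0], [0.0, 0.0, 1.0]]
delta_d = sensitivities(1.00, [1.03, 1.06, 0.97])  # hypothetical corner delays (ns)
sigma_j = path_sigma(delta_d, sigma)
print(sigma_j)
```

The result equals the square root of the quadratic form of the sensitivities with Σ, which is what the Cholesky factorization is used to compute.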
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) versus supply voltage (V) for the conventional design and the resilient design; annotation: 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling → reduced energy
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:
  $\mathit{Energy} = \mathit{Power} / \mathit{Throughput}$
  where Throughput is estimated from the clock period T, the error rate ER, and the number of recovery cycles r [Kahng10]
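To make the energy estimate concrete, the sketch below evaluates Energy = Power / Throughput for a margined and a resilient design. The throughput expression used here is an assumption for illustration only (each timing error is assumed to add `recovery_cycles` extra clock periods); it is not taken from the slide or from [Kahng10], and all numbers are hypothetical.

```python
def throughput(clock_period, error_rate, recovery_cycles):
    """Assumed model: average time per operation = T * (1 + ER * r)."""
    avg_time = clock_period * (1.0 + error_rate * recovery_cycles)
    return 1.0 / avg_time


def energy_per_op(power, clock_period, error_rate, recovery_cycles):
    """Energy = Power / Throughput."""
    return power / throughput(clock_period, error_rate, recovery_cycles)


# Hypothetical comparison: the resilient design runs at lower power but pays for recovery.
e_margin = energy_per_op(power=50e-3, clock_period=1.0e-9, error_rate=0.0, recovery_cycles=5)
e_resilient = energy_per_op(power=42e-3, clock_period=1.0e-9, error_rate=0.01, recovery_cycles=5)
print(e_margin, e_resilient)
```

Under this assumed model the resilient design wins only while the power saving outweighs the throughput lost to error recovery.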
Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints: allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches
• Open-Loop AVS
  • Freq & Vdd LUT. AVS: pre-characterize LUT [Martin02]
  • Post-silicon characterization. Process-aware AVS: post-silicon characterization [Tschanz03]
• Closed-Loop AVS
  • Generic monitor and design-dependent replica. Process- and temperature-aware AVS: generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor. In-situ performance monitor: measure actual critical paths [Hartman06, Fick10]
• Error Tolerance
  • Error detection system. Error detection and correction system: Vdd scaling until error occurs [Das06, Tschanz10]
[Diagram axis: power]
Challenge Variability
[Figure: four panels comparing ideal scaling with observed non-ideality]
• DENSITY: transistor count [M] versus MPU release date, 1998 to 2011. Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) versus ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: supply voltage (V) versus MPU release date, 1995 to 2016. Source: [CPUDB]
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) versus supply voltage (V) for brute-force, pure-margin, and CombOpt, for MUL and EXU; minimum achievable energy marked]
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design / overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage versus frequency plane with modes A, B, and C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone → reduce overdesign
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
[Flow (global optimization): sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R) → refine power models]
[Figure: power estimation of adaptive search; power (mW) versus signoff voltage (V), series 1st, 2nd, and real]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
Design: AES, f = 700 MHz
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
Closed-Loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low resistance]
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: stacked bars of energy (mJ) for PM, BF, and CO at small, medium, and large margin, for MUL and EXU; components: energy without resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows a lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty against reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) versus supply voltage (V) for brute-force, pure-margin, and CombOpt, for MUL and EXU; minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart boxes: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; decision: energy < min energy, save current solution]
Selective-Endpoint Optimization (SEOpt)
• Optimize fanin cone of an endpoint with tighter constraints: allows replacement of Razor FF with normal FF
• Pick endpoints based on heuristic sensitivity functions
• Vary endpoints, compare area/power penalty
Candidate sensitivity functions:
  $SF_1 = |slack(p)|$
  $SF_2 = |slack(p)| \times num_{cri}(p)$
  $SF_3 = |slack(p)| \times num_{cri}(p) / num_{total}(p)$
  $SF_4 = |slack(p)| \times \sum_{c \in fanin(p)} Pwr(c)$
  $SF_5 = \sum_{c \in fanin(p)} |slack(c)| \times Pwr(c)$
p: negative-slack endpoint. c: cells within fanin cone. num_cri: number of negative-slack cells.
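The five candidate sensitivity functions can be evaluated directly once slack and power are known per cell. The sketch below is illustrative: the `Cell` and `Endpoint` records are hypothetical, and `num_total(p)` is assumed to be the number of cells in the fanin cone, which the slide does not define.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    slack: float  # worst slack through the cell (ns)
    power: float  # cell power (mW)


@dataclass
class Endpoint:
    slack: float        # endpoint slack (ns), negative for candidate endpoints
    fanin: List[Cell]   # cells within the fanin cone


def num_cri(p):
    """Number of negative-slack cells in the fanin cone."""
    return sum(1 for c in p.fanin if c.slack < 0)


def sf1(p):
    return abs(p.slack)


def sf2(p):
    return abs(p.slack) * num_cri(p)


def sf3(p):
    # Assumption: num_total(p) is the total number of cells in the fanin cone.
    return abs(p.slack) * num_cri(p) / len(p.fanin)


def sf4(p):
    return abs(p.slack) * sum(c.power for c in p.fanin)


def sf5(p):
    return sum(abs(c.slack) * c.power for c in p.fanin)


# Hypothetical endpoint with four fanin cells, two of them timing-critical.
p = Endpoint(-0.10, [Cell(-0.10, 2.0), Cell(0.05, 1.0), Cell(-0.02, 4.0), Cell(0.20, 1.0)])
print(sf1(p), sf2(p), sf3(p), sf4(p), sf5(p))
```

Endpoints would then be ranked by the chosen function to decide where to insert margin and where to keep error-tolerant flip-flops.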
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
  1. Generate sequential graph
  2. Find cycle of paths with minimum total weight → adjust clock latencies → contract the cycle into one vertex
  3. Iterate Step 2 until all endpoints are optimized
[Figure: flip-flops FF1, FF2, FF3 connected by data paths with weights W12, W23, W31 and driven by a clock tree; after optimization every edge on the cycle has weight W′]
  $W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p,q)}$
Slack_pq: setup slack of path p-q. β: weighting factor. TG(p,q): toggle rate of path p-q.
W′ = average weight on cycle
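The edge weight and the equalized cycle weight W′ can be computed as below. This is a minimal sketch of the weight formula only (it does not search for the minimum-weight cycle), and the slack and toggle values are hypothetical.

```python
def edge_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)


def cycle_average(weights):
    """W' = average weight on the cycle, reached after adjusting clock latencies."""
    return sum(weights) / len(weights)


# Hypothetical three-flop cycle FF1 -> FF2 -> FF3 -> FF1.
beta = 1.0
slacks = [0.30, 0.06, 0.10]    # setup slacks of paths 1-2, 2-3, 3-1 (ns)
toggles = [0.5, 1.0, 0.0]      # toggle rates of the same paths
weights = [edge_weight(s, tg, beta) for s, tg in zip(slacks, toggles)]
w_prime = cycle_average(weights)
print(weights, w_prime)
```

A frequently toggling path gets a smaller weight for the same slack, so it is treated as more critical when the cycle is balanced.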
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart boxes: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; decision: energy < min energy, save current solution; OR-tree insertion]
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method with only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: stacked bars of energy (mJ) for PM, BF, and CO at small, medium, and large margin, for MUL and EXU; components: energy without resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner. Technology: foundry 28nm.
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) versus supply voltage (V) for pure-margin, brute-force, and CombOpt, for MUL and EXU; minimum achievable energy marked]
Technology: foundry 28nm
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS, in which an on-chip aging monitor reports circuit performance to a voltage regulator that sets the circuit Vdd; plots of circuit frequency and Vdd versus time, without AVS and with AVS, relative to the target]
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation
[Diagram: Step 1: VBTI (|Vt|) and Vlib → derated library. Step 2: circuit implementation and signoff → circuit. Step 3: BTI degradation and AVS → Vfinal.]
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Diagram: Step 1: VBTI (|Vt|) and Vlib → derated library. Step 2: circuit implementation and signoff → circuit. Step 3: BTI degradation and AVS → Vfinal.]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
Power vs Area Across Different Signoffs
“Knee” point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd versus time, with NBTI and PBTI curves]
VBTI = Vlib ≈ Vfinal
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure: Vfinal across gate types; annotation: 10 mV]
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation
[Flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → sign off circuit with derated library]
Power vs Area for All Designs
• (4 designs) × {DC, AC} × (derating methods)
[Figure: power versus area for all designs; legend: proposed method, circuits signed off using other derated libraries]
“Knee” point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
• Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd versus time alternating between nominal (NOM) and overdrive (OD) modes, with durations tnom and tOD]
[Figure: power of circuits with different overdrive modes; fnom = 800 MHz, Vnom = 0.8 V. Different overdrive modes: 26% power range. Fix fOD: still 14% power range.]
Also: Tunable Monitors, Less Margin
Aggressive config: Vmin_est < Vmin_chip; some chips will fail
Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13 mV margin
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability
• Compensate for difference between model and silicon
• Recover margin when variation is reduced due to improved process
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1 to 9; annotations: highest invested guardband, least invested guardband, 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: capacitance distributions of layers M1 and M2, each with its 3σ corner (C-3σ); interconnect stack with M1 and M2 where both M1 C and M2 C are set to 3σ, giving the homogeneous Cw corner; the excess is labeled “Pessimism”]
Example: worst-case capacitance corner
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

              M1          M2          M3          M4
           ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW      1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT      0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH      0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW      γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT      0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH      0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW      γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT      0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH      0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW      0  0  0     0  0  0     0  0  0     1  0  0
    ΔT      0  0  0     0  0  0     0  0  0     0  1  0
    ΔH      0  0  0     0  0  0     0  0  0     0  0  1
= Σ

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
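The block structure of Σ follows mechanically from the two rules stated above, so it can be generated rather than typed. In this sketch the module labels "A" and "B" are hypothetical names for the two process modules implied by the matrix (M1 to M3 together, M4 separate).

```python
def build_correlation(layers, module_of, kinds=("dW", "dT", "dH"), gamma=0.5):
    """Build the correlation matrix: 1 on the diagonal, gamma for the same
    variation type on different layers of the same process module, else 0."""
    names = [(layer, kind) for layer in layers for kind in kinds]
    n = len(names)
    sigma = [[0.0] * n for _ in range(n)]
    for a, (layer_a, kind_a) in enumerate(names):
        for b, (layer_b, kind_b) in enumerate(names):
            if a == b:
                sigma[a][b] = 1.0
            elif kind_a == kind_b and module_of[layer_a] == module_of[layer_b]:
                sigma[a][b] = gamma
    return names, sigma


layers = ["M1", "M2", "M3", "M4"]
module_of = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}  # hypothetical module labels
names, sigma = build_correlation(layers, module_of)
for row in sigma:
    print(" ".join(f"{x:.1f}" for x in row))
```

The same function extends to the 27-source case by passing nine layers and their module assignment.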
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability versus max wirelength ratio across all layers (%); the curve reaches 0.92 at 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation
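The averaging argument can be made explicit under a simplified model. Assume (for illustration only) that a path's wire-induced variation is a wirelength-weighted sum of per-layer variations, that layer k carries a fraction $w_k$ of the wirelength, and that the per-layer variations are independent with the same standard deviation $\sigma$:

```latex
\Delta_{\text{path}} = \sum_{k} w_k \, \Delta_k ,
\qquad \sum_{k} w_k = 1 , \quad w_k \ge 0
\\[4pt]
\sigma_{\text{path}}
  = \sqrt{\sum_{k} w_k^{2} \, \sigma^{2}}
  = \sigma \sqrt{\sum_{k} w_k^{2}}
  \;\le\; \sigma
```

Equality holds only when one layer carries all of the wirelength. For example, with a 60%/40% split over two layers, $\sigma_{\text{path}} = \sigma\sqrt{0.36 + 0.16} \approx 0.72\,\sigma$, whereas a homogeneous corner moves both layers to their extremes together and so corresponds to the full $\sigma$.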
Delay Variation
[Figure: two scatter plots of α. Left: α versus Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp). Right: α versus Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp).]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
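A small helper makes the "dominant corner" convention concrete. The definition α = 3σj / Δdj used here is inferred from the slides (α > 1.0 means the corner underestimates the 3σ variation) and should be read as an assumption; the delays and σj below are hypothetical.

```python
def alpha_at_dominant_corner(d_typ, d_cw, d_rcw, sigma_j):
    """Return the dominant conventional corner and alpha = 3*sigma_j / delta-delay there.

    Assumes the dominant corner has a positive delta-delay.
    """
    delta_cw = d_cw - d_typ
    delta_rcw = d_rcw - d_typ
    if delta_rcw >= delta_cw:
        corner, delta = "RC-worst", delta_rcw
    else:
        corner, delta = "C-worst", delta_cw
    return corner, 3.0 * sigma_j / delta


corner, alpha = alpha_at_dominant_corner(d_typ=1.00, d_cw=1.04, d_rcw=1.10, sigma_j=0.015)
print(corner, alpha)
```

With these numbers the path is dominated by the RC-worst corner and has α below 0.5, so a corner tightened with α = 0.5 would still cover its 3σ delay variation.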
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
Gate count 232K 575K 1031KUtilization () 84 79 82
Core area (mm2) 045 104 191Max transition (ps) 330 330 330
Testcases for validation (45nm library with 8X wire resistivity)
αCorrelation factor = 05
Acw () Arcw ()
TBC-05 05 43 73
TBC-06 06 33 50
TBC-07 07 30 34
Implement another NETCARD (clock period = 23ns) to obtain α Acw and Arcw
Statistical models (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 05
54ISVLSI-2014 invited talk 140710
Further Analysis
bull Paths with small Δd(Yrcw) and Δd(Ycw) have large α
bull A path has small Δdelays the path is equally sensitive to R and C
bull Example dj = dj(Ytyp) + 05 ΔdR-M1 + 05 ΔdC-M1
bull For a given CBC = Ycw ΔdR-M1 is small but ΔdC-M1 is large delay variation of ΔdR-M1 and ΔdC-M1 are cancelled out Δd(Ycw) 0 lt σj
Nominal delay
Delay sensitivity to unit change in M1 resistance
Delay sensitivity to unit change in M1 capacitance
Scaling Factor Results
[Figure: scaling factor α for LEON3MP, SUPERBLUE12, and NETCARD, with the region α > 0.5 marked in each panel]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI stress; VBTI = Vlib ≈ Vfinal]
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05
Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, VLIB, and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs
and library
Vstep: AVS voltage step
Vfinal: converged voltage
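The step-by-step reference flow can be illustrated with a toy Python loop. Everything numeric here is an assumption made for illustration: the aging model (threshold shift proportional to Vdd and to t^(1/6)), the timing check (each Vstep of supply recovers an equal amount of threshold shift), and all parameter values are hypothetical and are not taken from the talk.

```python
def reference_signoff(v_init, v_max, v_step, t_final, n_steps, slack_margin):
    """Toy reference flow: returns the converged V_final, or None if V_max is exceeded."""

    def aging(vdd, t):
        # Hypothetical BTI model: threshold shift grows with Vdd and as t**(1/6).
        return 0.02 * vdd * t ** (1.0 / 6.0)

    vdd = v_init
    for i in range(1, n_steps + 1):
        t = t_final * i / n_steps          # estimate aging at each time step
        # Hypothetical timing check: raise Vdd by v_step until aging is compensated.
        while aging(vdd, t) > slack_margin + (vdd - v_init):
            vdd += v_step                  # update circuit timing and Vdd
            if vdd > v_max:
                return None                # modify circuit and start over
    return vdd


v_final = reference_signoff(v_init=0.90, v_max=1.05, v_step=0.01,
                            t_final=10.0, n_steps=10, slack_margin=0.0)
print(v_final)
```

In a real flow each iteration of the inner loop would be an STA run against a re-characterized library, which is why the slide calls this approach very slow.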
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03
Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop between circuit, Vfinal, Vlib and VBTI, and derated libraries]
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time over alternating ON and OFF phases [VattikondaWC06]]
Device aging (|ΔVth|) accumulates over time [TCAS’14]
Observation 1
[Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
[Figure annotation: ≈70% of Vdd increment in 1 year (remaining 30% over 9 years); Vfinal marked]
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Good corners
Setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03
Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?
Vlib (for circuit performance estimation) vs. VBTI (for aging estimation):
• Vlib = Min Vdd, VBTI = Min Vdd: slowest circuit, less aging
• Vlib = Max Vdd, VBTI = Min Vdd: not applicable (optimistic)
• Vlib = Min Vdd, VBTI = Max Vdd: slowest circuit, worst-case aging (too pessimistic)
• Vlib = Max Vdd, VBTI = Max Vdd: faster circuit, worst-case aging
With BTI aging and AVS, the worst-case voltage corner is not obvious
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm2) for AES, with points labeled by signoff setup, ranging from optimistic about AVS to pessimistic about AVS]
AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
[Figure: lifetime (year) and Vfinal (V) per implementation (1 to 9); annotations: 30% MTTF penalty, 200mV voltage compensation]
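The MTTF update rule on this slide can be applied step by step along an AVS voltage schedule. The sketch below is illustrative only: the initial MTTF of 10 years and the 1.0V to 1.2V schedule are hypothetical values, and the function names are not from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_now):
    """One step of MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2


def mttf_over_schedule(mttf_0, vdd_schedule):
    """Apply the update along a list of Vdd values (one per AVS step)."""
    mttf = mttf_0
    for v_prev, v_now in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, v_prev, v_now)
    return mttf


# Hypothetical schedule: 200mV of compensation starting from 1.0V.
mttf_end = mttf_over_schedule(10.0, [1.0, 1.1, 1.2])
penalty = 1.0 - mttf_end / 10.0
print(mttf_end, penalty)
```

Because the per-step ratios telescope, the result depends only on the first and last voltage: under these assumed numbers, a 200mV rise from 1.0V costs roughly 31% of MTTF.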
EM Impact on AVS Scheduling
[Figure: VDD vs. year (0 to 12) for schedules S1 to S5, and MTTF (year) for S1 to S5; design: DMA]
12 years MTTF penalty
What is “Signoff”?
• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff, from operating voltage (nominal Vdd) down to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI]
Problem: margins = pessimism → overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv: $\Delta d_{jv} = [d_j(Y_v) - d_j(Y_{typ})] / 3$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv
Note: path delay includes gate and wire delays
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• $\Sigma = \lambda\lambda^{T}$ (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}]\,\lambda$
Standard deviation of path delay:
$\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
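The two statistical timing slides can be combined into a short Python sketch: per-source sensitivities are taken as (corner delay minus typical delay) divided by 3, correlated through a Cholesky factor of Σ, and summed in quadrature. The two-source example, its delay values, and the function names are hypothetical; a hand-written Cholesky routine is used so the sketch has no external dependencies.

```python
import math


def cholesky(sigma):
    """Lower-triangular lam such that sigma = lam * lam^T."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def path_sigma(d_corner, d_typ, sigma):
    """Standard deviation of one path's delay.

    d_corner[v]: path delay with variation source v at its 3-sigma corner.
    d_typ: path delay at the typical corner.
    sigma: correlation matrix of the variation sources.
    """
    dd = [(dv - d_typ) / 3.0 for dv in d_corner]           # sensitivities per sigma
    lam = cholesky(sigma)
    n = len(dd)
    # Row vector times lam gives the sensitivities with correlation.
    dd_corr = [sum(dd[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in dd_corr))


# Hypothetical two-source example (delays in ns).
independent = path_sigma([1.03, 1.06], 1.00, [[1.0, 0.0], [0.0, 1.0]])
correlated = path_sigma([1.03, 1.06], 1.00, [[1.0, 0.5], [0.5, 1.0]])
print(independent, correlated)
```

Since the squared result equals Δd Σ Δdᵀ, positive correlation between sources increases the path's standard deviation.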
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy: Energy = Power / Throughput
Throughput = 1 − ER T
+ 1 − ER r × T
ER: error rate [Kahng10]; T: clock period; r: recovery cycles
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches (figure axis: power)
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor: process- and temperature-aware AVS with a generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor that measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system; Vdd scaling until error occurs [Das06, Tschanz10]
Challenge: Variability
[Figure: DENSITY. Transistor count [M] vs. MPU release date (1998 to 2011), ideal vs. non-ideality. Source: [CPUDB]]
[Figure: POWER. Dynamic power (W), 2009 to 2024, ideal vs. non-ideality. Source: [JeongK08]]
[Figure: DRIVE CURRENT. Extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]]
[Figure: SUPPLY VOLTAGE. Volt vs. MPU release date (1995 to 2016), ideal vs. non-ideality. Source: [CPUDB]]
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; series: brute-force, pure-margin, CombOpt; minimum achievable energy marked]
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone
[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone → reduce overdesign
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
Global optimization flow: Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Sample (SP&R) → Refine power models (adaptive search)
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); series: 1st, 2nd, real. Design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring oscillator stages with 1-bit control pins selecting high resistance or low resistance]
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at small, medium, and large margin; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5/10/15% of clock period
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; series: brute-force, pure-margin, CombOpt; minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow: initial placement (all FFs = error-tolerant FFs) → SEOpt (margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs) → SkewOpt (activity-aware clock skew optimization) → if energy < min energy, save current solution → repeat
Clock Skew Optimization (SkewOpt)
• Increase slacks on timing-critical and/or frequently-exercised paths
1. Generate sequential graph
2. Find cycle of paths with minimum total weight, adjust clock latencies, contract the cycle into one vertex
3. Iterate Step 2 until all endpoints are optimized
$W_{pq} = \frac{Slack_{pq}}{1 + \beta \times TG(p,q)}$
Slack_pq: setup slack of path p-q; β: weighting factor; TG(p,q): toggle rate of path p-q
[Figure: sequential graph FF1 → FF2 → FF3 → FF1 with edge weights W12, W23, W31 (data paths and clock tree); after adjustment each edge has weight W’, where W’ = average weight on cycle]
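The edge weight used by SkewOpt, setup slack divided by (1 + β × toggle rate), and the average weight W’ on a cycle can be computed as in the following Python sketch. The slack values, toggle rates, and β are hypothetical numbers chosen for illustration; the sketch does not implement the cycle search or the clock latency adjustment.

```python
def path_weight(slack, toggle_rate, beta):
    """W_pq = Slack_pq / (1 + beta * TG(p, q))."""
    return slack / (1.0 + beta * toggle_rate)


def average_cycle_weight(cycle_edges, beta):
    """W' = average weight over the edges of one cycle.

    cycle_edges: list of (setup_slack, toggle_rate), one tuple per path on the cycle.
    """
    weights = [path_weight(s, tg, beta) for s, tg in cycle_edges]
    return sum(weights) / len(weights)


# Hypothetical cycle FF1 -> FF2 -> FF3 -> FF1: (slack in ns, toggle rate).
cycle = [(0.10, 0.5), (0.30, 0.0), (0.20, 1.0)]
w_avg = average_cycle_weight(cycle, beta=1.0)
print(w_avg)
```

A frequently toggling path gets a smaller weight for the same slack, so it is treated as more critical when cycles are ranked by total weight.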
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow: initial placement (all FFs = error-tolerant FFs) → SEOpt (margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs) → SkewOpt (activity-aware clock skew optimization) → if energy < min energy, save current solution → repeat; OR-tree insertion
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at small, medium, and large margin; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner
Technology: foundry 28nm
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; series: pure-margin, brute-force, CombOpt; minimum achievable energy marked]
Technology: foundry 28nm
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
  • Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
  • AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS: circuit, on-chip aging monitor (circuit performance), voltage regulator (Vdd)]
[Figure: circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, against the target]
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation
[Figure: three-step loop. Step 1: BTI degradation and AVS (Vfinal, VBTI, |Vt|). Step 2: derated library (Vlib). Step 3: circuit implementation and signoff (circuit)]
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Figure: three-step loop. Step 1: BTI degradation and AVS (Vfinal, VBTI, |Vt|). Step 2: derated library (Vlib). Step 3: circuit implementation and signoff (circuit)]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
Power vs. Area Across Different Signoffs
“Knee” point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI stress; VBTI = Vlib ≈ Vfinal]
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
• Assume DC or AC stress in derated library characterization
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation
Flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → sign off circuit with derated library
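The first step of this flow, averaging Vfinal over standard cells to obtain Vheur, amounts to a plain mean. A minimal Python sketch follows; the cell names and per-cell Vfinal values are hypothetical placeholders, not data from the talk.

```python
def v_heur(vfinal_by_cell):
    """Average the per-cell end-of-life voltages into a single flat V_heur."""
    return sum(vfinal_by_cell.values()) / len(vfinal_by_cell)


# Hypothetical per-cell V_final values in volts.
vfinal_by_cell = {"INV": 0.96, "NAND2": 0.97, "NOR2": 0.98}
vheur = v_heur(vfinal_by_cell)
vbti = vlib = vheur  # derated library is characterized with VBTI = Vlib = Vheur
print(vheur)
```

The averaging is reasonable only because Vfinal was observed to be insensitive to gate type, so the spread being averaged is small.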
Power vs. Area for All Designs
• (4 designs × {DC, AC} × derating methods)
[Figure legend: proposed method; circuits signed off using other derated libraries]
“Knee” point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time, alternating between NOM and OD modes over intervals tnom and tOD]
[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V. Different overdrive modes: 26% power range. Fix fOD: still 14% power range]
Also: Tunable Monitors, Less Margin
Aggressive config: Vmin_est < Vmin_chip. Some chips will fail
Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “202020” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) per implementation (1 to 9), from highest invested guardband to least invested guardband; annotation: 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
Example: worst-case capacitance corner
[Figure: capacitance distributions of layer M1 and layer M2, each with its C-3σ and 3σ points; for an interconnect stack with M1 and M2, the homogeneous Cw corner takes the 3σ value of M1 C and of M2 C together, which introduces pessimism]
When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with the number of layers
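The growth of pessimism with layer count can be illustrated with a simple Python calculation. It rests on strong assumptions that are not from the talk: every layer contributes an equal, fully independent, normally distributed capacitance variation, and a path sees the plain sum of the per-layer contributions.

```python
import math


def homogeneous_pessimism(n_layers, sigma_per_layer=1.0):
    """Ratio of the homogeneous-corner shift to the statistical 3-sigma shift.

    Homogeneous corner: every layer sits at its own 3-sigma point simultaneously.
    Statistical 3-sigma: independent contributions add in quadrature.
    """
    corner_shift = 3.0 * sigma_per_layer * n_layers
    statistical_shift = 3.0 * sigma_per_layer * math.sqrt(n_layers)
    return corner_shift / statistical_shift


for n in (1, 2, 4, 9):
    print(n, homogeneous_pessimism(n))
```

Under these assumptions the ratio is the square root of the number of layers, so a nine-layer stack would be margined at three times the statistical 3σ shift; partial correlation between layers would reduce this ratio.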
Correlation Matrix
• Let Σ be the correlation matrix for variation sources
Σ =
          M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW    0  0  0     0  0  0     0  0  0     1  0  0
    ΔT    0  0  0     0  0  0     0  0  0     0  1  0
    ΔH    0  0  0     0  0  0     0  0  0     0  0  1
Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
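The structure of this matrix follows from two rules: sources of the same variation type in the same process module have correlation γ, and everything else off the diagonal is independent. The Python sketch below builds such a matrix; the module labels "A" and "B" and the function name are hypothetical, with M1 to M3 assumed to share one module and M4 placed in another, as the matrix on the slide suggests.

```python
def build_correlation(layers, kinds, module_of, gamma):
    """Return (names, sigma) for variation sources ordered layer by layer."""
    names = [(layer, kind) for layer in layers for kind in kinds]
    n = len(names)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (layer_i, kind_i) in enumerate(names):
        for j, (layer_j, kind_j) in enumerate(names):
            if i == j:
                sigma[i][j] = 1.0
            elif kind_i == kind_j and module_of[layer_i] == module_of[layer_j]:
                # Same variation type in the same process module.
                sigma[i][j] = gamma
    return names, sigma


layers = ["M1", "M2", "M3", "M4"]
kinds = ["dW", "dT", "dH"]
module_of = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}  # hypothetical module labels
names, sigma = build_correlation(layers, kinds, module_of, gamma=0.5)
for row in sigma:
    print(row)
```

With 9 metal layers and 3 variation types per layer, the same construction would give a 27 × 27 matrix.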
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point at 60%, 0.92]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation
Delay Variation
[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σj / d(Ytyp) × 100 vs. Δdj(Yrcw) / dj(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield?
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
Gate count 232K 575K 1031KUtilization () 84 79 82
Core area (mm2) 045 104 191Max transition (ps) 330 330 330
Testcases for validation (45nm library with 8X wire resistivity)
αCorrelation factor = 05
Acw () Arcw ()
TBC-05 05 43 73
TBC-06 06 33 50
TBC-07 07 30 34
Implement another NETCARD (clock period = 23ns) to obtain α Acw and Arcw
Statistical models (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 05
54ISVLSI-2014 invited talk 140710
Further Analysis
bull Paths with small Δd(Yrcw) and Δd(Ycw) have large α
bull A path has small Δdelays the path is equally sensitive to R and C
bull Example dj = dj(Ytyp) + 05 ΔdR-M1 + 05 ΔdC-M1
bull For a given CBC = Ycw ΔdR-M1 is small but ΔdC-M1 is large delay variation of ΔdR-M1 and ΔdC-M1 are cancelled out Δd(Ycw) 0 lt σj
Nominal delay
Delay sensitivity to unit change in M1 resistance
Delay sensitivity to unit change in M1 capacitance
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
LEON3MP
SUPERBLUE12NETCARD
α gt 05α gt 05
α gt 05
bull Similar trends in different designs
bull Large α when Δd(Yrcw)d(Ytyp) and Δd(Ycw)d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
bull WNS and TNS are reduced by up to 120ps and 61ns
bull Timing violations reduces by 31 to 100
Correlation factor γ = 0 (variation sources are independent)
LEON SUPERBLUE NETCARD
-0180-0160-0140-0120-0100-0080-0060-0040-002000000020
CBC TBC-1 TBC-2
WN
S (n
s)
LEON SUPERBLUE NETCARD
-90-80-70-60-50-40-30-20-10
0CBC TBC-1 TBC-2
TNS
(ns)
LEON SUPERBLUE NETCARD0
200400600800
1000120014001600
CBC TBC-1 TBC-2
Tim
ing
viol
ation
s
57ISVLSI-2014 invited talk 140710
Heuristics 1
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs
and library
Vstep: AVS voltage step; Vfinal: converged voltage
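The loop structure of this reference flow can be sketched in a few lines. This is only an illustration: the aging model `delta_vth`, the delay model `path_delay`, the 10 mV `V_STEP` and the one-year time step are placeholder assumptions, not the models used in the talk. Only the control flow (estimate aging, update timing and Vdd, repeat until tfinal, give up if Vdd would exceed the maximum) follows the bullets above.

```python
# Toy sketch of the reference signoff loop. All models and constants below
# are illustrative assumptions, not data from the talk.

V_INIT, V_MAX, V_STEP = 0.90, 1.05, 0.01   # volts; V_STEP is an assumed value
T_FINAL_YEARS = 10                          # assumed lifetime, one step per year
CLOCK_PERIOD = 1.0                          # normalized timing target


def delta_vth(years, vdd):
    """Placeholder BTI model: threshold shift grows with time and voltage."""
    return 0.03 * vdd * years ** 0.3


def path_delay(vdd, dvth):
    """Placeholder delay model: slower at low Vdd and at high threshold shift."""
    return 0.54 / (vdd - 0.35 - dvth)


def reference_signoff():
    """Return [(year, vdd), ...], or None if Vdd would exceed V_MAX."""
    vdd, dvth, schedule = V_INIT, 0.0, []
    for year in range(1, T_FINAL_YEARS + 1):
        # Estimate aging accumulated up to this time step at the current Vdd.
        dvth = max(dvth, delta_vth(year, vdd))
        # Update circuit timing; raise Vdd in AVS steps until timing is met.
        while path_delay(vdd, dvth) > CLOCK_PERIOD:
            vdd = round(vdd + V_STEP, 2)
            if vdd > V_MAX:
                return None  # circuit must be modified and the flow restarted
        schedule.append((year, vdd))
    return schedule


schedule = reference_signoff()
```

In a real flow each loop iteration would call library characterization and STA, which is why the slide describes this reference as very slow.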
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03

Case       1      2      3      4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   NA   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  NA   Vheur1  Vheur2
63ISVLSI-2014 invited talk 140710
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop linking Circuit, Vfinal, (Vlib, VBTI) and Derated Libraries]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)
|ΔVth| is partially recovered when device is off (relaxed)
NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON/OFF phases [VattikondaWC06]]
Device aging (|ΔVth|) accumulates over time [TCAS’14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd approaching Vfinal over lifetime; ≈70% of the Vdd increment occurs in 1 year, the remaining 30% over 9 years]
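One way to get a feel for "front-loaded" aging is a power-law model, |ΔVth(t)| ∝ t^n. The model and the exponent n = 0.3 are assumptions made here for illustration only; they are not stated on the slide and are not necessarily what [Chan11] uses. With that exponent, about half of the 10-year shift is already present after the first year, which is in line with the 50% figure quoted above.

```python
# Illustrative only: assume a power-law aging model, |dVth(t)| ~ t**n.
# The exponent n = 0.3 is an assumed value, not taken from the talk.

def aging_fraction(t_years, lifetime_years=10.0, n=0.3):
    """Fraction of the end-of-life |dVth| already accumulated at time t."""
    return (t_years / lifetime_years) ** n


first_year = aging_fraction(1.0)
print(f"aging after year 1: {first_year:.0%} of the 10-year total")
```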
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Good corners
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α=0
7. Proposed method with α=0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Vlib: voltage for circuit performance estimation; VBTI: voltage for aging estimation

                  Vlib = Min Vdd                        Vlib = Max Vdd
VBTI = Min Vdd    Slowest circuit, less aging           Not applicable (optimistic)
VBTI = Max Vdd    Slowest circuit, worst-case aging     Faster circuit, worst-case aging
                  (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm2) for AES, with points labeled 1 to 8; series: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); annotations: optimistic about AVS, pessimistic about AVS]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$

[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]
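The MTTF update rule on this slide is simple enough to try numerically. The sketch below applies it along a Vdd schedule; the function names, the 10-year starting MTTF and the 1.00 V starting voltage are hypothetical values chosen for illustration. Because the per-step ratios telescope, only the first and last voltages matter: a 200mV increase from an assumed 1.00 V gives (1.0/1.2)^2 ≈ 0.69, that is, roughly a 30% MTTF penalty.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of the slide's rule: MTTF(i) = MTTF(i-1) * (VDD(i-1)/VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_after_schedule(mttf_initial, vdd_schedule):
    """Apply the update rule along a list of successive Vdd values."""
    mttf = mttf_initial
    for v_prev, v_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, v_prev, v_new)
    return mttf


# Hypothetical AVS schedule: 200 mV of compensation starting from 1.00 V.
vdd_schedule = [1.00, 1.05, 1.10, 1.15, 1.20]
mttf_initial = 10.0  # years, assumed
mttf_final = mttf_after_schedule(mttf_initial, vdd_schedule)
penalty = 1.0 - mttf_final / mttf_initial
```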
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year for schedules S1 to S5, and MTTF (year) for S1 to S5; design: DMA]
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is “Signoff”?

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff, between nominal Vdd and signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI; operating voltage shown on the voltage axis]
Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Δdjv: delay sensitivity of path pj to variation source zv
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Figure: flow from 28 itf files (27 variation sources + Ytyp) and the routed netlist, through RC extraction and STA, to Δdjv]

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation:

$$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}]\,\lambda$$

Standard deviation of path delay:

$$\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
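The two statistical-timing steps (per-source sensitivities, then a Cholesky factor of Σ to combine them) can be sketched in plain Python. The three-source example, the delay numbers and the function names are made up for illustration; the talk uses 27 sources and real extraction and STA results. The sketch assumes Σ is positive definite so that the factorization exists.

```python
import math


def sensitivity(delay_at_corner, delay_typ):
    """Delta d_jv = (d_j(Y_v) - d_j(Y_typ)) / 3, i.e. delay shift per sigma."""
    return (delay_at_corner - delay_typ) / 3.0


def cholesky(sigma):
    """Lower-triangular lam with sigma = lam * lam^T (sigma positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def path_sigma(sensitivities, sigma):
    """sigma_j = norm of the row vector [Delta d_j1 ... Delta d_jn] * lam."""
    lam = cholesky(sigma)
    n = len(sensitivities)
    d_prime = [sum(sensitivities[r] * lam[r][c] for r in range(n))
               for c in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


# Made-up example with three variation sources (delays in ps).
sens = [3.0, 4.0, 12.0]
independent = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
correlated = [[1.0, 0.5, 0.0], [0.5, 1.0, 0.0], [0.0, 0.0, 1.0]]
sigma_independent = path_sigma(sens, independent)
sigma_correlated = path_sigma(sens, correlated)
```

With independent sources the result is the root-sum-square of the sensitivities; positive correlation between two sources with same-sign sensitivities increases σj.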
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

$$Energy = \frac{Power}{Throughput}$$
Throughput is expressed in terms of the error rate ER, the clock period T, and the number of recovery cycles r [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints → allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
[Figure: AVS classes and approaches, arranged along a power axis]
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor, design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure: four panels contrasting ideal scaling with non-ideal trends]
• DENSITY: transistor count [M] vs. MPU release date (1998 to 2011). Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: volt vs. MPU release date (1995 to 2016). Source: [CPUDB]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt on MUL and EXU; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design, overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone
[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within design cone → reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough → scalable
Global optimization flow: Sample (SP&R), construct power models, estimate optimal signoff modes, sample (SP&R), refine power models (adaptive search)
[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V), with curves “1st”, “2nd” and “real”; design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
Closed-loop AVS classes: generic monitor, design-dependent replica, in-situ monitor
• Generic monitor
  • Does not capture design-specific performance variation
• Design-dependent replica, in-situ monitor
  • Critical path may be difficult to identify (IP from 3rd party)
  • Calibrating monitors at multiple modes/voltages requires long test time
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF and CO for MUL and EXU at small, medium and large margin, split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt on MUL and EXU; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow steps:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• If energy < min energy, save current solution
22ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow steps:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• If energy < min energy, save current solution
• OR-tree insertion
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF and CO for MUL and EXU at small, medium and large margin, split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: 1σ/2σ/3σ for SS corner
Technology: foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force and CombOpt on MUL and EXU; minimum achievable energy marked]
Technology: foundry 28nm
25ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops → Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS with circuit, on-chip aging monitor (circuit performance) and voltage regulator (Vdd); plots of circuit frequency and Vdd vs. time, without AVS and with AVS, relative to the target]
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation
[Figure: three-step loop (Step 1, Step 2, Step 3) linking BTI degradation and AVS (Vfinal), the derated library (VBTI, |Vt|, Vlib), and circuit implementation and signoff (circuit)]
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Figure: three-step loop (Step 1, Step 2, Step 3) linking the derated library (Vlib, VBTI, |Vt|), circuit implementation and signoff (circuit), and BTI degradation and AVS (Vfinal)]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design/lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
“Knee” point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time under NBTI and PBTI stress; VBTI = Vlib ≈ Vfinal]
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
10mV
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation
Flow:
1. Obtain Vheur (average of standard cells)
2. Obtain derated library with VBTI = Vlib = Vheur
3. Sign off circuit with derated library
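The first two steps of this flow amount to an average followed by an assignment. A minimal sketch, assuming hypothetical per-cell Vfinal values (the cell names and voltages below are invented for illustration and are not results from the talk):

```python
from statistics import mean

# Hypothetical end-of-life voltages (V) obtained per standard-cell type.
vfinal_by_cell = {"INV": 0.96, "NAND2": 0.97, "NOR2": 0.98}

# Step 1: Vheur is the average Vfinal over the cell types.
v_heur = mean(list(vfinal_by_cell.values()))

# Step 2: the derated library is characterized with one flat voltage
# used both for aging estimation (VBTI) and for performance (Vlib).
v_bti = v_lib = v_heur
```

The flat-voltage choice relies on the observation that Vfinal varies little across gate types, so a single averaged value stands in for all of them.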
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
• 4 designs x (DC, AC) x derating methods
[Figure: power vs. area; legend: proposed method, circuits signed off using other derated libraries; “knee” point for balanced area and power tradeoff]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign
[Figure: Vdd vs. time alternating between NOM and OD modes (tnom, tOD)]
[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V]
• Different overdrive modes: 26% power range
• Fix fOD: still 14% power range
36ISVLSI-2014 invited talk 140710
Also: Tunable Monitors → Less Margin
Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip
Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail
37ISVLSI-2014 invited talk 140710
Also: Tunable Monitors → Less Margin
Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip
Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1 to 9; annotations: highest invested guardband, least invested guardband, 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
43ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Per-layer C distributions (-3σ to 3σ) for layers M1 and M2, and the homogeneous Cw corner of an interconnect stack with M1 and M2 in the (M1 C, M2 C) plane relative to the 3σ contour; the gap is marked as pessimism]
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Per-layer C distributions (-3σ to 3σ) for layers M1 and M2, and the homogeneous Cw corner of an interconnect stack with M1 and M2 in the (M1 C, M2 C) plane relative to the 3σ contour; the gap is marked as pessimism]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
          M1         M2         M3         M4
          ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH
M1  ΔW    1  0  0    γ  0  0    γ  0  0    0  0  0
    ΔT    0  1  0    0  γ  0    0  γ  0    0  0  0
    ΔH    0  0  1    0  0  γ    0  0  γ    0  0  0
M2  ΔW    γ  0  0    1  0  0    γ  0  0    0  0  0
    ΔT    0  γ  0    0  1  0    0  γ  0    0  0  0
    ΔH    0  0  γ    0  0  1    0  0  γ    0  0  0
M3  ΔW    γ  0  0    γ  0  0    1  0  0    0  0  0
    ΔT    0  γ  0    0  γ  0    0  1  0    0  0  0
    ΔH    0  0  γ    0  0  γ    0  0  1    0  0  0
M4  ΔW    0  0  0    0  0  0    0  0  0    1  0  0
    ΔT    0  0  0    0  0  0    0  0  0    0  1  0
    ΔH    0  0  0    0  0  0    0  0  0    0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
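The block structure of Σ follows from two rules: same variation type within the same process module gives γ, everything else off the diagonal gives 0. A short sketch that builds the matrix from those rules is shown below. The function name and the module labels "A" and "B" are hypothetical; the grouping of M1 to M3 into one module and M4 into another is read off the matrix above.

```python
def build_correlation(layers, module_of, kinds=("dW", "dT", "dH"), gamma=0.5):
    """Return (names, sigma) for one variation source per (layer, kind) pair."""
    names = [(layer, kind) for layer in layers for kind in kinds]
    n = len(names)
    sigma = [[0.0] * n for _ in range(n)]
    for a, (layer_a, kind_a) in enumerate(names):
        for b, (layer_b, kind_b) in enumerate(names):
            if a == b:
                sigma[a][b] = 1.0
            elif kind_a == kind_b and module_of[layer_a] == module_of[layer_b]:
                # Same variation type in the same process module.
                sigma[a][b] = gamma
    return names, sigma


layers = ["M1", "M2", "M3", "M4"]
module_of = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}  # labels are assumed
names, sigma = build_correlation(layers, module_of)
```

Extending `layers` and `module_of` to nine metal layers would give a 27 x 27 matrix with the same structure.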
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); 0.92 at 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation
47ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2 in the (M1 C, M2 C) plane with the 3σ contour; non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σj/d(Ytyp) x 100 vs. Δdj(Yrcw)/dj(Ytyp) x 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield?
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flow: routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; paths split into GTBC and GCBC. GTBC: timing analysis using TBC and ECO using TBC until violation = 0. GCBC: timing analysis using CBC and ECO using CBC until violation = 0. Done]
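The grouping step in this flow can be sketched as a simple threshold test on relative delay shifts. Several details here are assumptions made for illustration: the `Path` record, the thresholds `a_cw` and `a_rcw` (playing the role of the Acw and Arcw thresholds, with made-up values), and the choice to place a path in Gtbc when either corner's relative Δdelay exceeds its threshold. The talk only states that a path goes into Gtbc if its Δdelay is larger than a threshold.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    d_typ: float   # delay at the typical corner Ytyp (ns)
    d_cw: float    # delay at the C-worst corner Ycw (ns)
    d_rcw: float   # delay at the RC-worst corner Yrcw (ns)


def split_paths(paths, a_cw, a_rcw):
    """Split path names into (G_TBC, G_CBC) using relative delta-delay in %."""
    g_tbc, g_cbc = [], []
    for p in paths:
        rel_cw = (p.d_cw - p.d_typ) / p.d_typ * 100.0
        rel_rcw = (p.d_rcw - p.d_typ) / p.d_typ * 100.0
        # Assumed rule: a large shift at either corner qualifies the path
        # for signoff at tightened BEOL corners.
        if rel_cw > a_cw or rel_rcw > a_rcw:
            g_tbc.append(p.name)
        else:
            g_cbc.append(p.name)
    return g_tbc, g_cbc


# Made-up paths and thresholds, for illustration only.
paths = [
    Path("p1", d_typ=1.00, d_cw=1.06, d_rcw=1.02),
    Path("p2", d_typ=1.00, d_cw=1.01, d_rcw=1.02),
    Path("p3", d_typ=1.00, d_cw=1.02, d_rcw=1.10),
]
g_tbc, g_cbc = split_paths(paths, a_cw=4.0, a_rcw=7.0)
```

Paths left in the second group keep conventional corners, which matches the observation elsewhere in the deck that paths with small Δdelay at both corners tend to have large α.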
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5
          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw: ΔdR-M1 is small but ΔdC-M1 is large, so the delay variations due to ΔdR-M1 and ΔdC-M1 cancel out: Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
LEON3MP
SUPERBLUE12NETCARD
α gt 05α gt 05
α gt 05
bull Similar trends in different designs
bull Large α when Δd(Yrcw)d(Ytyp) and Δd(Ycw)d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
bull WNS and TNS are reduced by up to 120ps and 61ns
bull Timing violations reduces by 31 to 100
Correlation factor γ = 0 (variation sources are independent)
LEON SUPERBLUE NETCARD
-0180-0160-0140-0120-0100-0080-0060-0040-002000000020
CBC TBC-1 TBC-2
WN
S (n
s)
LEON SUPERBLUE NETCARD
-90-80-70-60-50-40-30-20-10
0CBC TBC-1 TBC-2
TNS
(ns)
LEON SUPERBLUE NETCARD0
200400600800
1000120014001600
CBC TBC-1 TBC-2
Tim
ing
viol
ation
s
57ISVLSI-2014 invited talk 140710
Heuristics 1
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
[Figure annotation: Good corners]
1. VBTI = Vlib = Vinit: ignores AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: does not overestimate aging but ignores AVS
5. No derated library (reference)
6. Proposed method with α=0
7. Proposed method with α=0.03
67ISVLSI-2014 invited talk 140710
Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                            Vlib for circuit performance estimation
VBTI for aging estimation   Min Vdd                              Max Vdd
Min Vdd                     Slowest circuit, less aging          Not applicable (optimistic)
Max Vdd                     Slowest circuit, worst-case aging    Faster circuit, worst-case aging
                            (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm²) for design AES; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks); data points labeled 1 to 8; regions annotated “Optimistic about AVS” and “Pessimistic about AVS”]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$
[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]
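The MTTF update rule on this slide can be sketched as follows. This is a minimal illustration, not the talk's implementation: the function name `update_mttf` and the Vdd schedule are hypothetical, and the code only applies the stated recurrence MTTF(i) = MTTF(i-1) × (VDD(i-1) / VDD(i))² step by step.

```python
def update_mttf(mttf_initial, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule."""
    mttf = mttf_initial
    for v_prev, v_next in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (v_prev / v_next) ** 2
    return mttf


# Hypothetical schedule: AVS raises Vdd from 1.00 V to 1.20 V in 50 mV steps.
schedule = [1.00, 1.05, 1.10, 1.15, 1.20]
penalty = 1.0 - update_mttf(1.0, schedule)
print(f"MTTF penalty: {penalty:.1%}")
```

Because the per-step ratios telescope, only the first and last voltages matter under this rule; for the assumed 1.00 V to 1.20 V schedule the penalty comes out at roughly 30%, the same order as the slide's annotation for 200mV of compensation.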
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year and MTTF (year) for schedules S1 to S5, design DMA; annotation: 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is “Signoff”?
• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff on a voltage axis: Nominal Vdd, Static IR drop, Power grid IR gradient, Dynamic IR, HCI/NBTI, Signoff Vdd, Operating voltage]
Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]
Note: Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (Note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
$$[\Delta d'_{j1} \; \dots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \dots \; \Delta d_{j27}] \, \lambda$$
Standard deviation of path delay:
$$\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$$
Note: we use the delay variation from the statistical analysis as a reference
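The two statistical timing slides can be sketched in a few lines. This is an illustrative sketch only: the function names `cholesky` and `path_sigma` are hypothetical, the code uses plain Python lists rather than any particular STA tool, and it assumes Σ is symmetric positive definite so that the lower-triangular factor λ with Σ = λλ^T exists.

```python
import math


def cholesky(sigma):
    """Return lower-triangular L with sigma = L L^T (sigma symmetric positive definite)."""
    n = len(sigma)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                low[i][j] = (sigma[i][j] - s) / low[j][j]
    return low


def path_sigma(delta_d, sigma):
    """sigma_j = || delta_d * lambda ||_2, with lambda the Cholesky factor of sigma."""
    lam = cholesky(sigma)
    n = len(delta_d)
    # Row vector times matrix: correlated sensitivities delta_d'.
    d_prime = [sum(delta_d[i] * lam[i][v] for i in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


# Two hypothetical variation sources with per-sigma sensitivities of 3 ps and 4 ps.
print(path_sigma([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]]))  # independent sources
print(path_sigma([3.0, 4.0], [[1.0, 0.5], [0.5, 1.0]]))  # correlation 0.5
```

Since Δd λ λ^T Δd^T = Δd Σ Δd^T, positive correlation between sources that push delay in the same direction increases σj relative to the independent case.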
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy
$$\mathrm{Energy} = \frac{\mathrm{Power}}{\mathrm{Throughput}}$$
Throughput is expressed in terms of the error rate (ER), the number of recovery cycles (r), and the clock period (T) [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: Which endpoint to be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches:
• Open-Loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-Loop AVS
  • Generic monitor, design-dependent replica: process- and temperature-aware AVS
    • Generic on-chip monitor [Burd00]
    • Design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error Tolerance
  • Error Detection System: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
[Figure axis label: Power]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure, panel DENSITY: transistor count [M] vs. MPU release date, ideal vs. non-ideality. Source: [CPUDB]]
[Figure, panel POWER: dynamic power (W) by year, 2009 to 2024, ideal vs. non-ideality. Source: [JeongK08]]
[Figure, panel DRIVE CURRENT: Extended Planar Bulk, UTB FD, and DG (μA/μm) against ideal scaling, ideal vs. non-ideality. Source: [ITRS]]
[Figure, panel SUPPLY VOLTAGE: supply voltage (Volt) vs. MPU release date, ideal vs. non-ideality. Source: [CPUDB]]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU; annotation: minimum achievable energy]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design / overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone
[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small, constant number of runs is enough: scalable
[Flow (global optimization flow): Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Sample (SP&R) → Refine power models; adaptive search]
[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V); series: 1st, 2nd, real; design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
• Generic monitor
  • Does not capture design-specific performance variation
• Design-dependent replica, in-situ monitor
  • Critical path may be difficult to identify (IP from 3rd party)
  • Calibrating monitors at multiple modes/voltages requires long test time
This work: Tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: RO stages, each with a 1-bit control pin selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, CO at small, medium, and large margin; panels MUL and EXU; stacked components: energy penalty of throughput degradation, energy penalty of additional circuits, energy w/o resilience]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty against reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU; annotation: minimum achievable energy]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow: Initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; if Energy < min energy, save current solution]
23ISVLSI-2014 invited talk 140710
Benefit of Low-Cost Resilience
• Reference flows
  • Pure-margin (PM): conventional method w/ only margin insertion
  • Brute-force (BF): use error-tolerant FFs for timing-critical endpoints
• Proposed method (CO) achieves up to 21% energy reduction compared to reference methods
• Resilience benefits increase with larger process variation
[Figure: energy (mJ) of PM, BF, CO at small, medium, and large margin; panels MUL and EXU; stacked components: energy penalty of throughput degradation, energy penalty of additional circuits, energy w/o resilience]
Small/medium/large margin: 1σ, 2σ, 3σ for SS corner
Technology: foundry 28nm
24ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for pure-margin, brute-force, and CombOpt; panels MUL and EXU; annotation: minimum achievable energy]
Technology: foundry 28nm
25ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops: Less Margin
• Example: Interaction between reliability margin and AVS designs
  • Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
  • AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS with circuit, on-chip aging monitor, circuit performance, and voltage regulator]
[Figure: circuit frequency vs. time and Vdd vs. time, without AVS and with AVS, relative to target]
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
• VBTI = Voltage for BTI aging estimation
• Vlib = Voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation
[Flow: Step 1: VBTI (|Vt|) and Vlib → derated library. Step 2: circuit implementation and signoff → circuit. Step 3: BTI degradation and AVS → Vfinal]
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
• VBTI = Voltage for BTI aging estimation
• Vlib = Voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Flow: Step 1: VBTI (|Vt|) and Vlib → derated library. Step 2: circuit implementation and signoff → circuit. Step 3: BTI degradation and AVS → Vfinal]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs. Area Across Different Signoffs
“Knee” point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI curves]
VBTI = Vlib ≈ Vfinal
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging); it depends on
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation
[Flow: Obtain Vheur (average of standard cells) → Obtain derated library with VBTI = Vlib = Vheur → Signoff circuit with derated library]
34ISVLSI-2014 invited talk 140710
Power vs. Area for All Designs
• (4 designs x {DC, AC} x derating methods)
[Figure legend: Proposed method; Circuit signed off using other derated libraries]
“Knee” point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: Optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time alternating between NOM and OD modes (tnom, tOD)]
[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V; different overdrive modes: 26% power range; fix fOD: still 14% power range]
36ISVLSI-2014 invited talk 140710
Also, Tunable Monitors: Less Margin
Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail
Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip
Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
37ISVLSI-2014 invited talk 140710
Also, Tunable Monitors: Less Margin
Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail
Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20-20-20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout, IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1 to 9; annotations: highest invested guardband, least invested guardband, 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
43ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner; per-layer capacitance distributions for M1 and M2 with −3σ and 3σ marks; interconnect stack with M1 and M2 plotted as M1 C vs. M2 C with the 3σ contour; the homogeneous Cw corner lies outside it, marked as pessimism]
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner; per-layer capacitance distributions for M1 and M2 with −3σ and 3σ marks; interconnect stack with M1 and M2 plotted as M1 C vs. M2 C with the 3σ contour; the homogeneous Cw corner lies outside it, marked as pessimism]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
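A short worked example of why pessimism grows with layer count. The setup is an assumption made for illustration and is not stated on the slide: a path receives an equal delay contribution $3\sigma_0$ from each of $n$ layers at that layer's own corner, and every pair of layers has the same correlation $\gamma$. The homogeneous corner adds the per-layer contributions linearly, while the statistical 3σ variation adds them in quadrature:

```latex
\Delta d_{\mathrm{homog}} = n \cdot 3\sigma_0,
\qquad
\Delta d_{\mathrm{stat}} = 3\sigma_0 \sqrt{n + n(n-1)\gamma},
\qquad
\frac{\Delta d_{\mathrm{homog}}}{\Delta d_{\mathrm{stat}}}
  = \frac{n}{\sqrt{n + n(n-1)\gamma}}
```

With full correlation ($\gamma = 1$) the ratio is 1 and the homogeneous corner is exact. With independent layers ($\gamma = 0$) the ratio is $\sqrt{n}$: about 1.41 for two layers and 3 for nine layers under these assumptions.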
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
        M1         M2         M3         M4
        ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH
M1  ΔW  1  0  0    γ  0  0    γ  0  0    0  0  0
    ΔT  0  1  0    0  γ  0    0  γ  0    0  0  0
    ΔH  0  0  1    0  0  γ    0  0  γ    0  0  0
M2  ΔW  γ  0  0    1  0  0    γ  0  0    0  0  0
    ΔT  0  γ  0    0  1  0    0  γ  0    0  0  0
    ΔH  0  0  γ    0  0  1    0  0  γ    0  0  0
M3  ΔW  γ  0  0    γ  0  0    1  0  0    0  0  0
    ΔT  0  γ  0    0  γ  0    0  1  0    0  0  0
    ΔH  0  0  γ    0  0  γ    0  0  1    0  0  0
M4  ΔW  0  0  0    0  0  0    0  0  0    1  0  0
    ΔT  0  0  0    0  0  0    0  0  0    0  1  0
    ΔH  0  0  0    0  0  0    0  0  0    0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
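The structure of this matrix can be generated from the two rules stated on the slide. The sketch below is illustrative: the function name `build_correlation` and the module labels "A" and "B" are hypothetical, chosen so that M1 to M3 share a process module and M4 sits in its own, as the matrix above implies.

```python
def build_correlation(layers, types, module_of, gamma):
    """Build the correlation matrix over (layer, variation type) sources.

    Entries are 1 on the diagonal, gamma for two sources of the same
    variation type whose layers share a process module, and 0 otherwise.
    """
    names = [(layer, vtype) for layer in layers for vtype in types]
    n = len(names)
    sigma = [[0.0] * n for _ in range(n)]
    for a, (layer_a, type_a) in enumerate(names):
        for b, (layer_b, type_b) in enumerate(names):
            if a == b:
                sigma[a][b] = 1.0
            elif type_a == type_b and module_of[layer_a] == module_of[layer_b]:
                sigma[a][b] = gamma
    return names, sigma


names, sigma = build_correlation(
    layers=["M1", "M2", "M3", "M4"],
    types=["dW", "dT", "dH"],
    module_of={"M1": "A", "M2": "A", "M3": "A", "M4": "B"},  # assumed grouping
    gamma=0.5,
)
for (layer, vtype), row in zip(names, sigma):
    print(layer, vtype, row)
```

Extending the lists to nine layers and three variation types per layer would give the 27 x 27 case mentioned in the statistical timing slides.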
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point at 60%, 0.92]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation
47ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: scatter of Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp), with α annotations]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: scatter of Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp), with α annotations]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2 plotted as M1 C vs. M2 C with the 3σ contour; non-homogeneous corner: M1 == Cw (3σ), M2 == Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σj / dj(Ytyp) x 100 vs. Δdj(Yrcw) / dj(Ytyp) x 100]
Challenge: how to avoid underestimating delay variation to preserve parametric yield?
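One way to read "scale BEOL variation with α" is sketched below. This is an assumption for illustration, not the talk's stated procedure: each BEOL parameter's deviation from its typical value is multiplied by α, so α = 1 reproduces the conventional corner and α = 0 collapses to typical. The function name `tighten_corner` and the parameter names and values are hypothetical and do not correspond to real itf fields.

```python
def tighten_corner(typical, corner, alpha):
    """Scale each parameter's deviation from typical by alpha (0 <= alpha <= 1)."""
    return {
        name: typical[name] + alpha * (corner[name] - typical[name])
        for name in typical
    }


# Hypothetical per-layer values in arbitrary units.
typical = {"M1_thickness": 100.0, "M1_width": 50.0}
cw_corner = {"M1_thickness": 130.0, "M1_width": 56.0}

print(tighten_corner(typical, cw_corner, alpha=0.5))
```

The grouping step in the signoff flow then decides which paths may be signed off at such a tightened corner without underestimating their delay variation.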
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure axis: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst, and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flow: Routed design → timing analysis at BEOL corners (Ytyp, Ycw, Yrcw) → split paths into GTBC and GCBC. GTBC: timing analysis using TBC, ECO using TBC until violation = 0. GCBC: timing analysis using CBC, ECO using CBC until violation = 0. Then done]
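The grouping step of the flow can be sketched as follows. This is a simplified illustration under stated assumptions: a single relative threshold is applied to Δdelay at the dominant corner (the experiment setup lists separate thresholds Acw and Arcw per corner), and the names `Path`, `rel_delta`, and `split_paths` plus all delay values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    d_typ: float   # path delay at the typical corner Ytyp
    d_cw: float    # path delay at the C-worst corner Ycw
    d_rcw: float   # path delay at the RC-worst corner Yrcw


def rel_delta(path):
    """Relative delta-delay at the dominant corner (the larger of C-worst, RC-worst)."""
    return max(path.d_cw - path.d_typ, path.d_rcw - path.d_typ) / path.d_typ


def split_paths(paths, threshold):
    """Paths with large delta-delay go to G_TBC; the rest stay in G_CBC."""
    g_tbc = [p.name for p in paths if rel_delta(p) > threshold]
    g_cbc = [p.name for p in paths if rel_delta(p) <= threshold]
    return g_tbc, g_cbc


paths = [
    Path("p1", d_typ=1.00, d_cw=1.02, d_rcw=1.10),  # strongly corner-sensitive
    Path("p2", d_typ=1.00, d_cw=1.01, d_rcw=1.02),  # weakly corner-sensitive
]
g_tbc, g_cbc = split_paths(paths, threshold=0.05)
print("sign off with TBC:", g_tbc)
print("sign off with CBC:", g_cbc)
```

The direction of the test follows the slide: paths whose Δdelay exceeds the threshold are the ones for which the conventional corner is pessimistic, so they are the candidates for tightened corners.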
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)
                       LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)      1.8       2.0       3.1
Gate count             232K      575K      1031K
Utilization (%)        84        79        82
Core area (mm²)        0.45      1.04      1.91
Max transition (ps)    330       330       330

Correlation factor = 0.5
           α      Acw (%)   Arcw (%)
TBC-0.5    0.5    43        73
TBC-0.6    0.6    33        50
TBC-0.7    0.7    30        34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw, and Arcw
Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: panels LEON3MP, NETCARD, SUPERBLUE12, each marking the region α > 0.5]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, NETCARD under CBC, TBC-1, TBC-2]
57ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI curves]
VBTI = Vlib ≈ Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging); it depends on
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, VLIB, and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library
Vstep: AVS voltage step
Vfinal: converged voltage
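The loop structure of this reference flow can be sketched as below. It is a toy illustration only: in the real flow each timing update is an STA run against a re-characterized library, whereas here `toy_delay` and `toy_aging` are hypothetical stand-in models, and all names and numeric values are assumptions made for the example.

```python
def reference_avs_flow(delay_at, aging_step, v_init, v_max, v_step,
                       t_final, dt, target_delay):
    """Return the converged Vdd (Vfinal), or None if Vdd would exceed v_max."""
    vdd, dvth, t = v_init, 0.0, 0.0
    while t < t_final:
        # Estimate aging accumulated during this time step at the current Vdd.
        dvth += aging_step(vdd, t, dt)
        # Update timing; AVS raises Vdd by v_step until the target is met again.
        while delay_at(vdd, dvth) > target_delay:
            vdd += v_step
            if vdd > v_max:
                return None  # caller must modify the circuit and start over
        t += dt
    return vdd


# Toy stand-in models (hypothetical, for illustration only).
def toy_delay(vdd, dvth, vth0=0.3):
    """Delay shrinks as gate overdrive (vdd - vth0 - dvth) grows."""
    return 1.0 / (vdd - vth0 - dvth)


def toy_aging(vdd, t, dt):
    """Constant threshold shift per unit time; real BTI is front-loaded."""
    return 0.005 * dt


v_final = reference_avs_flow(toy_delay, toy_aging, v_init=0.90, v_max=1.05,
                             v_step=0.01, t_final=10.0, dt=1.0, target_delay=1.7)
print("Vfinal:", v_final)
```

The cost noted on the slide shows up in the inner loop: every voltage step triggers another timing evaluation, which in practice means another STA run.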
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
supply voltage gt Vmax
Pessimistic signoff corner bull Ovestimate aging andor
underestimate circuit performance
bull Large area overhead
Good corners
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner
for AVS4 Vbti = Vfinal Do not overestimate
aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
bull Timing signoff ensure circuit meets performance target under PVT variations amp aging
bull Conventional signoff approach bull Analyze circuit timing at worst-case cornersbull Fix timing violations re-run timing analysis
bull With BTI aging and AVS what is the Vdd of the worst-cast corner for timing analysis
Vlib for circuit performance estimation
Min Vdd Max Vdd
VBTI for aging
estimation
MinVdd
Not applicable (Optimistic)
Max Vdd
Slowest circuitLess aging
Faster circuitWorst-case aging
Slowest circuit Worst-case aging
Too pessimistic
With BTI aging and AVS the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
10000 12000 14000 16000 18000 20000 2200020
22
24
26
28
30
32
44
4
888
7776
66
555
3
33
2
22
11
1
Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)
Area (μm2)
Pow
er (m
W)
AES
Optimistic about AVS
Pessimistic about AVS
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
1 2 3 4 5 6 7 8 90
2
4
6
8
10
12
08
09
1
11
12Lifetime (year)
Implementation
Life
time
(yea
r)
Vfina
l (V)
Vfinal (V)
119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )
2
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage

Statistical Timing Analysis (1)
• Delay sensitivity of path p_j to variation source z_v: Δd_jv
• Assumptions
  • Δd_jv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δd_jv using 28 runs of RC extraction and static timing analysis (STA)
Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δd_jv
$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$
Note: path delay includes gate and wire delays

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
$$[\Delta d'_{j1} \ldots \Delta d'_{j27}] = [\Delta d_{j1} \ldots \Delta d_{j27}]\,\lambda$$
Standard deviation of path delay:
$$\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$$
Note: we use the delay variation from the statistical analysis as a reference
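The two statistical timing slides can be illustrated with a small pure-Python sketch. It assumes the per-source corner delays are already available from STA; the function names and the two-source example are hypothetical and stand in for the 27-source setup.

```python
import math


def cholesky(sigma):
    """Return lower-triangular lam with lam * lam^T == sigma (sigma positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def sensitivities(d_corner, d_typ):
    """Delta d_jv = (d_j(Y_v) - d_j(Y_typ)) / 3 for each variation source v."""
    return [(d - d_typ) / 3.0 for d in d_corner]


def path_sigma(delta_d, sigma):
    """sigma_j = Euclidean norm of the row vector delta_d * lam."""
    lam = cholesky(sigma)
    n = len(delta_d)
    correlated = [sum(delta_d[r] * lam[r][c] for r in range(n)) for c in range(n)]
    return math.sqrt(sum(x * x for x in correlated))


# Hypothetical example: two variation sources with correlation 0.5.
dd = sensitivities([109.0, 112.0], 100.0)   # -> [3.0, 4.0]
print(path_sigma(dd, [[1.0, 0.5], [0.5, 1.0]]))
```

Since Σ = λλ^T, the squared result equals Δd Σ Δd^T, so with independent sources it reduces to a plain root-sum-square of the sensitivities.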

Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
Figure: energy (mJ) vs. supply voltage (V), conventional design vs. resilient design.
• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling, reduced energy (15% reduction)

Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:
$$\mathrm{Energy} = \frac{\mathrm{Power}}{\mathrm{Throughput}}$$
where throughput is estimated from the error rate ER [Kahng10], the clock period T, and the number of recovery cycles r

Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

Process-Aware Vdd Scaling (PVS)
AVS classes and approaches (compared by power):
Open-Loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
Closed-Loop AVS
  • Generic monitor, design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
Error Tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

Challenge: Variability
Figure: four panels, each comparing ideal scaling with non-ideality.
• DENSITY: transistor count [M] vs. MPU release date, 1998 to 2011. Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: volt vs. MPU release date, 1995 to 2016. Source: [CPUDB]

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU, with the minimum achievable energy marked. Series: brute-force, pure-margin, CombOpt.

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
Figure: frequency vs. voltage plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign.
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign

Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough: scalable
Global optimization flow: Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Adaptive search: Sample (SP&R), Refine power models
Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V). Series: 1st, 2nd, real. Design: AES, f = 700MHz.
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits

Classes of Closed-Loop AVS
Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
Figure: ring oscillator stages with 1-bit control pins selecting high resistance or low resistance.

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at small, medium, and large margin. Stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation.
Small/medium/large margin: safety margin = 5/10/15% of clock period

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU, with the minimum achievable energy marked. Series: brute-force, pure-margin, CombOpt.

Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow:
  Initial placement (all FFs = error-tolerant FFs)
  SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
  SkewOpt: activity-aware clock skew optimization
  If energy < min energy: save current solution

Increased Benefit of Resilience with AVS
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 17% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU, with the minimum achievable energy marked. Series: pure-margin, brute-force, CombOpt.
Technology: foundry 28nm

Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions

Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation
Figure: closed-loop AVS (circuit, on-chip aging monitor, circuit performance, voltage regulator, Vdd); circuit frequency and Vdd vs. time, without AVS and with AVS, relative to the target.

Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
Flow: Step 1: derated library characterization (VBTI, |Vt|, Vlib → derated library); Step 2: circuit implementation and signoff (→ circuit); Step 3: BTI degradation and AVS (→ Vfinal)

Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
Flow: Step 1: derated library characterization (VBTI, |Vt|, Vlib → derated library); Step 2: circuit implementation and signoff (→ circuit); Step 3: BTI degradation and AVS (→ Vfinal)
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design / lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath

Power vs Area Across Different Signoffs
"Knee" point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
Figure: Vdd vs. time, with NBTI and PBTI.
VBTI = Vlib ≈ Vfinal

Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
Figure annotation: 10mV

Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation
Flow: Obtain Vheur (average of standard cells) → Obtain derated library with VBTI = Vlib = Vheur → Sign off circuit with derated library
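The first two steps of this flow amount to a simple average followed by reusing that value for both characterization voltages. A minimal sketch, assuming per-cell Vfinal estimates are already available; the cell names and voltages are hypothetical, not from the talk:

```python
def flat_heuristic_voltage(vfinal_by_cell):
    """Average the end-of-life Vfinal over cell types to get a single flat Vheur."""
    if not vfinal_by_cell:
        raise ValueError("need at least one cell type")
    return sum(vfinal_by_cell.values()) / len(vfinal_by_cell)


def derating_corner(vfinal_by_cell):
    """Return characterization voltages with VBTI = Vlib = Vheur."""
    v_heur = flat_heuristic_voltage(vfinal_by_cell)
    return {"VBTI": v_heur, "Vlib": v_heur}


# Hypothetical per-cell Vfinal values in volts.
cells = {"INV": 0.96, "NAND2": 0.97, "NOR2": 0.98}
print(derating_corner(cells))
```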

Power vs Area for All Designs
• (4 designs) x (DC, AC) x (derating methods)
Figure legend: proposed method; circuits signed off using other derated libraries.
"Knee" point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead

Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
• Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
Figure: Vdd vs. time, alternating between nominal (NOM) and overdrive (OD) modes with durations tnom and tOD.
Figure: power of circuits w/ different overdrive modes (fnom = 800MHz, Vnom = 0.8V). Different overdrive modes: 26% power range. Fix fOD: still 14% power range.

Also: Tunable Monitors, Less Margin
Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Aggressive config
• Vmin_est < Vmin_chip: some chips will fail
Benefits of tunability
• Compensate for difference between model and silicon
• Recover margin when variation is reduced due to improved process

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
Thank You
Backup

Power Penalty to Fix EM with AVS
Figure: core power (mW) and PG power (mW) for implementations 1 to 9, from highest invested guardband to least invested guardband; 14% power penalty.
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
Figure: example worst-case capacitance corner. Capacitance distributions of layer M1 and layer M2 (C, -3σ, 3σ) and of the interconnect stack with M1 and M2 (M1 C, M2 C); the homogeneous Cw corner lies beyond the 3σ point of the stack, and the gap is pessimism.
When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with the number of layers

Correlation Matrix
• Let Σ be the correlation matrix for variation sources
Σ =
         M1          M2          M3          M4
         ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1 ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
   ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
   ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
M2 ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
   ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
   ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
M3 ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
   ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
   ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
M4 ΔW    0  0  0     0  0  0     0  0  0     1  0  0
   ΔT    0  0  0     0  0  0     0  0  0     0  1  0
   ΔH    0  0  0     0  0  0     0  0  0     0  0  1
Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
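The matrix above follows a simple rule that can be generated programmatically. The sketch below is illustrative only: the function name, the ASCII source labels, and the module names "lower" and "upper" are assumptions; the grouping of M1 to M3 into one process module and M4 into another is inferred from the matrix entries.

```python
def build_correlation(layers, var_types, module_of, gamma):
    """Build the correlation matrix over (layer, variation type) sources.

    Entries: 1 on the diagonal, gamma for the same variation type on different
    layers of the same process module, 0 otherwise.
    """
    sources = [(layer, vt) for layer in layers for vt in var_types]
    sigma = []
    for layer_a, type_a in sources:
        row = []
        for layer_b, type_b in sources:
            if (layer_a, type_a) == (layer_b, type_b):
                row.append(1.0)
            elif type_a == type_b and module_of[layer_a] == module_of[layer_b]:
                row.append(gamma)
            else:
                row.append(0.0)
        sigma.append(row)
    return sources, sigma


# Assumed module assignment: M1 to M3 correlate with each other, M4 does not.
modules = {"M1": "lower", "M2": "lower", "M3": "lower", "M4": "upper"}
sources, sigma = build_correlation(
    ["M1", "M2", "M3", "M4"], ["dW", "dT", "dH"], modules, 0.5
)
for (layer, vt), row in zip(sources, sigma):
    print(layer, vt, row)
```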

Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
Figure: cumulative probability vs. max wirelength ratio across all layers (%); the curve reaches 0.92 at 60%.
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation

Delay Variation
Figure: α of critical paths plotted against Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp) and against Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp).
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

Non-Homogeneous Corner
• Each layer can have different skewed variations
Figure: interconnect stack with M1 and M2 (M1 C, M2 C, 3σ); non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp.
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
Figure: 3σj / d(Ytyp) x 100 vs. Δdj(Yrcw) / dj(Ytyp) x 100.
Challenge: how to avoid underestimating delay variation, to preserve parametric yield?

Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
Figure: wirelength ratio (%).

Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
Flow: Routed design → Timing analysis at BEOL corners (Ytyp, Ycw, Yrcw) → split paths into GTBC and GCBC → GTBC: timing analysis using TBC, ECO using TBC until violation = 0; GCBC: timing analysis using CBC, ECO using CBC until violation = 0 → done
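The path-grouping step of this flow can be sketched as a simple partition. This is a hedged illustration: it assumes Δdelay is expressed relative to the typical-corner delay and taken at the dominant corner (the larger of the C-worst and RC-worst values); the function name, path names, and threshold are hypothetical.

```python
def split_paths(rel_delta, threshold):
    """Partition critical paths into G_tbc and G_cbc.

    rel_delta maps a path name to (delta at C-worst, delta at RC-worst), each
    relative to the typical-corner delay. A path goes to G_tbc when its larger
    (dominant-corner) delta exceeds the threshold; otherwise it stays in G_cbc.
    """
    g_tbc, g_cbc = [], []
    for path, (d_cw, d_rcw) in sorted(rel_delta.items()):
        if max(d_cw, d_rcw) > threshold:
            g_tbc.append(path)
        else:
            g_cbc.append(path)
    return g_tbc, g_cbc


# Hypothetical relative delay deltas and threshold.
paths = {"p1": (0.02, 0.09), "p2": (0.01, 0.02), "p3": (0.08, 0.03)}
print(split_paths(paths, 0.05))
```

Paths left in G_cbc are the ones with small Δdelay, which the talk identifies as having large α and therefore not safe to sign off at tightened corners.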

Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity):
                      LEON3MP  NETCARD  SUPERBLUE12
Clock period (ns)     1.8      2.0      3.1
Gate count            232K     575K     1031K
Utilization (%)       84       79       82
Core area (mm2)       0.45     1.04     1.91
Max transition (ps)   330      330      330
Tightened corners (correlation factor = 0.5):
          α     Acw ()  Arcw ()
TBC-0.5   0.5   43      73
TBC-0.6   0.6   33      50
TBC-0.7   0.7   30      34
Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw, and Arcw
Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5

Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays: the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

Scaling Factor Results
Figure: α for LEON3MP, NETCARD, and SUPERBLUE12, with the α > 0.5 region marked in each panel.
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2.

Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Circuit  Frequency (GHz)
c5315    1.38
c7552    1.25
AES      0.89
MPEG2    1.05
Supply voltages
Vmax         1.05V
Vinit        0.90V
Vheur1 (DC)  0.97V
Vheur1 (AC)  0.95V
Vheur2 (DC)  0.95V
Vheur2 (AC)  0.93V

A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing, Vdd, and library
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs
Vstep: AVS voltage step; Vfinal: converged voltage
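The loop structure of this reference flow can be sketched as follows. Everything specific here is an assumption made for illustration: the callable `delay_at` stands in for aging estimation plus STA with a re-characterized library, the AVS rule (raise Vdd by Vstep until the timing target is met) is a simplification, and the toy delay model and its numbers are not from the talk.

```python
def reference_signoff(v_init, v_max, v_step, t_final, dt, delay_at, target_delay):
    """Step through the lifetime, raising Vdd whenever the aged circuit misses timing.

    Returns the converged Vfinal, or None if the maximum allowed voltage is
    exceeded (the caller would then modify the circuit and start over).
    """
    vdd, t, stress = v_init, 0.0, []
    while t < t_final:
        stress.append((dt, vdd))          # stress history: (duration, voltage)
        t += dt
        while delay_at(vdd, stress) > target_delay:
            vdd += v_step                 # one AVS voltage step
            if vdd > v_max:
                return None
    return vdd


def toy_delay(vdd, stress):
    """Hypothetical model: delay grows with accumulated voltage stress, shrinks with Vdd."""
    aging = 0.01 * sum(duration * v for duration, v in stress)
    return (1.0 + aging) / vdd


print(reference_signoff(0.90, 1.05, 0.01, 10.0, 1.0, toy_delay, 1.15))
```

Each iteration of the outer loop corresponds to one timing re-analysis, which is why the slide notes that the flow is slow in practice.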

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03
Case      1      2      3     4       5   6       7
Vlib (V)  Vinit  Vinit  Vmax  Vinit   NA  Vheur1  Vheur2
VBTI (V)  Vinit  Vmax   Vmax  Vfinal  NA  Vheur1  Vheur2

"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
Figure: loop among circuit, Vfinal, Vlib and VBTI, and derated libraries.

Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
Figure: |Vgs| vs. time over ON/OFF phases [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS'14].

Observation 1
• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
Figure [Chan11]: Vdd approaching Vfinal over lifetime; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years).

Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Good corners
Setups:
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03

Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?
Candidate corners (Vlib for circuit performance estimation, VBTI for aging estimation):
  VBTI = Min Vdd, Vlib = Min Vdd: slowest circuit, less aging
  VBTI = Min Vdd, Vlib = Max Vdd: not applicable (optimistic)
  VBTI = Max Vdd, Vlib = Min Vdd: slowest circuit, worst-case aging (too pessimistic)
  VBTI = Max Vdd, Vlib = Max Vdd: faster circuit, worst-case aging
With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
10000 12000 14000 16000 18000 20000 2200020
22
24
26
28
30
32
44
4
888
7776
66
555
3
33
2
22
11
1
Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)
Area (μm2)
Pow
er (m
W)
AES
Optimistic about AVS
Pessimistic about AVS
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
1 2 3 4 5 6 7 8 90
2
4
6
8
10
12
08
09
1
11
12Lifetime (year)
Implementation
Life
time
(yea
r)
Vfina
l (V)
Vfinal (V)
119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )
2
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation
[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ
Standard deviation of path delay
σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05
Note we use the delay variation from the statistical analysis as a reference
74ISVLSI-2014 invited talk 140710
Resilient Designs
bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)
bull Trade off design robustness vs design quality Eg enable margin reduction
bull Improve performance (ie timing speculation)
084 088 092 096 10030
34
38
42
46
50
54
58
62conventional design
reilient Design
Supply voltage (V)
En
erg
y (
mJ
)
Conventional design Worst-case signoff No Vdd downscaling
Resilient design Typical-case signoff Vdd downscaling reduced energy
15 reduction
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
bull Given RTL design throughput requirement and error-tolerant registers
bull Objective implement design to minimize energy bull Estimation of design energy
119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Question Which endpoint to be optimized
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
Open-Loop AVS
Closed-Loop AVSP
ow
er
Freq amp Vdd LUT
Post-silicon characterization
AVS Pre-characterize LUT [Martin02]
Process-aware AVSPost-silicon characterization [Tschanz03]
Generic monitor
Design dependent replica
In-situmonitor
Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]
In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]
Error Detection System
Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]
Error Tolerance
AVS
approachesAVS classes
77
78ISVLSI-2014 invited talk 140710
Challenge Variability
1998 2000 2003 2006 2008 20111
10
MPU Release Date
Tran
sisto
r Cou
nt [M
]
Source [CPUDB]
DENSITY
IdealNon-ideality
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0
100
200
300
400
500
600
700
0
02
04
06
08
1
12Dynamic Power (W)
POWER
Source [JeongK08]
IdealNon-ideality
2006 2008 2010 2012 2014 20161000
10000
100000
Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling
DRIVE CURRENT
Ideal
Source [ITRS]
Non-ideality
1995 2000 2005 2011 20160
05
1
15
2
25
3
MPU Release Date
Volt
SUPPLY VOLTAGE
Source [CPUDB]
Ideal
Non-ideality
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within design cone → reduce overdesign
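The slack-based reading of the design cone can be illustrated with a small sketch. It classifies a second mode by the timing slack that a circuit signed off at mode A would show there. The function name, the tolerance parameter, and the use of a single scalar slack per mode are assumptions made for this example, not details from the talk.

```python
def classify_mode(slack_ns, tol_ns=0.0):
    """Classify mode B from the slack of a circuit signed off at mode A.

    slack_ns: worst timing slack (ns) of the mode-A circuit evaluated at mode B
              (hypothetical input; in practice it would come from STA).
    tol_ns:   slack magnitude treated as "no dominance" (assumed knob).
    """
    if slack_ns < -tol_ns:
        return "failed design"         # B lies outside A's design cone
    if slack_ns > tol_ns:
        return "overdesign"            # A dominates B
    return "equivalent dominance"      # B lies within A's design cone


# Example with made-up slack values for two candidate modes.
for name, slack in [("B", 0.12), ("C", -0.05)]:
    print(name, classify_mode(slack, tol_ns=0.01))
```

Under this reading, a search for signoff modes would keep only mode pairs that classify as equivalent dominance in both directions.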
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough → scalable
Global optimization flow (adaptive search):
  1. Sample (SP&R)
  2. Construct power models
  3. Estimate optimal signoff modes
  4. Sample (SP&R)
  5. Refine power models
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); curves "1st", "2nd", "real"; design: AES, f = 700 MHz]
• Ovals indicate sample points
• "1st", "2nd": power from power models at first, second iteration
• "real": power from real implemented circuits
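The sample-and-refine loop can be sketched as follows. The talk does not give the form of its power models, so this example substitutes successive parabolic interpolation over signoff voltage; `sample_power` is a hypothetical stand-in for one SP&R run followed by power analysis, and all parameter names are assumptions.

```python
def parabola_coeffs(p0, p1, p2):
    """Return (a, b) of the parabola a*x^2 + b*x + c through three points."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    denom = (x0 - x1) * (x0 - x2) * (x1 - x2)
    a = (x2 * (y1 - y0) + x1 * (y0 - y2) + x0 * (y2 - y1)) / denom
    b = (x2**2 * (y0 - y1) + x1**2 * (y2 - y0) + x0**2 * (y1 - y2)) / denom
    return a, b


def adaptive_search(sample_power, initial_voltages, iterations=2, eps=1e-6):
    """Sample, build a model, estimate the optimum, resample, refine.

    initial_voltages must hold at least three distinct values.
    Returns the (voltage, power) sample with the lowest power.
    """
    samples = [(v, sample_power(v)) for v in initial_voltages]  # first runs
    for _ in range(iterations):
        # Model: parabola through the three lowest-power samples so far.
        best3 = sorted(samples, key=lambda s: s[1])[:3]
        a, b = parabola_coeffs(*best3)
        if a <= 0:
            break                      # model has no interior minimum
        v_est = -b / (2 * a)           # estimated optimal signoff voltage
        if any(abs(v_est - v) < eps for v, _ in samples):
            break                      # estimate already sampled
        samples.append((v_est, sample_power(v_est)))  # one more run
    return min(samples, key=lambda s: s[1])


# Toy power curve (mW) standing in for real implementations.
toy_power = lambda v: 15.0 + 40.0 * (v - 1.05) ** 2
print(adaptive_search(toy_power, [0.9, 1.0, 1.2]))
```

The point of the sketch is the run count: three initial samples plus one refinement sample per iteration, rather than one implementation per candidate mode.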
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
Closed-loop AVS classes:
• Design-dependent replica
• In-situ monitor
• Generic monitor
  • Does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low
[Figure: RO stages with 1-bit control pins selecting high or low series resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO at large, medium, and small margin; panels MUL and EXU; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow diagram blocks:
  Initial placement (all FFs = error-tolerant FFs)
  Energy < min energy? → save current solution
  SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
  SkewOpt: activity-aware clock skew optimization]
25ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Tolerance of IC Variability
• Margining of IC Variability
• Conclusions
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
  • Bias temperature instability (BTI) aging → higher |ΔVth| → lower fmax
  • AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS with circuit, on-chip aging monitor (circuit performance), and voltage regulator (Vdd); plots of circuit frequency and Vdd vs. time, without AVS and with AVS, against the target]
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider BTI + AVS interaction
• Aging and Vfinal are unknowns before circuit implementation
[Figure: loop of Steps 1 to 3 linking the derated library (VBTI, Vlib, |Vt|), circuit implementation and signoff (circuit), and BTI degradation and AVS (Vfinal)]
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Figure: Step 1: Vlib, VBTI → derated library (|Vt|); Step 2: circuit implementation and signoff → circuit; Step 3: BTI degradation and AVS → Vfinal]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs. Area Across Different Signoffs
“Knee” point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI]
VBTI = Vlib ≈ Vfinal
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10 mV]
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation
Flow:
  1. Obtain Vheur (average of standard cells)
  2. Obtain derated library with VBTI = Vlib = Vheur
  3. Sign off circuit with derated library
34ISVLSI-2014 invited talk 140710
Power vs. Area for All Designs
• (4 designs × {DC, AC} × derating methods)
[Figure: power vs. area; legend: proposed method, circuits signed off using other derated libraries]
“Knee” point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time, alternating nominal (NOM, tnom) and overdrive (OD, tOD) modes]
[Figure: power of circuits w/ different overdrive modes; fnom = 800 MHz, Vnom = 0.8 V; different overdrive modes: 26% power range; fixed fOD: still 14% power range]
37ISVLSI-2014 invited talk 140710
Also: Tunable Monitors, Less Margin
Aggressive config: Vmin_est < Vmin_chip → some chips will fail
Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13 mV margin
Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20-20-20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore's Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) vs. implementation (1 to 9); annotated with highest invested guardband, least invested guardband, and 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner; per-layer C distributions (−3σ to 3σ) for layers M1 and M2; interconnect stack with M1 and M2 (M1 C, M2 C, 3σ); homogeneous Cw corner; pessimism]
When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with the number of layers
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
           M1          M2          M3          M4
           ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
  M1 ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
     ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
     ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
  M2 ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
     ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
     ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
  M3 ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
     ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
     ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
  M4 ΔW    0  0  0     0  0  0     0  0  0     1  0  0
     ΔT    0  0  0     0  0  0     0  0  0     0  1  0
     ΔH    0  0  0     0  0  0     0  0  0     0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
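The structure of Σ can be generated from two rules: sources of the same variation type in the same process module correlate with factor γ, and everything else off the diagonal is zero. The sketch below does this for the four-layer example; the module labels "A" and "B" and the ASCII names dW, dT, dH are hypothetical, chosen only so that M1 to M3 share a module and M4 stands alone as in the matrix above.

```python
def build_correlation_matrix(layers, var_types, module_of, gamma):
    """Build the correlation matrix for (layer, variation type) sources.

    module_of maps each layer to a process-module label (assumed grouping).
    """
    names = [(layer, vt) for layer in layers for vt in var_types]
    n = len(names)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (li, ti) in enumerate(names):
        for j, (lj, tj) in enumerate(names):
            if i == j:
                sigma[i][j] = 1.0
            elif ti == tj and module_of[li] == module_of[lj]:
                sigma[i][j] = gamma   # same type, same process module
    return names, sigma


layers = ["M1", "M2", "M3", "M4"]
var_types = ["dW", "dT", "dH"]                       # width, thickness, height
module_of = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}
names, sigma = build_correlation_matrix(layers, var_types, module_of, 0.5)
for row in sigma:
    print(" ".join(f"{x:3.1f}" for x in row))
```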
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); 0.92 marked at 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: two scatter plots of α
  vs. Δdelay at C-worst: [d(Ycw) − d(Ytyp)] / d(Ytyp)
  vs. Δdelay at RC-worst: [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0 → a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2 (M1 C, M2 C, 3σ); non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: scatter of 3σj / d(Ytyp) × 100 vs. Δdj(Yrcw) / dj(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield?
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst, and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flow diagram:
  Routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC
  GTBC: timing analysis using TBC → violation = 0? → otherwise ECO using TBC
  GCBC: timing analysis using CBC → violation = 0? → otherwise ECO using CBC
  → done]
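The grouping step of the flow can be sketched as below. The talk only says that a path joins Gtbc when its Δdelay exceeds a threshold; testing each corner against its own threshold and combining the two with "or" is an assumption of this example, as are the function name, the tuple layout, and the 5% thresholds used in the demo.

```python
def partition_paths(paths, a_cw_pct, a_rcw_pct):
    """Split critical paths into (G_tbc, G_cbc).

    paths: iterable of (name, d_typ, d_cw, d_rcw) path delays at the
           typical, C-worst, and RC-worst corners (same time unit).
    a_cw_pct, a_rcw_pct: thresholds on normalized delta-delay, in percent
           (hypothetical values; real ones would come from calibration).
    """
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        delta_cw = 100.0 * (d_cw - d_typ) / d_typ
        delta_rcw = 100.0 * (d_rcw - d_typ) / d_typ
        # Assumed rule: a large delta-delay at either corner qualifies
        # the path for signoff at tightened BEOL corners.
        if delta_cw > a_cw_pct or delta_rcw > a_rcw_pct:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


demo = [("p1", 1.0, 1.10, 1.02), ("p2", 1.0, 1.01, 1.02), ("p3", 2.0, 2.02, 2.30)]
print(partition_paths(demo, a_cw_pct=5.0, a_rcw_pct=5.0))
```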
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity):
                        LEON3MP   NETCARD   SUPERBLUE12
  Clock period (ns)     1.8       2.0       3.1
  Gate count            232K      575K      1031K
  Utilization (%)       84        79        82
  Core area (mm²)       0.45      1.04      1.91
  Max transition (ps)   330       330       330

Tightened corners (correlation factor = 0.5):
             α     Acw (%)   Arcw (%)
  TBC-0.5    0.5   43        73
  TBC-0.6    0.6   33        50
  TBC-0.7    0.7   30        34

• Implement another NETCARD (clock period = 2.3 ns) to obtain α, Acw and Arcw
• Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays → the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large → delay variations of ΔdR-M1 and ΔdC-M1 cancel out → Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: α scatter plots for LEON3MP, NETCARD, and SUPERBLUE12, each with the α > 0.5 region marked]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120 ps and 61 ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

  Circuit   Frequency (GHz)
  C5315     1.38
  c7552     1.25
  AES       0.89
  MPEG2     1.05

Supply voltages:
  Vmax          1.05 V
  Vinit         0.90 V
  Vheur1 (DC)   0.97 V
  Vheur1 (AC)   0.95 V
  Vheur2 (DC)   0.95 V
  Vheur2 (AC)   0.93 V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, Vlib and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs
[Figure: flow diagram; labels include Vstep (AVS voltage step), Vfinal (converged voltage), and the library]
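The time-stepped loop of the reference flow can be sketched as follows. Voltages are kept as integer millivolts to avoid floating-point drift; `meets_timing` is a hypothetical callback standing in for aging estimation plus an STA run at the given Vdd and time, and returning `None` stands in for "modify circuit and start over". None of these names come from the talk.

```python
def reference_signoff(v_init_mv, v_max_mv, v_step_mv, t_final_years,
                      n_steps, meets_timing):
    """Step through the lifetime, raising Vdd whenever timing fails.

    meets_timing(vdd_mv, t_years) -> bool is an assumed stand-in for
    "estimate aging at time t, then run STA at vdd".
    Returns Vfinal in mV, or None if Vmax would be exceeded.
    """
    vdd = v_init_mv
    for k in range(1, n_steps + 1):
        t = t_final_years * k / n_steps      # current time step
        while not meets_timing(vdd, t):      # aged circuit too slow
            vdd += v_step_mv                 # AVS raises Vdd by one step
            if vdd > v_max_mv:
                return None                  # circuit must be modified
    return vdd                               # converged Vfinal


# Toy aging model: the Vdd needed to meet timing grows 5 mV per year.
toy = lambda vdd_mv, t: vdd_mv >= 900 + 5 * t
print(reference_signoff(900, 1050, 10, 10, 10, toy))
```

Each call to `meets_timing` corresponds to one timing analysis, which is why the flow is slow when the number of time steps is large.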
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
  • Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

  Case       1       2       3      4        5    6        7
  Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
  VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2
63ISVLSI-2014 invited talk 140710
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop between circuit (Vfinal) and derated libraries (Vlib, VBTI)]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time over ON/OFF phases [VattikondaWC06]]
Device aging (|ΔVth|) accumulates over time [TCAS'14]
65ISVLSI-2014 invited talk 140710
Observation 1
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd vs. time approaching Vfinal; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years) [Chan11]]
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
[Figure: signoff setups 1 to 7 with "good corners" marked]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Setups:
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                             Vlib for circuit performance estimation
  VBTI for aging estimation  Min Vdd                        Max Vdd
  Min Vdd                    Slowest circuit, less aging    Not applicable (optimistic)
  Max Vdd                    Slowest circuit, worst-case    Faster circuit, worst-case aging
                             aging: too pessimistic

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm²) for AES; points labeled by signoff case 1 to 8; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); regions marked "optimistic about AVS" and "pessimistic about AVS"]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) vs. implementation (1 to 9); 30% MTTF penalty; 200 mV voltage compensation]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
  $\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
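The per-step MTTF update above is a simple recurrence, and a short sketch makes the compounding visible. The function names and the example voltage schedule are made up for illustration; only the update rule itself comes from the slide.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_now):
    """One step of MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))^2."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2


def mttf_over_schedule(mttf_initial, vdd_schedule):
    """Apply the update along a list of Vdd values, one per AVS step."""
    mttf = mttf_initial
    history = [mttf]
    for v_prev, v_now in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, v_prev, v_now)
        history.append(mttf)
    return history


# Hypothetical schedule: Vdd raised from 0.90 V to 1.00 V in 50 mV steps.
for mttf in mttf_over_schedule(10.0, [0.90, 0.95, 1.00]):
    print(round(mttf, 3))
```

Because the ratios telescope, the final MTTF depends only on the first and last Vdd in the schedule: MTTF(final) = MTTF(0) × (VDD(0) / VDD(final))².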
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year for AVS schedules S1 to S5 (DMA), and MTTF (year) for S1 to S5; annotation: 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is “Signoff”?
• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff on a voltage axis, from nominal Vdd / operating voltage down to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI]
Problem: margins = pessimism → overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Δdjv: delay sensitivity of path pj to variation source zv
  $\Delta d_{jv} = \left[ d_j(Y_v) - d_j(Y_{typ}) \right] / 3$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: routed netlist + 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• $\Sigma = \lambda \lambda^{T}$ (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
  $[\Delta d'_{j1} \; \ldots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \ldots \; \Delta d_{j27}] \, \lambda$
Standard deviation of path delay:
  $\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
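The two steps above (Cholesky factor, then root-sum-square of the correlated sensitivities) can be checked numerically with a small sketch. It uses a hand-written Cholesky routine so that it has no dependencies; the function names and the two-source example are assumptions for illustration, not the 27-source setup of the talk.

```python
import math


def cholesky(sigma):
    """Lower-triangular lam with sigma = lam * lam^T (sigma must be
    symmetric positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][j] = (sigma[i][j] - s) / lam[j][j]
    return lam


def path_sigma(delta_d, sigma):
    """Standard deviation of path delay from per-source sensitivities.

    delta_d: per-source delay sensitivities of one path (row vector).
    Computes delta_d' = delta_d * lam, then the root-sum-square of delta_d'.
    """
    lam = cholesky(sigma)
    n = len(delta_d)
    correlated = [sum(delta_d[i] * lam[i][k] for i in range(n))
                  for k in range(n)]
    return math.sqrt(sum(x * x for x in correlated))


# Two sources with correlation 0.5 and unit sensitivities.
print(path_sigma([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]]))
```

Since Δd′ Δd′ᵀ = Δd Σ Δdᵀ, the result equals the square root of the quadratic form in Σ; with independent sources (Σ = identity) it reduces to a plain root-sum-square.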
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors → ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; 15% reduction]
Conventional design: worst-case signoff → no Vdd downscaling
Resilient design: typical-case signoff → Vdd downscaling → reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:
  $\mathrm{Energy} = \mathrm{Power} / \mathrm{Throughput}$
  $\mathrm{Throughput} = \frac{1 - ER}{T} + \frac{1 - ER}{r \times T}$
  where $T$ = clock period, $r$ = recovery cycles, $ER$ = error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Question Which endpoint to be optimized
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
Open-Loop AVS
Closed-Loop AVSP
ow
er
Freq amp Vdd LUT
Post-silicon characterization
AVS Pre-characterize LUT [Martin02]
Process-aware AVSPost-silicon characterization [Tschanz03]
Generic monitor
Design dependent replica
In-situmonitor
Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]
In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]
Error Detection System
Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]
Error Tolerance
AVS
approachesAVS classes
77
78ISVLSI-2014 invited talk 140710
Challenge Variability
1998 2000 2003 2006 2008 20111
10
MPU Release Date
Tran
sisto
r Cou
nt [M
]
Source [CPUDB]
DENSITY
IdealNon-ideality
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0
100
200
300
400
500
600
700
0
02
04
06
08
1
12Dynamic Power (W)
POWER
Source [JeongK08]
IdealNon-ideality
2006 2008 2010 2012 2014 20161000
10000
100000
Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling
DRIVE CURRENT
Ideal
Source [ITRS]
Non-ideality
1995 2000 2005 2011 20160
05
1
15
2
25
3
MPU Release Date
Volt
SUPPLY VOLTAGE
Source [CPUDB]
Ideal
Non-ideality
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
80ISVLSI-2014 invited talk 140710
Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for
circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly
threshold voltages)bull One mode is outside of the design cone of the other
failed design overdesignbull Mode A has positive timing slacks with respect to mode B
mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other
bull Modes are in each othersrsquo design cone
Voltage
Frequency
A
Negative Slacks = failed design
Positive Slacks = overdesign
B
C
Design Cone of mode A
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline search for signoff modes within design cone reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method Global Optimizationbull Iteratively sample and refine power models
bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable
Sample (SPampR)
Construct power models
Estimate optimal signoff modes
Sample (SPampR)
Refine power models
Adaptive search
Global optimization flow
09 10 11 1214
15
16
17
18
19
201st 2nd real
Signoff Voltage (v)
Po
wer
(m
W)
Power estimation of adaptive search
bull Ovals indicate sample pointsbull 1st 2nd power from power models at first
second iterationbull real power from real implemented circuits
Design AESf 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
bull Critical path may be difficult to identify (IP from 3rd party)
26ISVLSI-2014 invited talk 140710
Breaking Chicken-Egg Loops: Less Margin
• Example: interaction between reliability margin and AVS designs
• Bias temperature instability (BTI) aging: higher |ΔVth|, lower fmax
• AVS can be used to compensate for performance degradation
[Figure: closed-loop AVS. An on-chip aging monitor senses circuit performance and drives the voltage regulator that supplies the circuit. Plots of circuit frequency and Vdd versus time, without AVS and with AVS, relative to the target.]
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib selection should consider the BTI + AVS interaction
• Aging and Vfinal are unknown before circuit implementation
[Figure: three-step loop. Step 1: VBTI sets |Vt| and, together with Vlib, the derated library. Step 2: circuit implementation and signoff produce the circuit. Step 3: BTI degradation and AVS determine Vfinal.]
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknown before circuit implementation
[Figure: three-step loop. Step 1: VBTI sets |Vt| and, together with Vlib, the derated library. Step 2: circuit implementation and signoff produce the circuit. Step 3: BTI degradation and AVS determine Vfinal.]
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal, Vlib, VBTI
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design/lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
• "Knee" point for balanced area and power tradeoff
• Optimistic signoff corner
  • AVS increases supply voltage aggressively to compensate for aging
  • Large lifetime energy overhead
  • May fail to meet timing if desired supply voltage > Vmax
• Pessimistic signoff corner
  • Overestimates aging and/or underestimates circuit performance
  • Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging under an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd versus time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10 mV]
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation
[Flowchart: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → sign off circuit with derated library]
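The characterization flow above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' tooling: the function names, the cell names and the per-cell Vfinal values are hypothetical placeholders. Only the averaging step and the assignment VBTI = Vlib = Vheur come from the slide.

```python
from statistics import mean


def heuristic_voltage(vfinal_by_cell):
    """Average the end-of-life voltage (Vfinal) over standard cells to get Vheur."""
    return mean(vfinal_by_cell.values())


def derated_library_corner(vfinal_by_cell):
    """Return the voltages used to characterize the derated library.

    Per the slide, both the aging voltage (VBTI) and the performance
    voltage (Vlib) are set to the same flat value Vheur.
    """
    vheur = heuristic_voltage(vfinal_by_cell)
    return {"VBTI": vheur, "Vlib": vheur}


# Hypothetical Vfinal estimates in volts (illustrative only, not data from the talk).
example_vfinal = {"INV": 0.96, "NAND2": 0.97, "NOR2": 0.98}
corner = derated_library_corner(example_vfinal)
```

Because Vfinal is reported to be insensitive to gate type, a plain average is a reasonable single corner; a weighted average would be an alternative if some cells dominate the critical paths.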
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
• 4 designs × {DC, AC} × derating methods
[Figure: power vs. area. Legend: proposed method; circuits signed off using other derated libraries. The "knee" point marks a balanced area and power tradeoff.]
• Optimistic signoff corner
  • AVS increases supply voltage aggressively to compensate for aging
  • Consumes more power
  • May fail to meet timing if desired supply voltage > Vmax
• Pessimistic signoff corner
  • Overestimates aging and/or underestimates circuit performance
  • Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area and power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd versus time, alternating between nominal (NOM) and overdrive (OD) modes with durations tnom and tOD]
[Figure: power of circuits with different overdrive modes; fnom = 800 MHz, Vnom = 0.8 V. Different overdrive modes: 26% power range. With fOD fixed: still 14% power range.]
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
• Aggressive config: Vmin_est < Vmin_chip
  • Some chips will fail
• Optimized config
  • Increase high-resistance passgates
  • Vmin_est ≈ Vmin_chip
• Default config
  • Low-resistance passgates
  • Guardband for worst case
  • Vmin_est > Vmin_chip
  • 13 mV margin
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
• Aggressive config: Vmin_est < Vmin_chip
  • Some chips will fail
• Default config
  • Low-resistance passgates
  • Guardband for worst case
  • Vmin_est > Vmin_chip
  • 13 mV margin
• Optimized config
  • Increase high-resistance passgates
  • Vmin_est ≈ Vmin_chip
• Benefits of tunability
  • Compensate for difference between model and silicon
  • Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) across implementations 1 to 9. Annotations: highest invested guardband, least invested guardband, 14% power penalty.]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
43ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Capacitance distributions of layers M1 and M2, each from -3σ to 3σ. For an interconnect stack with M1 and M2, the homogeneous Cw corner sets both M1 C and M2 C to 3σ; the gap to the 3σ point of the stack is pessimism.]
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Capacitance distributions of layers M1 and M2, each from -3σ to 3σ. For an interconnect stack with M1 and M2, the homogeneous Cw corner sets both M1 C and M2 C to 3σ; the gap to the 3σ point of the stack is pessimism.]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
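A small numeric sketch of why the pessimism grows with the number of layers. It rests on assumptions that are not stated on the slide: per-layer contributions add linearly, and layer variations are independent Gaussians (the fully uncorrelated extreme). Function names are hypothetical.

```python
import math


def homogeneous_corner(sigmas):
    """Shift when every layer is placed at its own 3-sigma corner simultaneously."""
    return sum(3.0 * s for s in sigmas)


def statistical_corner(sigmas):
    """3-sigma shift of the sum, assuming independent layer variations."""
    return 3.0 * math.sqrt(sum(s * s for s in sigmas))


def pessimism_ratio(sigmas):
    """Ratio of homogeneous-corner shift to statistical 3-sigma shift."""
    return homogeneous_corner(sigmas) / statistical_corner(sigmas)


# With n equally weighted, independent layers the ratio is sqrt(n).
ratio_2_layers = pessimism_ratio([1.0, 1.0])
ratio_4_layers = pessimism_ratio([1.0] * 4)
ratio_9_layers = pessimism_ratio([1.0] * 9)
```

Under these assumptions the homogeneous corner overstates the 3σ shift by a factor of √n for n equal layers; with partial correlation the ratio falls between 1 and √n.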
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

             M1         M2         M3         M4
          ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH   ΔW ΔT ΔH
  M1 ΔW    1  0  0    γ  0  0    γ  0  0    0  0  0
     ΔT    0  1  0    0  γ  0    0  γ  0    0  0  0
     ΔH    0  0  1    0  0  γ    0  0  γ    0  0  0
  M2 ΔW    γ  0  0    1  0  0    γ  0  0    0  0  0
     ΔT    0  γ  0    0  1  0    0  γ  0    0  0  0
     ΔH    0  0  γ    0  0  1    0  0  γ    0  0  0
  M3 ΔW    γ  0  0    γ  0  0    1  0  0    0  0  0
     ΔT    0  γ  0    0  γ  0    0  1  0    0  0  0
     ΔH    0  0  γ    0  0  γ    0  0  1    0  0  0
  M4 ΔW    0  0  0    0  0  0    0  0  0    1  0  0
     ΔT    0  0  0    0  0  0    0  0  0    0  1  0
     ΔH    0  0  0    0  0  0    0  0  0    0  0  1

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
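The structure of Σ can be generated programmatically, which is handy when the layer count changes. The sketch below is an illustration under assumptions: the function name and the module labels "A" and "B" are hypothetical, and the grouping (M1 to M3 in one process module, M4 in another) is read off the matrix shown on the slide.

```python
def build_correlation_matrix(layer_to_module, types=("W", "T", "H"), gamma=0.5):
    """Build the correlation matrix over (layer, variation type) sources.

    Two distinct sources are correlated with factor gamma only if they have
    the same variation type and their layers share a process module.
    """
    names = [(layer, t) for layer in layer_to_module for t in types]
    n = len(names)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (layer_i, type_i) in enumerate(names):
        for j, (layer_j, type_j) in enumerate(names):
            if i == j:
                sigma[i][j] = 1.0
            elif type_i == type_j and layer_to_module[layer_i] == layer_to_module[layer_j]:
                sigma[i][j] = gamma
    return names, sigma


# Hypothetical module labels; M1-M3 share a module, M4 is separate.
modules = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}
names, sigma = build_correlation_matrix(modules)
```

Rows and columns follow the slide's ordering: ΔW, ΔT, ΔH for M1, then M2, M3 and M4.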
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability versus max wirelength ratio across all layers (%); cumulative probability is 0.92 at a ratio of 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation
47ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: scatter plot of Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), versus Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp), annotated with α. Paths dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst. Paths dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst.]
• Some paths have α > 1.0: a CBC can underestimate delay variations
  • But these paths have larger delays at the other corner
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: scatter plot of Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), versus Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp), annotated with α. Paths dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst. Paths dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst.]
• Some paths have α > 1.0: a CBC can underestimate delay variations
  • But these paths have larger delays at the other corner
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: axes Δdj(Yrcw) / dj(Ytyp) × 100 and 3σj / d(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → paths split into GTBC and GCBC. GTBC: timing analysis using TBC, ECO using TBC until violation = 0. GCBC: timing analysis using CBC, ECO using CBC until violation = 0. Then done.]
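The path-grouping step of the flow can be sketched as below. This is a hedged illustration, not the authors' implementation: the function names, the example delays and the 10% threshold are hypothetical, and taking the larger of the C-worst and RC-worst relative Δdelay (the dominant corner) as the comparison value is an assumption.

```python
def relative_delta(d_corner, d_typ):
    """Relative delay change at a BEOL corner with respect to the typical corner."""
    return (d_corner - d_typ) / d_typ


def split_paths(paths, threshold):
    """Split paths into (G_tbc, G_cbc) by relative delta-delay at the dominant corner.

    Each path is a tuple (name, d_typ, d_cw, d_rcw) of delays at the typical,
    C-worst and RC-worst corners. Paths whose delta-delay exceeds the
    threshold go to G_tbc (tightened corners); the rest stay in G_cbc.
    """
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        delta = max(relative_delta(d_cw, d_typ), relative_delta(d_rcw, d_typ))
        if delta > threshold:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


# Hypothetical path delays in ns (illustrative only).
example_paths = [
    ("p1", 1.00, 1.02, 1.12),
    ("p2", 1.00, 1.01, 1.03),
    ("p3", 2.00, 2.30, 2.10),
]
g_tbc, g_cbc = split_paths(example_paths, threshold=0.10)
```

Each group is then analyzed and fixed only at its own corner set, which is what keeps the ECO effort on the Gtbc paths smaller.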
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)

                        LEON3MP   NETCARD   SUPERBLUE12
  Clock period (ns)     1.8       2.0       3.1
  Gate count            232K      575K      1031K
  Utilization (%)       84        79        82
  Core area (mm²)       0.45      1.04      1.91
  Max transition (ps)   330       330       330

Tightened BEOL corners (correlation factor = 0.5)

             α     Acw (%)   Arcw (%)
  TBC-0.5    0.5   43        73
  TBC-0.6    0.6   33        50
  TBC-0.7    0.7   30        34

• Implement another NETCARD (clock period = 2.3 ns) to obtain α, Acw and Arcw
• Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: scaling factor α for LEON3MP, NETCARD and SUPERBLUE12, with the α > 0.5 region marked in each plot]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120 ps and 61 ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: three bar charts comparing CBC, TBC-1 and TBC-2 on LEON, SUPERBLUE and NETCARD: WNS (ns), TNS (ns), and number of timing violations]
57ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging under an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd versus time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10 mV]
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125 °C
• Process corner = slow NMOS and PMOS
• BTI degradation = {DC, AC}

  Circuit   Frequency (GHz)
  C5315     1.38
  c7552     1.25
  AES       0.89
  MPEG2     1.05

Supply voltages

  Vmax          1.05 V
  Vinit         0.90 V
  Vheur1 (DC)   0.97 V
  Vheur1 (AC)   0.95 V
  Vheur2 (DC)   0.95 V
  Vheur2 (AC)   0.93 V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, Vlib and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
• Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs
[Flowchart: iterate over time steps, updating aging and library, circuit timing, and Vdd. Vstep: AVS voltage step. Vfinal: converged voltage.]
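The reference flow is essentially a time-stepped loop that raises Vdd in Vstep increments whenever the aged circuit misses its timing target. The sketch below illustrates that control structure only. Everything numeric in it is hypothetical: `toy_delay` is a made-up stand-in for the aging estimate, library update and STA run, and it ignores the dependence of aging on the Vdd history that the real flow tracks.

```python
def reference_signoff(v_init, v_max, v_step, t_final, n_steps, aged_delay, target_delay):
    """Step through the lifetime, raising Vdd by v_step whenever timing fails.

    aged_delay(t, vdd) is a caller-supplied (hypothetical) model that returns
    the critical-path delay at time t when operated at vdd.
    Returns the converged voltage Vfinal, or None if Vdd would exceed v_max
    (in which case the circuit must be modified and the flow restarted).
    """
    k = 0  # number of voltage steps applied so far
    for i in range(1, n_steps + 1):
        t = t_final * i / n_steps
        while aged_delay(t, v_init + k * v_step) > target_delay:
            k += 1
            if v_init + k * v_step > v_max + 1e-12:
                return None
    return v_init + k * v_step


def toy_delay(t, vdd):
    """Made-up delay model: aging grows as t^(1/6); delay scales as 1/Vdd."""
    return (1.0 + 0.05 * (t / 10.0) ** (1.0 / 6.0)) / vdd


v_final = reference_signoff(0.90, 1.05, 0.01, 10.0, 10, toy_delay, 1.1112)
v_fail = reference_signoff(0.90, 0.93, 0.01, 10.0, 10, toy_delay, 1.1112)
```

In the real flow each evaluation of the delay is a library characterization plus an STA run, which is why the slide calls the reference flow very slow.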
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

  Case       1       2       3      4        5     6        7
  Vlib (V)   Vinit   Vinit   Vmax   Vinit    N/A   Vheur1   Vheur2
  VBTI (V)   Vinit   Vmax    Vmax   Vfinal   N/A   Vheur1   Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop linking circuit, Vfinal, Vlib/VBTI, and derated libraries]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| versus time, alternating ON and OFF [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS'14]]
65ISVLSI-2014 invited talk 140710
Observation 1
• BTI is a "front-loaded" phenomenon [Chan11]
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd versus time approaching Vfinal; ≈70% of the Vdd increment occurs in year 1 (remaining 30% over 9 years)]
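The front-loading can be illustrated with a power-law time dependence, aging ∝ t^n. This model and the exponent value are assumptions made here for illustration; the slide does not state a model. With a hypothetical exponent n = 0.3, about half of the 10-year aging accrues in the first year.

```python
def aging_fraction(t_years, lifetime_years, exponent):
    """Fraction of end-of-life aging reached at time t under an assumed t^n law."""
    return (t_years / lifetime_years) ** exponent


# Hypothetical exponent; 10-year lifetime as on the slide.
first_year_fraction = aging_fraction(1.0, 10.0, 0.3)
fifth_year_fraction = aging_fraction(5.0, 10.0, 0.3)
```

Any sublinear exponent gives the same qualitative behavior: most of the degradation, and therefore most of the AVS voltage increase, happens early in life.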
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
[Figure: power vs. area for the DC scenario, with points labeled by setup and "good corners" marked]
• Optimistic signoff corner
  • AVS increases supply voltage aggressively to compensate for aging
  • Consumes more power
  • May fail to meet timing if desired supply voltage > Vmax
• Pessimistic signoff corner
  • Overestimates aging and/or underestimates circuit performance
  • Large area overhead
Setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations and aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

  VBTI (aging estimation)   Vlib (performance estimation)   Result
  Min Vdd                   Min Vdd                         Slowest circuit, less aging
  Min Vdd                   Max Vdd                         Not applicable (optimistic)
  Max Vdd                   Min Vdd                         Slowest circuit, worst-case aging (too pessimistic)
  Max Vdd                   Max Vdd                         Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) versus area (μm²) for AES. Points are labeled with signoff setup numbers and range from optimistic about AVS to pessimistic about AVS. Legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's).]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Figure: lifetime (years) and Vfinal (V) across implementations 1 to 9. Annotations: 30% MTTF penalty, 200 mV voltage compensation.]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
  $\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \dfrac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
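The MTTF update rule is easy to check numerically. The sketch below applies it step by step over a Vdd schedule; the function names and the voltage values are hypothetical, and only the update formula itself comes from the slide.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of the slide's rule: MTTF(i) = MTTF(i-1) * (Vdd(i-1) / Vdd(i))^2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_after_schedule(mttf_initial, vdd_schedule):
    """Apply the update along a sequence of Vdd values (first entry is the start)."""
    mttf = mttf_initial
    for vdd_prev, vdd_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, vdd_prev, vdd_new)
    return mttf


# Hypothetical schedule: AVS raises Vdd from 1.00 V to 1.20 V in four steps.
schedule = [1.00, 1.05, 1.10, 1.15, 1.20]
mttf_end = mttf_after_schedule(10.0, schedule)
```

Because the per-step ratios telescope, the result depends only on the first and last voltages: a rise from 1.00 V to 1.20 V scales MTTF by (1/1.2)² ≈ 0.69 under this rule.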
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD versus year for scenarios S1 to S5, and MTTF (years) for S1 to S5, for the DMA design; annotation: 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff. Margins for static IR drop, power grid IR gradient, dynamic IR, HCI and NBTI separate the nominal Vdd from the signoff Vdd, relative to the operating voltage.]
Problem: margins = pessimism: overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = \left[ d_j(Y_v) - d_j(Y_{typ}) \right] / 3$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flowchart: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• $\Sigma = \lambda\lambda^{T}$ (note: λ is obtained by Cholesky decomposition)
• Delay sensitivities with correlation:
  $[\Delta d'_{j1} \; \dots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \dots \; \Delta d_{j27}] \, \lambda$
• Standard deviation of path delay:
  $\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
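The two statistical-timing slides combine into a short computation: per-source sensitivities, a Cholesky factor of Σ, and a root-sum-square. The sketch below uses pure Python so it has no dependencies; the function names and the two-source example are hypothetical, and a production flow would use a linear-algebra library and the full 27 × 27 matrix.

```python
import math


def sensitivity(d_corner, d_typ):
    """Per-sigma delay sensitivity: [d_j(Y_v) - d_j(Y_typ)] / 3 (corner is at 3 sigma)."""
    return (d_corner - d_typ) / 3.0


def cholesky(a):
    """Lower-triangular L with a = L * L^T, for a symmetric positive-definite matrix."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def path_sigma(delta_d, corr):
    """Standard deviation of path delay from sensitivities and correlation matrix."""
    lam = cholesky(corr)
    n = len(delta_d)
    # Row vector times lambda: correlated sensitivities delta_d'.
    d_prime = [sum(delta_d[i] * lam[i][j] for i in range(n)) for j in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


# Hypothetical two-source example with correlation 0.5.
sigma_corr = path_sigma([3.0, 4.0], [[1.0, 0.5], [0.5, 1.0]])
sigma_indep = path_sigma([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])
```

The result equals sqrt(Δd Σ Δdᵀ), so the Cholesky step is just a convenient way to turn the correlated sum into a plain root-sum-square.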
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) versus supply voltage (V) for a conventional design and a resilient design. Conventional design: worst-case signoff, no Vdd downscaling. Resilient design: typical-case signoff, Vdd downscaling, reduced energy. Annotation: 15% reduction.]
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy [Kahng10]: $\mathrm{Energy} = \mathrm{Power} / \mathrm{Throughput}$, where throughput is a function of the error rate ER, the clock period T, and the number of recovery cycles r
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints: allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor: process- and temperature-aware AVS, generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
[Figure: the AVS classes are arranged along a power axis]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure: four trend plots contrasting ideal scaling with non-ideality. DENSITY: transistor count [M] versus MPU release date (source: [CPUDB]). POWER: dynamic power (W), 2009 to 2024 (source: [JeongK08]). DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) against ideal scaling (source: [ITRS]). SUPPLY VOLTAGE: volts versus MPU release date (source: [CPUDB]).]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) versus supply voltage (V) for brute-force, pure-margin and CombOpt on MUL and EXU; the minimum achievable energy is marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage and frequency axes showing the design cone of mode A with modes A, B and C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
[Flowchart: global optimization flow. Sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R), refine power models]
[Figure: power estimation of adaptive search. Power (mW) versus signoff voltage (V); design: AES, f = 700 MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
[Figure: closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor]
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: stacked energy (mJ) bars for PM, BF and CO at small, medium and large margin, on MUL and EXU. Components: energy without resilience, energy penalty of additional circuits, energy penalty of throughput degradation.]
Small/medium/large margin: safety margin = 5/10/15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off between timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) versus supply voltage (V) for brute-force, pure-margin and CombOpt on MUL and EXU; the minimum achievable energy is marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs) → SEOpt (margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs) → SkewOpt (activity-aware clock skew optimization) → if energy < min energy, save current solution]
27ISVLSI-2014 invited talk 140710
Derated Library Characterization and AVS
bull VBTI = Voltage for BTI aging estimation
bull Vlib = Voltage for circuit performance estimation (library characterization)
bull VBTI and Vlib are required in signoff
bull VBTI and Vlib selection should consider BTI + AVS interaction
bull Aging and Vfinal are unknowns before circuit implementation
BTI degradation and AVS
Vfinal
VBTI |Vt|
Step 1
Vlib
Derated library
Step 2
Circuit implementation and
signoff
circuit
Step 3
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
bull VBTI = Voltage for BTI aging estimation
bull Vlib = Voltage for circuit performance estimation (library characterization)
bull VBTI and Vlib are required in signoff
bull VBTI and Vlib depend on aging during AVS
bull Aging and Vfinal are unknowns before circuit implementation
Vlib
VBTI Derated library
|Vt| Circuit implementation and
signoff
circuitBTI degradation and AVS
Vfinal
Step 1 Step 2 Step 3
No obvious guideline to define VBTI and Vlib
Inconsistency among Vfinal Vlib VBTI
bull What is the design overhead when timing libraries are not properly characterized
bull Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design lifetime energy overheads Joint work with Wei-Ting Jonas Chan Tuck-Boon Chan Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
ldquoKneerdquo point for balanced area and power tradeoff
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate aging
bull Large lifetime energy overhead
bull May fail to meet timing if desired supply voltage gt Vmax
Pessimistic signoff corner bull Ovestimate aging andor
underestimate circuit performance
bull Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristics 1
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0 bull Circuit activity (BTI aging)
bull BTI aging depends on circuit activitybull Assume DC or AC stress in derated library
characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
bull Heuristic obtain Vheur by averaging Vfinal of different cells
bull Heuristic use a ldquoflatrdquo Vheur to estimate BTI degradation
Obtain Vheur (average of standard cells)
Obtain derated library with VBTI = Vlib = Vheur
Signoff circuit with derated library
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
bull 4 designs x DC AC x derating methods)
Proposed method
Circuit signed off usingother derated libraries
ldquoKneerdquo point for balanced area and power tradeoff
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate aging
bull Consume more powerbull May fail to meet timing if
desired supply voltage gt Vmax
Pessimistic signoff corner bull Ovestimate aging andor
underestimate circuit performance
bull Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area and power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time, alternating between nominal (NOM) and overdrive (OD) modes over intervals tnom and tOD]
[Figure: power of circuits with different overdrive modes; fnom = 800 MHz, Vnom = 0.8 V; annotations: 12; different overdrive modes: 26% power range; with fOD fixed: still 14% power range]
36ISVLSI-2014 invited talk 140710
Also: Tunable Monitors, Less Margin
Aggressive config: Vmin_est < Vmin_chip; some chips will fail
Optimized config:
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Default config:
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13 mV margin
37ISVLSI-2014 invited talk 140710
Also: Tunable Monitors, Less Margin
Aggressive config: Vmin_est < Vmin_chip; some chips will fail
Default config:
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13 mV margin
Optimized config:
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability:
• Compensate for differences between model and silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by the models, margins, and tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1 through 9, with the highest and least invested guardband cases marked; annotated 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
43ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2. Axes: M1 C and M2 C, each layer with its own C-3σ point; the homogeneous Cw corner lies outside the joint 3σ contour, and the gap is labeled as pessimism]
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2. Axes: M1 C and M2 C, each layer with its own C-3σ point; the homogeneous Cw corner lies outside the joint 3σ contour, and the gap is labeled as pessimism]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
            M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW     1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT     0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH     0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW     γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT     0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH     0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW     γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT     0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH     0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW     0  0  0     0  0  0     0  0  0     1  0  0
    ΔT     0  0  0     0  0  0     0  0  0     0  1  0
    ΔH     0  0  0     0  0  0     0  0  0     0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
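The structure of Σ can be generated from the two rules stated on the slide. The sketch below is illustrative: the helper name `build_correlation` and the module labels "A" and "B" are assumptions; the grouping of M1 to M3 in one module and M4 in another is read off the matrix shown.

```python
def build_correlation(layers, types, module_of, gamma):
    """Build the correlation matrix over (layer, variation type) sources.

    Two distinct sources are correlated with factor gamma only if they share
    the variation type and their layers belong to the same process module.
    """
    sources = [(layer, t) for layer in layers for t in types]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for a, (layer_a, type_a) in enumerate(sources):
        for b, (layer_b, type_b) in enumerate(sources):
            if a == b:
                sigma[a][b] = 1.0
            elif type_a == type_b and module_of[layer_a] == module_of[layer_b]:
                sigma[a][b] = gamma
    return sources, sigma

layers = ["M1", "M2", "M3", "M4"]
types = ["dW", "dT", "dH"]
# Hypothetical module labels: M1-M3 share a module, M4 is separate.
module_of = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}
sources, sigma = build_correlation(layers, types, module_of, gamma=0.5)
```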
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the curve reaches 0.92 at 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variations gives smaller RC variation
47ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α of each path plotted against Δdelay at C-worst, [d(Ycw) – d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) – d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
  • But these paths have larger delays at the other corner
• Paths dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Paths dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α of each path plotted against Δdelay at C-worst, [d(Ycw) – d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) – d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
  • But these paths have larger delays at the other corner
• Paths dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Paths dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; axes M1 C and M2 C with the 3σ contour; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in the ITF file with α = 0.5
[Figure: 3σj / d(Ytyp) × 100 plotted against Δdj(Yrcw) / dj(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase:
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
Flow: routed design → timing analysis at BEOL corners (Ytyp, Ycw, Yrcw) → partition paths into GTBC and GCBC → timing analysis using TBC (for GTBC) and using CBC (for GCBC); while violations remain, ECO using TBC or CBC respectively → done when violations = 0
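The grouping step of the flow can be sketched as below. This is a rough illustration, not the authors' implementation: the function name, the path names, the delay values, and the thresholds are hypothetical, and the rule that exceeding either corner's threshold places a path in Gtbc is an assumption (the slide only says "larger than a threshold").

```python
def partition_paths(paths, a_cw, a_rcw):
    """Split paths into (g_tbc, g_cbc) by relative delay change at the corners.

    paths: dict name -> (d_typ, d_cw, d_rcw), path delays at Ytyp, Ycw, Yrcw.
    a_cw, a_rcw: thresholds on relative delay change, in percent.
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        rel_cw = 100.0 * (d_cw - d_typ) / d_typ
        rel_rcw = 100.0 * (d_rcw - d_typ) / d_typ
        # Assumption: exceeding either threshold qualifies the path for TBC.
        if rel_cw > a_cw or rel_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc

# Hypothetical path delays in ns: (typical, C-worst, RC-worst).
paths = {
    "p1": (1.00, 1.08, 1.02),
    "p2": (1.00, 1.01, 1.02),
    "p3": (2.00, 2.02, 2.30),
}
g_tbc, g_cbc = partition_paths(paths, a_cw=5.0, a_rcw=10.0)
```

Paths in `g_tbc` would then be analyzed and fixed at the tightened corners, and the remaining paths at the conventional corners.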
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity):

                       LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)      18        20        31
Gate count             232K      575K      1031K
Utilization (%)        84        79        82
Core area (mm2)        0.45      1.04      1.91
Max transition (ps)    330       330       330

Tightened BEOL corners (correlation factor = 0.5):

            α      Acw (%)   Arcw (%)
TBC-0.5     0.5    43        73
TBC-0.6     0.6    33        50
TBC-0.7     0.7    30        34

Implement another NETCARD (clock period = 23ns) to obtain α, Acw, and Arcw
Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: scaling factor α for LEON3MP, SUPERBLUE12, and NETCARD, with the α > 0.5 region marked in each plot]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: three bar charts for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2: WNS (ns), TNS (ns), and number of timing violations]
57ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging under a flat Vfinal ≈ aging under an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI stress; VBTI = Vlib ≈ Vfinal]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging); it depends on:
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10 mV]
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit    Frequency (GHz)
C5315      1.38
c7552      1.25
AES        0.89
MPEG2      1.05

Supply voltages:
Vmax           1.05 V
Vinit          0.90 V
Vheur1 (DC)    0.97 V
Vheur1 (AC)    0.95 V
Vheur2 (DC)    0.95 V
Vheur2 (AC)    0.93 V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step; Vfinal: converged voltage
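The loop structure of this reference flow can be sketched as follows. Everything model-related here is a placeholder: `aging_mv` and `meets_timing` are hypothetical callables standing in for library re-characterization and an STA run, and the toy numbers are chosen only to make the loop easy to follow. Voltages are kept as integer millivolts to avoid floating-point drift.

```python
def reference_signoff(v_init_mv, v_max_mv, v_step_mv, n_steps, aging_mv, meets_timing):
    """Step through lifetime, raising Vdd by v_step_mv whenever aging breaks timing.

    aging_mv(step, vdd_mv) -> |dVth| in mV at that time step (hypothetical model).
    meets_timing(vdd_mv, dvth_mv) -> bool, a stand-in for an STA run.
    Returns the Vdd schedule; its last entry is Vfinal.
    """
    vdd = v_init_mv
    schedule = []
    for step in range(1, n_steps + 1):
        # Re-evaluate aging after every bump, since aging may depend on Vdd.
        while not meets_timing(vdd, aging_mv(step, vdd)):
            vdd += v_step_mv
            if vdd > v_max_mv:
                raise ValueError("Vfinal exceeds Vmax: modify circuit and start over")
        schedule.append(vdd)
    return schedule

# Toy stand-ins (assumptions): 10 mV of threshold shift per step, and timing
# is met while the effective overdrive stays at or above 900 mV.
toy_aging = lambda step, vdd: 10 * step
toy_timing = lambda vdd, dvth: vdd - dvth >= 900
schedule = reference_signoff(900, 1050, 10, 10, toy_aging, toy_timing)
```

Each call to `meets_timing` corresponds to at least one timing run, which is why the slide describes the reference flow as very slow.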
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case        1       2       3       4        5     6        7
Vlib (V)    Vinit   Vinit   Vmax    Vinit    N/A   Vheur1   Vheur2
VBTI (V)    Vinit   Vmax    Vmax    Vfinal   N/A   Vheur1   Vheur2
63ISVLSI-2014 invited talk 140710
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop among circuit, Vfinal, and derated libraries (Vlib, VBTI)]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time, alternating ON and OFF periods] [VattikondaWC06]
Device aging (|ΔVth|) accumulates over time [TCAS’14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd approaching Vfinal over lifetime; ≈70% of the Vdd increment occurs in 1 year (remaining 30% over 9 years)]
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner:
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners (figure annotation)
Setups:
1. VBTI = Vlib = Vinit: ignores AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations and aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?
Combinations of Vlib (for circuit performance estimation) and VBTI (for aging estimation):
• Vlib = min Vdd, VBTI = min Vdd: slowest circuit, less aging
• Vlib = max Vdd, VBTI = min Vdd: not applicable (optimistic)
• Vlib = max Vdd, VBTI = max Vdd: faster circuit, worst-case aging
• Vlib = min Vdd, VBTI = max Vdd: slowest circuit, worst-case aging (too pessimistic)
With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm2) for AES; points labeled by implementation 1 through 8; series: non-EM-aware, after fixing (Mishra), after fixing (Black's); regions annotated as optimistic about AVS and pessimistic about AVS]
AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) for implementations 1 through 9; annotated 30% MTTF penalty and 200mV voltage compensation]
• Assume no EM fix at signoff
• BTI degradation is checked at each step, and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left(\frac{V_{DD}(i-1)}{V_{DD}(i)}\right)^{2}$$
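The MTTF recurrence from this slide is straightforward to apply over an AVS voltage schedule. The sketch below is illustrative: the function name, the starting MTTF, and the voltage schedule are hypothetical values, not data from the talk.

```python
def update_mttf(mttf_initial, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule."""
    mttf = mttf_initial
    for v_prev, v_now in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (v_prev / v_now) ** 2
    return mttf

# Hypothetical example: 10-year initial MTTF, Vdd raised from 0.90 V to 1.10 V.
mttf_after = update_mttf(10.0, [0.90, 1.00, 1.10])
```

Because the per-step ratios telescope, the result depends only on the first and last voltages in the schedule under this model.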
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year for AVS schedules S1 through S5 (labeled DMA), and MTTF (year) for S1 through S5; annotated 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is “Signoff”?
• Foundation of contract between design house and foundry
  • “Chip should work”: stack of models, margins, analyses
  • Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff, between nominal Vdd and signoff Vdd (operating voltage): static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI]
Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
Flow: 28 ITF files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:

$$[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}]\,\lambda$$

Standard deviation of path delay:

$$\sigma_j = \left((\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2\right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
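The two formulas on this slide can be checked numerically on a small case. The sketch below uses a plain-Python Cholesky factorization and a two-source example with γ = 0.5; the function names and the sensitivity values are illustrative assumptions, not the authors' code.

```python
import math

def cholesky(sigma):
    """Return lower-triangular lam with sigma = lam * lam^T (sigma positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam

def path_sigma(delta_d, sigma):
    """Standard deviation of path delay from per-source sensitivities delta_d."""
    lam = cholesky(sigma)
    n = len(delta_d)
    # Row vector times lam: d'_v = sum over u of delta_d[u] * lam[u][v]
    d_prime = [sum(delta_d[u] * lam[u][v] for u in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))

gamma = 0.5
# Hypothetical sensitivities of 1.0 (arbitrary units) to two correlated sources.
sigma_j = path_sigma([1.0, 1.0], [[1.0, gamma], [gamma, 1.0]])
```

Since the squared norm of Δd λ equals Δd Σ Δd^T, the example evaluates to the square root of 1 + 1 + 2γ; with independent sources it reduces to the usual root-sum-square.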
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation under dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V), 0.84 V to 1.00 V, for conventional design and resilient design; annotated 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:

$$\text{Energy} = \frac{\text{Power}}{\text{Throughput}}$$

Throughput is computed from the clock period, the error rate, and the number of recovery cycles [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints: allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches (figure axis: power):
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor and design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor that measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system; Vdd scaling until an error occurs [Das06, Tschanz10]
78ISVLSI-2014 invited talk 140710
Challenge: Variability
[Figure: four panels contrasting ideal scaling with non-ideality. DENSITY: transistor count [M] vs. MPU release date, 1998 to 2011 (source: [CPUDB]). POWER: dynamic power (W), 2009 to 2024 (source: [JeongK08]). DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, 2006 to 2016 (source: [ITRS]). SUPPLY VOLTAGE: volts vs. MPU release date, 1995 to 2016 (source: [CPUDB])]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, and thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for designs MUL and EXU; series: brute-force, pure-margin, CombOpt; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone
[Figure: voltage vs. frequency plane showing the design cone of mode A, with modes A, B, and C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small, constant number of runs is enough: scalable
Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → sample (SP&R) → refine power models (adaptive search)
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); series: 1st, 2nd, real; design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first and second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low series resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO at small, medium, and large margin for designs MUL and EXU; stacked components: energy without resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs, and thus reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for designs MUL and EXU; series: brute-force, pure-margin, CombOpt; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow:
1. Initial placement (all FFs = error-tolerant FFs)
2. SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs
3. SkewOpt: activity-aware clock skew optimization
4. If energy < min energy, save current solution
28ISVLSI-2014 invited talk 140710
Library Characterization for AVS
• VBTI = voltage for BTI aging estimation
• Vlib = voltage for circuit performance estimation (library characterization)
• VBTI and Vlib are required in signoff
• VBTI and Vlib depend on aging during AVS
• Aging and Vfinal are unknowns before circuit implementation
[Figure: Step 1: Vlib and VBTI (|Vt|) define the derated library. Step 2: circuit implementation and signoff produce the circuit. Step 3: BTI degradation and AVS determine Vfinal. Annotations: no obvious guideline to define VBTI and Vlib; inconsistency among Vfinal, Vlib, VBTI]
• What is the design overhead when timing libraries are not properly characterized?
• Can we define BTI- and AVS-aware signoff corners that ensure product goals with small design and lifetime energy overheads?
Joint work with Wei-Ting Jonas Chan, Tuck-Boon Chan, Siddhartha Nath
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
[Figure: power vs. area across different signoffs; the “knee” point marks a balanced area and power tradeoff]
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate for aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner:
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging under a flat Vfinal ≈ aging under an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI stress; VBTI = Vlib ≈ Vfinal]
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0 bull Circuit activity (BTI aging)
bull BTI aging depends on circuit activitybull Assume DC or AC stress in derated library
characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
bull Heuristic obtain Vheur by averaging Vfinal of different cells
bull Heuristic use a ldquoflatrdquo Vheur to estimate BTI degradation
Obtain Vheur (average of standard cells)
Obtain derated library with VBTI = Vlib = Vheur
Signoff circuit with derated library
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
bull 4 designs x DC AC x derating methods)
Proposed method
Circuit signed off usingother derated libraries
ldquoKneerdquo point for balanced area and power tradeoff
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate aging
bull Consume more powerbull May fail to meet timing if
desired supply voltage gt Vmax
Pessimistic signoff corner bull Ovestimate aging andor
underestimate circuit performance
bull Large area overhead
35ISVLSI-2014 invited talk 140710
bull Signoff mode = (voltage frequency) pair
bull Multi-mode operation requires multi-mode signoff
bull Example nominal mode and overdrive mode
bull Selection of signoff modes affects area power
bull ASP-DAC 2013 Optimization of signoff modes
Improve performance power or area
Reduce overdesign
NOM
ODNOM
OD
time
Vdd
tnom tOD tnom tOD
Also Multi-Mode Signoff Choices Matter
12
Fix fOD still 14 power range
Power of circuits w different overdrive modes
Different overdrive modes 26 power range
fnom = 800MHz Vnom = 08V
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Benefits of tunability bull Compensate for difference
between model vs siliconbull Recover margin when variation is
reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
39ISVLSI-2014 invited talk 140710
Conclusionsbull Variability severely challenges IC value
bull In manufacturing process during operation across lifetime
bull Benefit of ldquonext noderdquo is increasingly hard to findbull Entire node is a ldquo202020rdquo value propositionbull 5-10 in PPA metrics is now substantial at leading edge
bull Variability is connected to tapeout IC properties by models margins tolerances used in signoff
bull Some takeaways from this talkbull Substantial benefit from tightening BEOL corners (= signoff)bull ldquoMinimum cost of resiliencerdquo is a rich optimization challengebull Chicken-egg loops in signoff definition can be brokenbull Holistic approaches will provide ldquoequivalent scalingrdquo that
extends the value trajectory of Moorersquos Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
1 2 3 4 5 6 7 8 91200
1300
1400
1500
1600
1700
030
032
034
036
Core Power (mW) PG Power (mW)
Implemetation
Core
Pow
er (m
W)
PG
Pow
er (m
W)
bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff
Highest invested guardband
Least invested guardband
14 power penalty
43ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Interconnect stack with M1 and M2
M1 C
M2 C
3σ Pessimism
Example worst-case capacitance corner Homogeneous
Cw corner
44ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Homogeneous Cw corner
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Pessimism
Example worst-case capacitance corner
When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers
45ISVLSI-2014 invited talk 140710
Correlation Matrixbull Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the process module γ 05
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Averaging uncorrelated variation smaller RC variation
47ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; per-layer capacitance distributions (M1 C, M2 C) with 3σ marked; non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in the itf with α = 0.5
[Figure: 3σj / dj(Ytyp) × 100 (%) vs. Δdj(Yrcw) / dj(Ytyp) × 100 (%)]
Challenge: how to avoid underestimating delay variation, so as to preserve parametric yield
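A small sketch of how a per-path scaling factor could be computed. The definition used here, α = 3σj / Δdj at the dominant corner, is an assumption inferred from the plot axes and is not stated explicitly in the slides; the function name and argument names are hypothetical.

```python
def alpha_at_dominant_corner(d_typ, d_cw, d_rcw, sigma):
    """Return (dominant corner name, alpha) for one path.
    d_typ, d_cw, d_rcw: path delays at the typical, C-worst and RC-worst corners.
    sigma: statistical standard deviation of the path delay.
    Assumed definition: alpha = 3*sigma / delta_delay at the dominant corner."""
    delta_cw = d_cw - d_typ
    delta_rcw = d_rcw - d_typ
    dominant = "RC-worst" if delta_rcw >= delta_cw else "C-worst"
    delta = max(delta_cw, delta_rcw)
    if delta <= 0.0:
        raise ValueError("neither corner increases the path delay")
    return dominant, 3.0 * sigma / delta

corner, alpha = alpha_at_dominant_corner(1.00, 1.02, 1.06, 0.008)
```

With these illustrative numbers the conventional corner adds 0.06 of delay while 3σ is only 0.024, so α is well below 1 and the conventional corner is pessimistic for this path.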
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design → timing analysis at BEOL corners (Ytyp, Ycw, Yrcw) → split paths into GTBC and GCBC → GTBC: timing analysis using TBC, ECO using TBC until violation = 0; GCBC: timing analysis using CBC, ECO using CBC until violation = 0 → done]
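The grouping step of the flow can be sketched as follows. This is an illustration only: it assumes a single percentage threshold applied to the larger of the two corner Δdelays, whereas the experiments in the talk list separate thresholds (Acw, Arcw) per corner. The tuple layout and function name are hypothetical.

```python
def split_paths(paths, threshold_pct):
    """Split paths into (g_tbc, g_cbc).
    paths: iterable of (name, d_typ, d_cw, d_rcw) delay tuples.
    A path goes to g_tbc when its dominant-corner delta delay, as a
    percentage of the typical delay, exceeds threshold_pct (assumed rule)."""
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        delta_pct = 100.0 * (max(d_cw, d_rcw) - d_typ) / d_typ
        if delta_pct > threshold_pct:
            g_tbc.append(name)   # large delta delay: sign off at tightened corners
        else:
            g_cbc.append(name)   # small delta delay: keep conventional corners
    return g_tbc, g_cbc

g_tbc, g_cbc = split_paths(
    [("p1", 1.00, 1.02, 1.10), ("p2", 1.00, 1.01, 1.02)], threshold_pct=5.0
)
```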
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm²)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Tightened corners (correlation factor = 0.5)

           α     Acw (%)   Arcw (%)
TBC-0.5    0.5   43        73
TBC-0.6    0.6   33        50
TBC-0.7    0.7   30        34

• Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
• Statistical models: (1) no correlation, and (2) variation sources of the same kind in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1 term: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1 term: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations due to ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: α scatter plots for LEON3MP, NETCARD and SUPERBLUE12, each with the α > 0.5 region marked]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: three bar charts of WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]
57ISVLSI-2014 invited talk 140710
Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging at an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI stress]
VBTI = Vlib ≈ Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging), which depends on
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = {DC, AC}

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
[Figure annotations: Vstep = AVS voltage step; Vfinal = converged voltage]
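The iterative structure of this reference flow can be sketched with toy models. Everything numeric below is an assumption made for illustration: the delay model, the BTI model, their parameters and the function names are hypothetical stand-ins for real STA runs and derated-library characterization.

```python
def delay(vdd, dvth, vth0=0.3, k=1.0, a=1.3):
    # Toy alpha-power-law style delay: slower at low Vdd or high threshold.
    return k * vdd / (vdd - vth0 - dvth) ** a

def dvth_bti(t_years, v_stress, c=0.02, n=1.0 / 6.0, m=3.0):
    # Toy BTI model: power law in time, stronger at higher stress voltage.
    return c * (v_stress ** m) * (t_years ** n)

def reference_avs_flow(v_init, v_max, v_step, t_final, target_delay, steps=10):
    """Step through lifetime; at each step estimate aging, then raise Vdd by
    v_step until timing is met. Returns (v_final, schedule), or
    (None, schedule) when Vdd would exceed v_max (circuit must be modified)."""
    vdd = v_init
    schedule = []
    for i in range(1, steps + 1):
        t = t_final * i / steps
        dvth = dvth_bti(t, vdd)              # aging estimated at the present Vdd
        while delay(vdd, dvth) > target_delay:
            vdd = round(vdd + v_step, 6)     # AVS raises the supply by one step
            if vdd > v_max:
                return None, schedule
            dvth = dvth_bti(t, vdd)          # keep aging consistent with new Vdd
        schedule.append((t, vdd))
    return vdd, schedule

target = delay(0.90, 0.0)  # timing met exactly at t = 0 (zero slack)
v_final, schedule = reference_avs_flow(0.90, 1.05, 0.01, 10.0, target)
```

In a real flow, each inner iteration corresponds to a timing analysis with a library characterized for the current voltage and aging, which is why the reference flow is slow.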
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5     6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    N/A   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   N/A   Vheur1   Vheur2
63ISVLSI-2014 invited talk 140710
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop between circuit, Vfinal, and derated libraries (Vlib, VBTI)]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON/OFF phases [VattikondaWC06]]
Device aging (|ΔVth|) accumulates over time [TCAS'14]
65ISVLSI-2014 invited talk 140710
Observation 1
• BTI is a "front-loaded" phenomenon [Chan11]
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
[Figure annotation: ≈70% of Vdd increment in 1 year (remaining 30% over 9 years); Vfinal marked]
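The front-loaded behavior is what a power-law aging model predicts. The sketch below assumes |ΔVth| grows as t^n, a commonly used BTI approximation; the exponent values are illustrative and are not taken from the talk.

```python
import math

def aging_fraction(t, lifetime, n):
    """Fraction of end-of-life |dVth| already accumulated at time t,
    assuming |dVth| is proportional to t**n."""
    return (t / lifetime) ** n

def exponent_for_fraction(fraction, t, lifetime):
    """Exponent n for which `fraction` of the aging occurs by time t."""
    return math.log(fraction) / math.log(t / lifetime)

frac_year1 = aging_fraction(1.0, 10.0, 1.0 / 6.0)  # about 0.68 for n = 1/6
n_for_half = exponent_for_fraction(0.5, 1.0, 10.0)  # about 0.30
```

Under this assumption, any exponent well below 1 places most of the degradation in the first part of the lifetime.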
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
Setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                                      Vlib for circuit performance estimation
                                      Min Vdd                        Max Vdd
VBTI for aging estimation  Min Vdd    Slowest circuit, less aging    Not applicable (optimistic)
                           Max Vdd    Slowest circuit, worst-case    Faster circuit, worst-case aging
                                      aging (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm²) for AES; points labeled by signoff case 1-8; regions annotated "Optimistic about AVS" and "Pessimistic about AVS"]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Figure: lifetime (years) and Vfinal (V) across implementations 1-9; annotations: 30% MTTF penalty, 200mV voltage compensation]
• Assume no EM fix at signoff
• BTI degradation is checked at each step, and MTTF is updated as
  MTTF(i) = MTTF(i-1) × (VDD(i-1) / VDD(i))^2
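The MTTF update rule above can be applied step by step along an AVS voltage schedule. A minimal sketch; the function name, the starting MTTF and the voltage schedule are illustrative assumptions.

```python
def update_mttf(mttf0, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a schedule.
    Returns the list of MTTF values, starting with mttf0."""
    mttf = mttf0
    history = [mttf]
    for v_prev, v_next in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (v_prev / v_next) ** 2   # raising Vdd shortens EM lifetime
        history.append(mttf)
    return history

history = update_mttf(10.0, [1.0, 1.1, 1.2])
penalty = 1.0 - history[-1] / history[0]
```

Because the ratios telescope, the final MTTF depends only on the first and last voltages: here a rise from 1.0 V to 1.2 V scales MTTF by (1.0/1.2)^2, a penalty of roughly 31%.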
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year for AVS schedules S1-S5, and MTTF (years) for S1-S5; design: DMA]
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is "Signoff"?
• Foundation of contract between design house and foundry
• "Chip should work" = stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff, from nominal Vdd down to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI; operating voltage marked]
Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Δdjv: delay sensitivity of path pj to variation source zv
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: routed netlist + 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δdjv]
Δdjv = [dj(Yv) - dj(Ytyp)] / 3
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (Note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
[Δd'j1 … Δd'j27] = [Δdj1 … Δdj27] λ
Standard deviation of path delay:
σj = ((Δd'j1)^2 + … + (Δd'j27)^2)^0.5
Note: we use the delay variation from the statistical analysis as a reference
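The two statistical timing steps can be sketched in a few lines of plain Python. The helper names are hypothetical, the Cholesky routine is a textbook implementation rather than the authors' tooling, and the small 2 x 2 example stands in for the 27 x 27 case.

```python
import math

def sensitivity(d_corner, d_typ):
    # Per-sigma delay sensitivity from a 3-sigma corner run: [d(Yv) - d(Ytyp)] / 3.
    return (d_corner - d_typ) / 3.0

def cholesky(a):
    """Lower-triangular L with a = L * L^T (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low

def path_sigma(delta_d, corr):
    """sigma_j = || delta_d * lambda ||, with corr = lambda * lambda^T."""
    lam = cholesky(corr)
    n = len(delta_d)
    d_prime = [sum(delta_d[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))

sigma_indep = path_sigma([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])  # 5.0
sigma_corr = path_sigma([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])   # sqrt(3)
```

The result equals sqrt(Δd Σ Δd^T), so positive correlation between sources increases the path sigma relative to the independent case.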
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy
  Energy = Power / Throughput
  Throughput is expressed in terms of the error rate, the clock period and the number of recovery cycles [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints: allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor: process- and temperature-aware AVS, generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
[Figure: AVS classes arranged along a power axis]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure: four trend plots contrasting ideal scaling with non-ideality: DENSITY (transistor count [M] vs. MPU release date; source [CPUDB]); POWER (dynamic power (W) vs. year, 2009-2024; source [JeongK08]); DRIVE CURRENT (μA/μm for extended planar bulk, UTB FD and DG vs. ideal scaling; source [ITRS]); SUPPLY VOLTAGE (volts vs. MPU release date; source [CPUDB])]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, for designs MUL and EXU; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage vs. frequency plane showing the design cone of mode A, with modes A, B and C; negative slacks = failed design; positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
[Flowchart (global optimization flow): sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R) → refine power models]
[Figure (power estimation of adaptive search): power (mW) vs. signoff voltage (V), curves "1st", "2nd" and "real"; design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
Closed-Loop AVS
• Generic monitor
  • Does not capture design-specific performance variation
• Design-dependent replica, in-situ monitor
  • Critical path may be difficult to identify (IP from 3rd party)
  • Calibrating monitors at multiple modes/voltages requires long test time
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring oscillator stages with 1-bit control pins selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: stacked energy (mJ) bars for PM, BF and CO at small, medium and large margin, for designs MUL and EXU; components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, for designs MUL and EXU; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs) → SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs → SkewOpt: activity-aware clock skew optimization → if energy < min energy, save current solution → iterate]
29ISVLSI-2014 invited talk 140710
Power vs Area Across Different Signoffs
"Knee" point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Large lifetime energy overhead
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
30ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging at an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI stress]
VBTI = Vlib ≈ Vfinal
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging), which depends on
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a "flat" Vheur to estimate BTI degradation
[Flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → sign off circuit with derated library]
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
• (4 designs × {DC, AC} × derating methods)
[Figure: power vs. area; legend: proposed method, circuits signed off using other derated libraries]
"Knee" point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign
[Figure: Vdd vs. time alternating between NOM and OD modes (tnom, tOD)]
[Figure: power of circuits with different overdrive modes; fnom = 800MHz, Vnom = 0.8V; different overdrive modes: 26% power range; fix fOD: still 14% power range]
36ISVLSI-2014 invited talk 140710
Also: Tunable Monitors, Less Margin
Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
Benefits of tunability
• Compensate for difference between model and silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20-20-20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by the models, margins and tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) across implementations 1-9, from highest to least invested guardband; annotation: 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
43ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
Example: worst-case capacitance corner
[Figure: capacitance distributions of layers M1 and M2, each with its 3σ corner; interconnect stack with M1 and M2 (M1 C, M2 C); the homogeneous Cw corner sets every layer at 3σ, which introduces pessimism]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
45ISVLSI-2014 invited talk 140710
Correlation Matrixbull Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the process module γ 05
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Averaging uncorrelated variation smaller RC variation
47ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
Gate count 232K 575K 1031KUtilization () 84 79 82
Core area (mm2) 045 104 191Max transition (ps) 330 330 330
Testcases for validation (45nm library with 8X wire resistivity)
αCorrelation factor = 05
Acw () Arcw ()
TBC-05 05 43 73
TBC-06 06 33 50
TBC-07 07 30 34
Implement another NETCARD (clock period = 23ns) to obtain α Acw and Arcw
Statistical models (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 05
54ISVLSI-2014 invited talk 140710
Further Analysis
bull Paths with small Δd(Yrcw) and Δd(Ycw) have large α
bull A path has small Δdelays the path is equally sensitive to R and C
bull Example dj = dj(Ytyp) + 05 ΔdR-M1 + 05 ΔdC-M1
bull For a given CBC = Ycw ΔdR-M1 is small but ΔdC-M1 is large delay variation of ΔdR-M1 and ΔdC-M1 are cancelled out Δd(Ycw) 0 lt σj
Nominal delay
Delay sensitivity to unit change in M1 resistance
Delay sensitivity to unit change in M1 capacitance
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
LEON3MP
SUPERBLUE12NETCARD
α gt 05α gt 05
α gt 05
bull Similar trends in different designs
bull Large α when Δd(Yrcw)d(Ytyp) and Δd(Ycw)d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
bull WNS and TNS are reduced by up to 120ps and 61ns
bull Timing violations reduces by 31 to 100
Correlation factor γ = 0 (variation sources are independent)
LEON SUPERBLUE NETCARD
-0180-0160-0140-0120-0100-0080-0060-0040-002000000020
CBC TBC-1 TBC-2
WN
S (n
s)
LEON SUPERBLUE NETCARD
-90-80-70-60-50-40-30-20-10
0CBC TBC-1 TBC-2
TNS
(ns)
LEON SUPERBLUE NETCARD0
200400600800
1000120014001600
CBC TBC-1 TBC-2
Tim
ing
viol
ation
s
57ISVLSI-2014 invited talk 140710
Heuristics 1
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   NA   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  NA   Vheur1  Vheur2
63ISVLSI-2014 invited talk 140710
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop among Circuit, Vfinal, Vlib / VBTI, and Derated Libraries]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when the device is on (stressed)
|ΔVth| is partially recovered when the device is off (relaxed)
NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON and OFF phases [VattikondaWC06]]
Device aging (|ΔVth|) accumulates over time [TCAS’14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
[Figure annotation: ≈70% of the Vdd increment in 1 year (remaining 30% over 9 years); Vfinal marked]
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax

Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Good corners

1. VBTI = Vlib = Vinit: ignores AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem: Signoff Corner Definition

• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

                            Vlib for circuit performance estimation
VBTI for aging estimation   Min Vdd                                              Max Vdd
Min Vdd                     Slowest circuit, less aging                          Not applicable (optimistic)
Max Vdd                     Slowest circuit, worst-case aging (too pessimistic)  Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: AES; power (mW) vs. area (μm²), data points labeled by signoff case number; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks); regions annotated “Optimistic about AVS” and “Pessimistic about AVS”]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as

$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left(\frac{V_{DD}(i-1)}{V_{DD}(i)}\right)^{2}$$

[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200 mV voltage compensation]
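The MTTF update rule on this slide, MTTF(i) = MTTF(i-1) × (V_DD(i-1) / V_DD(i))², can be applied step by step along an AVS voltage schedule. The sketch below is a minimal illustration; the function names and the voltage values are hypothetical, not data from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_over_schedule(mttf0, vdd_schedule):
    """Apply the update along a list of supply voltages, one per time step."""
    mttf = mttf0
    for v_prev, v_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, v_prev, v_new)
    return mttf


# Hypothetical schedule: Vdd raised from 1.0 V to 1.2 V in two steps.
remaining = mttf_over_schedule(10.0, [1.0, 1.1, 1.2])
```

Because the per-step ratios telescope, only the first and last voltages matter: under this rule, a rise from 1.0 V to 1.2 V scales MTTF by (1.0 / 1.2)², roughly 0.69.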
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: DMA; VDD vs. year for schedules S1 to S5, and MTTF (year) for S1 to S5; annotation: 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is “Signoff”?

• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …

Problem: margins = pessimism, overdesign, schedule delay
[Figure: “margin stack” for voltage signoff on a voltage axis, with components nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd; operating voltage marked]
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv: Δdjv
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Figure: flow from 28 itf files (27 variation sources + Ytyp) and the routed netlist through RC extraction and STA to Δdjv]

$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$

Note: path delay includes gate and wire delays
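As a small worked example of the sensitivity formula Δdjv = [dj(Yv) − dj(Ytyp)] / 3: the division by 3 is read here as converting a delay shift at a 3σ-skewed corner into a per-σ sensitivity, which is an interpretation rather than something the slide states. The function name and the delay values are hypothetical.

```python
def delay_sensitivities(d_typ, d_skewed):
    """Per-sigma delay sensitivities of one path.

    d_typ: path delay from the STA run at Ytyp.
    d_skewed: path delays from the STA runs in which one variation
    source at a time is skewed (assumed here to be a 3-sigma skew).
    """
    return [(d - d_typ) / 3.0 for d in d_skewed]


# Hypothetical path: 100 ps at Ytyp, three skewed runs (the talk uses 27).
sens = delay_sensitivities(100.0, [103.0, 100.0, 97.0])
```

With 27 variation sources this needs the 27 skewed runs plus the typical run, matching the 28 runs mentioned on the slide.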
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλᵀ (note: λ is obtained by Cholesky decomposition)

Delay sensitivities with correlation

$$[\Delta d'_{j1} \ldots \Delta d'_{j27}] = [\Delta d_{j1} \ldots \Delta d_{j27}]\,\lambda$$

Standard deviation of path delay

$$\sigma_j = \left((\Delta d'_{j1})^{2} + \ldots + (\Delta d'_{j27})^{2}\right)^{0.5}$$

Note: we use the delay variation from the statistical analysis as a reference
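The computation above (factor Σ = λλᵀ, multiply the sensitivity row vector by λ, take the root sum of squares) can be sketched in plain Python. This is an illustrative sketch, not the authors' implementation; the helper names are hypothetical and a small hand-written Cholesky routine is used to keep the example self-contained.

```python
import math


def cholesky(a):
    """Lower-triangular factor lam with a = lam * lam^T (a must be SPD)."""
    n = len(a)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(a[i][i] - s)
            else:
                lam[i][j] = (a[i][j] - s) / lam[j][j]
    return lam


def path_sigma(delta_d, corr):
    """Standard deviation of one path delay from per-sigma sensitivities."""
    lam = cholesky(corr)
    n = len(delta_d)
    # Row vector times lam: correlated sensitivities [dd'_1 ... dd'_n].
    corr_sens = [sum(delta_d[r] * lam[r][c] for r in range(n)) for c in range(n)]
    return math.sqrt(sum(x * x for x in corr_sens))


independent = path_sigma([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])
correlated = path_sigma([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])
```

With independent sources the result reduces to the usual root sum of squares; with correlation 0.5 between two unit sensitivities the variance is 1 + 1 + 2 × 0.5 = 3.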
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
  • Estimation of design energy
Energy = Power / Throughput
[Equation: Throughput expressed in terms of the error rate ER [Kahng10], the clock period T, and the number of recovery cycles r; the fraction layout of the terms (1 − ER, T, r × T) is not recoverable from the extracted text]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints: allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
[Figure: AVS approaches grouped by AVS class, arranged along a power axis]
Open-loop AVS
• Freq & Vdd LUT: pre-characterize LUT [Martin02]
• Post-silicon characterization: process-aware AVS [Tschanz03]
Closed-loop AVS
• Generic monitor, design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
Error tolerance
• Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure: four panels comparing ideal scaling with non-ideality. DENSITY: transistor count [M] vs. MPU release date, source [CPUDB]. POWER: dynamic power (W) vs. year, 2009 to 2024, source [JeongK08]. DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, source [ITRS]. SUPPLY VOLTAGE: volt vs. MPU release date, source [CPUDB].]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone
[Figure: voltage vs. frequency plane showing the design cone of mode A and modes A, B, C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
[Figure: global optimization flow with steps Sample (SP&R), Construct power models, Estimate optimal signoff modes, Sample (SP&R), Refine power models; the refinement loop is labeled adaptive search]
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); series 1st, 2nd, real; design AES, f = 700 MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
[Figure: classes of closed-loop AVS: design-dependent replica, in-situ monitor, generic monitor]
• Does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring oscillator stages with 1-bit control pins selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF and CO at small, medium and large margin; panels MUL and EXU; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Figure: optimization flow with steps: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; check energy < min energy and save current solution]
30ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
31ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at the early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10 mV]
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation
[Figure: flow: obtain Vheur (average of standard cells) → obtain derated library with VBTI = Vlib = Vheur → sign off circuit with derated library]
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
• (4 designs × {DC, AC} × derating methods)
[Figure: power vs. area for all designs; legend: proposed method, circuits signed off using other derated libraries; “knee” point for balanced area and power tradeoff]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
Improve performance, power or area
Reduce overdesign
[Figure: Vdd vs. time alternating between NOM and OD modes, with durations tnom and tOD]
[Figure: power of circuits with different overdrive modes; fnom = 800 MHz, Vnom = 0.8 V; different overdrive modes: 26% power range; with fOD fixed: still 14% power range]
37ISVLSI-2014 invited talk 140710
Also: Tunable Monitors, Less Margin

Aggressive config: Vmin_est < Vmin_chip; some chips will fail

Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13 mV margin

Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip

Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1 to 9; annotations: highest invested guardband, least invested guardband, 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner; capacitance distributions of layer M1 and layer M2 with their −3σ and 3σ points; distribution of an interconnect stack with M1 and M2 showing its 3σ point and the homogeneous Cw corner, the gap between them marked as pessimism]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
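A quick numerical illustration of why this pessimism grows with the number of layers. It assumes a deliberately simple additive model that is not taken from the talk: the total capacitance is a sum of per-layer contributions, each with its own 1σ value, and every pair of layers shares one common correlation `rho`. The function name is hypothetical.

```python
import math


def corner_vs_statistical(layer_sigmas, rho):
    """Compare a homogeneous 3-sigma corner with the statistical 3-sigma.

    layer_sigmas: 1-sigma contribution of each layer to the total
    capacitance (simple additive model, an assumption of this sketch).
    rho: common correlation between any two different layers.
    """
    n = len(layer_sigmas)
    # Homogeneous corner: every layer sits at its own 3-sigma point at once.
    homogeneous = 3.0 * sum(layer_sigmas)
    variance = sum(s * s for s in layer_sigmas)
    variance += rho * sum(
        layer_sigmas[i] * layer_sigmas[j]
        for i in range(n)
        for j in range(n)
        if i != j
    )
    statistical = 3.0 * math.sqrt(variance)
    return homogeneous, statistical


full = corner_vs_statistical([1.0, 1.0], rho=1.0)      # fully correlated
four = corner_vs_statistical([1.0] * 4, rho=0.0)       # independent layers
nine = corner_vs_statistical([1.0] * 9, rho=0.0)
```

With full correlation the two numbers coincide; with independent, equal layers the ratio of corner to statistical spread is the square root of the layer count (2× for four layers, 3× for nine).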
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
          M1           M2           M3           M4
          ΔW  ΔT  ΔH   ΔW  ΔT  ΔH   ΔW  ΔT  ΔH   ΔW  ΔT  ΔH
M1  ΔW    1   0   0    γ   0   0    γ   0   0    0   0   0
    ΔT    0   1   0    0   γ   0    0   γ   0    0   0   0
    ΔH    0   0   1    0   0   γ    0   0   γ    0   0   0
M2  ΔW    γ   0   0    1   0   0    γ   0   0    0   0   0
    ΔT    0   γ   0    0   1   0    0   γ   0    0   0   0
    ΔH    0   0   γ    0   0   1    0   0   γ    0   0   0
M3  ΔW    γ   0   0    γ   0   0    1   0   0    0   0   0
    ΔT    0   γ   0    0   γ   0    0   1   0    0   0   0
    ΔH    0   0   γ    0   0   γ    0   0   1    0   0   0
M4  ΔW    0   0   0    0   0   0    0   0   0    1   0   0
    ΔT    0   0   0    0   0   0    0   0   0    0   1   0
    ΔH    0   0   0    0   0   0    0   0   0    0   0   1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
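The rule behind this matrix (γ for same variation type within the same process module, 0 otherwise, 1 on the diagonal) is easy to generate programmatically. The sketch below is illustrative only: the function name is hypothetical, and the grouping of M1 to M3 into one module with M4 in another is read off the matrix shown on the slide.

```python
def build_correlation(layers, types, module_of, gamma):
    """Correlation matrix with one row/column per (layer, variation type).

    Two different sources get correlation gamma when they share the
    variation type and their layers share a process module, else 0.
    """
    names = [(layer, vtype) for layer in layers for vtype in types]
    sigma = []
    for layer_i, type_i in names:
        row = []
        for layer_j, type_j in names:
            if (layer_i, type_i) == (layer_j, type_j):
                row.append(1.0)
            elif type_i == type_j and module_of[layer_i] == module_of[layer_j]:
                row.append(gamma)
            else:
                row.append(0.0)
        sigma.append(row)
    return names, sigma


names, sigma = build_correlation(
    layers=["M1", "M2", "M3", "M4"],
    types=["dW", "dT", "dH"],
    # Module labels "A" and "B" are placeholders for this sketch.
    module_of={"M1": "A", "M2": "A", "M3": "A", "M4": "B"},
    gamma=0.5,
)
```

The first row of `sigma` reproduces the M1 ΔW row of the slide: 1 on the diagonal, γ against M2 ΔW and M3 ΔW, and 0 elsewhere.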
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point: 0.92 at 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α plotted against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp); regions: dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) and dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
  • But these paths have larger delays at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σj / d(Ytyp) × 100 vs. Δdj(Yrcw) / dj(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
  • Wires on critical paths are routed on many layers
  • Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Figure: flow from the routed design through timing analysis at BEOL corners Ytyp, Ycw, Yrcw and path grouping (GTBC, GCBC) to loops of timing analysis and ECO using TBC and using CBC, each repeated until violation = 0, then done]
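The grouping step of this flow (put path j in Gtbc if its Δdelay exceeds a threshold) can be sketched as follows. The details are assumptions made for illustration: Δdelay is taken as the relative delay increase over the typical corner at whichever of C-worst or RC-worst is larger, and the function name, path names, delays and threshold are all hypothetical.

```python
def split_paths(paths, threshold):
    """Partition paths into G_tbc and G_cbc by relative delta-delay.

    paths: dict mapping path name -> (d_typ, d_cw, d_rcw), the path delay
    at the typical, C-worst and RC-worst corners.
    A path goes to G_tbc when its relative delta-delay at the dominant
    corner exceeds the threshold, following the rule on the slide.
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        rel_delta = (max(d_cw, d_rcw) - d_typ) / d_typ
        if rel_delta > threshold:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


example_paths = {
    "p1": (1.00, 1.10, 1.20),  # 20% increase at RC-worst
    "p2": (1.00, 1.02, 1.01),  # 2% increase at C-worst
}
g_tbc, g_cbc = split_paths(example_paths, threshold=0.05)
```

Only the paths in `g_tbc` would then be closed against the tightened corners; the remaining paths stay on the conventional corners.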
53ISVLSI-2014 invited talk 140710
Experiment Setup
                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm²)       0.45      1.04      1.91
Max transition (ps)   330       330       330
Testcases for validation (45nm library with 8X wire resistivity)

Correlation factor = 0.5
          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

Implement another NETCARD (clock period = 2.3 ns) to obtain α, Acw and Arcw
Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: α vs. relative delay variation for LEON3MP, SUPERBLUE12 and NETCARD, with regions α > 0.5 marked in each panel]
• Similar trends in different designs
• Large α when Δd(Yrcw) / d(Ytyp) and Δd(Ycw) / d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120 ps and 61 ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]
57ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
supply voltage gt Vmax
Pessimistic signoff corner bull Ovestimate aging andor
underestimate circuit performance
bull Large area overhead
Good corners
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner
for AVS4 Vbti = Vfinal Do not overestimate
aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
bull Timing signoff ensure circuit meets performance target under PVT variations amp aging
bull Conventional signoff approach bull Analyze circuit timing at worst-case cornersbull Fix timing violations re-run timing analysis
bull With BTI aging and AVS what is the Vdd of the worst-cast corner for timing analysis
Vlib for circuit performance estimation
Min Vdd Max Vdd
VBTI for aging
estimation
MinVdd
Not applicable (Optimistic)
Max Vdd
Slowest circuitLess aging
Faster circuitWorst-case aging
Slowest circuit Worst-case aging
Too pessimistic
With BTI aging and AVS the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
10000 12000 14000 16000 18000 20000 2200020
22
24
26
28
30
32
44
4
888
7776
66
555
3
33
2
22
11
1
Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)
Area (μm2)
Pow
er (m
W)
AES
Optimistic about AVS
Pessimistic about AVS
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
1 2 3 4 5 6 7 8 90
2
4
6
8
10
12
08
09
1
11
12Lifetime (year)
Implementation
Life
time
(yea
r)
Vfina
l (V)
Vfinal (V)
119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )
2
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation
[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ
Standard deviation of path delay
σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05
Note we use the delay variation from the statistical analysis as a reference
74ISVLSI-2014 invited talk 140710
Resilient Designs
bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)
bull Trade off design robustness vs design quality Eg enable margin reduction
bull Improve performance (ie timing speculation)
084 088 092 096 10030
34
38
42
46
50
54
58
62conventional design
reilient Design
Supply voltage (V)
En
erg
y (
mJ
)
Conventional design Worst-case signoff No Vdd downscaling
Resilient design Typical-case signoff Vdd downscaling reduced energy
15 reduction
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
bull Given RTL design throughput requirement and error-tolerant registers
bull Objective implement design to minimize energy bull Estimation of design energy
119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Question Which endpoint to be optimized

Process-Aware Vdd Scaling (PVS)
[Diagram: AVS classes and AVS approaches, arranged along a power axis]
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor and design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor that measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction, with Vdd scaling until an error occurs [Das06, Tschanz10]

Challenge: Variability
[Figure: four panels, each contrasting ideal scaling with non-ideality]
• DENSITY: transistor count [M] vs. MPU release date, 1998 to 2011. Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: Extended Planar Bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: supply voltage (V) vs. MPU release date, 1995 to 2016. Source: [CPUDB]

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; two panels, MUL and EXU; the minimum achievable energy is marked]

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage vs. frequency plane showing the design cone of mode A with modes A, B, and C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign

Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small, constant number of runs is enough (scalable)
Global optimization flow: sample (SP&R), construct power models, estimate optimal signoff modes, sample (SP&R), refine power models (adaptive search)
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V), 0.9 V to 1.2 V; design: AES, f = 700 MHz]
• Ovals indicate sample points
• "1st", "2nd": power from power models at the first and second iteration
• "real": power from real implemented circuits

Classes of Closed-Loop AVS
Closed-loop AVS classes: generic monitor, design-dependent replica, in-situ monitor
• Generic monitor: does not capture design-specific performance variation
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high or low series resistance]

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO at small, medium, and large margins, for MUL and EXU; bars are split into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs, hence reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; two panels, MUL and EXU; the minimum achievable energy is marked]

Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart]
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs
• SkewOpt: activity-aware clock skew optimization
• If energy < min energy, save current solution

Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
  • Circuit activity (BTI aging)
• BTI aging depends on circuit activity
  • Assume DC or AC stress in derated library characterization

Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10 mV]

Proposed Library Characterization Flow
• Heuristic: obtain Vheur by averaging Vfinal of different cells
• Heuristic: use a “flat” Vheur to estimate BTI degradation
Flow: obtain Vheur (average of standard cells), obtain derated library with VBTI = Vlib = Vheur, then sign off circuit with derated library

Power vs Area for All Designs
• 4 designs x {DC, AC} x derating methods
[Figure: power vs. area; the proposed method sits at the “knee” point for balanced area and power tradeoff; the other points are circuits signed off using other derated libraries]
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead

Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time, alternating between nominal (NOM) and overdrive (OD) modes with durations tnom and tOD]
[Figure: power of circuits with different overdrive modes; fnom = 800 MHz, Vnom = 0.8 V; different overdrive modes give a 26% power range; with fOD fixed, there is still a 14% power range]

Also: Tunable Monitors, Less Margin
• Aggressive config: Vmin_est < Vmin_chip; some chips will fail
• Default config
  • Low-resistance passgates
  • Guardband for worst case
  • Vmin_est > Vmin_chip
  • 13 mV margin
• Optimized config
  • Increase high-resistance passgates
  • Vmin_est ≈ Vmin_chip
Benefits of tunability
• Compensate for difference between model and silicon
• Recover margin when variation is reduced due to improved process

Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions

Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore's Law

Thank You

Backup

Power Penalty to Fix EM with AVS
[Figure: core power (mW, 1200 to 1700) and PG power (mW, 0.30 to 0.36) for implementations 1 to 9, ranging from highest to least invested guardband; 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• The power penalty trades off against the guardband invested at signoff

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Layers M1 and M2 each have a capacitance (C) distribution marked at -3σ and 3σ. The homogeneous Cw corner for the interconnect stack with M1 and M2 combines the M1 C and M2 C corners; the gap between it and the 3σ point of the stack is marked as pessimism]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
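A small numeric sketch can make the pessimism claim concrete. It assumes, purely for illustration, that the stack quantity is a plain sum of per-layer contributions with equal standard deviation and a common pairwise correlation `rho`; neither assumption comes from the slides, and the function name is hypothetical.

```python
import math


def homogeneous_pessimism(num_layers, rho):
    """Ratio of the homogeneous 3-sigma corner to the statistical 3-sigma of the stack.

    Assumes unit sigma per layer and an additive stack quantity (illustrative only).
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    # Homogeneous corner: every layer sits at its own 3-sigma point at once.
    corner = 3.0 * num_layers
    # Variance of a sum of n unit-sigma terms with pairwise correlation rho.
    variance = num_layers + num_layers * (num_layers - 1) * rho
    return corner / (3.0 * math.sqrt(variance))
```

With fully correlated layers (`rho = 1`) the ratio stays at 1 for any layer count, while with independent layers it grows as the square root of the number of layers.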

Correlation Matrix
• Let Σ be the correlation matrix for variation sources
Σ =
          M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW    0  0  0     0  0  0     0  0  0     1  0  0
    ΔT    0  0  0     0  0  0     0  0  0     0  1  0
    ΔH    0  0  0     0  0  0     0  0  0     0  0  1
• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
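The matrix above follows two rules, so it can be generated rather than typed. The sketch below is one way to do that in plain Python. The grouping of M1 to M3 into one process module and M4 into another is inferred from the matrix entries, and the module labels "A" and "B" are hypothetical.

```python
def build_correlation_matrix(layers, module_of, kinds=("W", "T", "H"), gamma=0.5):
    """Return (sources, matrix) for variation sources ordered layer by layer."""
    sources = [(layer, kind) for layer in layers for kind in kinds]
    n = len(sources)
    matrix = [[0.0] * n for _ in range(n)]
    for a, (layer_a, kind_a) in enumerate(sources):
        for b, (layer_b, kind_b) in enumerate(sources):
            if a == b:
                matrix[a][b] = 1.0  # each source is fully correlated with itself
            elif kind_a == kind_b and module_of[layer_a] == module_of[layer_b]:
                matrix[a][b] = gamma  # same variation type, same process module
            # all other pairs stay independent (0.0)
    return sources, matrix


LAYERS = ["M1", "M2", "M3", "M4"]
MODULE_OF = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}  # hypothetical labels
sources, sigma = build_correlation_matrix(LAYERS, MODULE_OF)
```

The first row of `sigma` reproduces the M1 ΔW row shown above: 1, 0, 0, γ, 0, 0, γ, 0, 0, 0, 0, 0.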

Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the curve reaches 0.92 at 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation gives smaller RC variation

Delay Variation
[Figure: α per path, plotted against Δdelay at C-worst, [d(Ycw) - d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) - d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • The C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner

Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?

Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σj / d(Ytyp) x 100 plotted against Δdj(Yrcw) / dj(Ytyp) x 100]
Challenge: how to avoid underestimating delay variation, so as to preserve parametric yield
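One possible reading of "scale BEOL variation with α" is to shrink each parameter's deviation from its typical value by the factor α. The sketch below shows only that arithmetic. The dictionary-of-parameters representation and the parameter names are hypothetical, and this is not a description of the itf file format or of the authors' tooling.

```python
def tighten_corner(typical, conventional, alpha):
    """Scale each parameter's deviation from typical by alpha (0 = typical, 1 = conventional)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return {
        name: typical[name] + alpha * (conventional[name] - typical[name])
        for name in typical
    }
```

For example, a hypothetical M1 width of 1.0 at the typical corner and 2.0 at the conventional corner becomes 1.5 at a tightened corner with α = 0.5.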

Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]

Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst, and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; paths split into GTBC and GCBC; GTBC paths go through timing analysis using TBC and ECO using TBC until violations = 0; GCBC paths go through timing analysis using CBC and ECO using CBC until violations = 0; done]
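The grouping step of the flow can be sketched as follows. This is a minimal illustration, assuming that Δdelay is the delay increase at a corner relative to the typical-corner delay and that the larger of the C-worst and RC-worst values is compared to the threshold. The data layout, the function names, and the threshold value are hypothetical.

```python
def relative_delta(delay_corner, delay_typ):
    """Relative delay increase at a corner: [d(Y) - d(Ytyp)] / d(Ytyp)."""
    return (delay_corner - delay_typ) / delay_typ


def split_paths(paths, threshold):
    """Split paths into (g_tbc, g_cbc).

    paths maps a path name to (d_typ, d_cw, d_rcw), the path delay at the
    typical, C-worst, and RC-worst corners.
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        # Assumption: use the dominant corner, i.e. the larger relative delta.
        delta = max(relative_delta(d_cw, d_typ), relative_delta(d_rcw, d_typ))
        if delta > threshold:
            g_tbc.append(name)  # signed off with tightened BEOL corners
        else:
            g_cbc.append(name)  # signed off with conventional BEOL corners
    return g_tbc, g_cbc
```

Usage: `split_paths({"p1": (100.0, 112.0, 105.0), "p2": (100.0, 102.0, 103.0)}, 0.05)` puts `p1` in the tightened-corner group and `p2` in the conventional-corner group.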

Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity):
                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330
Tightened BEOL corners (correlation factor = 0.5):
          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34
• Implement another NETCARD (clock period = 2.3 ns) to obtain α, Acw, and Arcw
• Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5

Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays: the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

Scaling Factor Results
[Figure: three panels (LEON3MP, NETCARD, SUPERBLUE12), each marking the region with α > 0.5]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120 ps and 61 ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]

Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]

Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125 °C
• Process corner = slow NMOS and PMOS
• BTI degradation = {DC, AC}
Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05
Supply voltages
Vmax          1.05 V
Vinit         0.90 V
Vheur1 (DC)   0.97 V
Vheur1 (AC)   0.95 V
Vheur2 (DC)   0.95 V
Vheur2 (AC)   0.93 V

A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, VLIB, and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing, Vdd, and library
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs
Vstep: AVS voltage step; Vfinal: converged voltage
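The loop structure of this reference flow can be sketched as below. The callback `required_vdd_mv` is a hypothetical stand-in for the expensive part (aging estimation, library update, and STA at a given time step); it is not part of the original flow description. Voltages are kept as integer millivolts so the step arithmetic stays exact.

```python
def reference_signoff(v_init_mv, v_max_mv, v_step_mv, num_steps, required_vdd_mv):
    """Return the converged Vfinal in mV, or None if the circuit must be modified.

    required_vdd_mv(step) is a hypothetical callback returning the minimum
    supply voltage that meets timing at that time step.
    """
    if v_step_mv <= 0:
        raise ValueError("v_step_mv must be positive")
    vdd = v_init_mv
    for step in range(num_steps + 1):
        needed = required_vdd_mv(step)  # stands in for aging estimation + STA
        while vdd < needed:
            vdd += v_step_mv  # AVS raises Vdd by one voltage step at a time
        if vdd > v_max_mv:
            return None  # Vfinal exceeds the maximum allowed voltage: start over
    return vdd
```

With a toy requirement of `900 + 10 * step` mV over ten steps, a 25 mV voltage step, and a 1050 mV ceiling, the loop converges at 1000 mV. The requirement function here is invented purely to exercise the loop.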

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
Case       1       2       3      4        5     6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    N/A   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   N/A   Vheur1   Vheur2

“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Diagram: loop connecting circuit, Vfinal, and derated libraries (Vlib, VBTI)]

Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time over alternating ON and OFF phases [VattikondaWC06]]
Device aging (|ΔVth|) accumulates over time [TCAS'14]

Observation 1
[Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd approaching Vfinal over lifetime; ≈70% of the Vdd increment occurs in 1 year (remaining 30% over 9 years)]

Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
[Figure annotation: good corners]
Setups
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations and aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?
Vlib is used for circuit performance estimation; VBTI is used for aging estimation:
                   Vlib = Min Vdd                                        Vlib = Max Vdd
VBTI = Min Vdd     Slowest circuit, less aging                           Not applicable (optimistic)
VBTI = Max Vdd     Slowest circuit, worst-case aging (too pessimistic)   Faster circuit, worst-case aging
With BTI aging and AVS, the worst-case voltage corner is not obvious

AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm2) for AES; points are labeled 1 to 8 by setup, with series Non-EM Aware, After Fixing (Mishra), and After Fixing (Blacks); regions are marked as optimistic about AVS and pessimistic about AVS]

AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
[Figure: lifetime (years) and Vfinal (V) for implementations 1 to 9; 30% MTTF penalty; 200 mV voltage compensation]

EM Impact on AVS Scheduling
[Figure: VDD (0.90 V to 1.04 V) vs. year (0 to 12) for schedules S1 to S5 on the DMA design, and MTTF (year) for S1 to S5; annotation: 12 years MTTF penalty]

What is “Signoff”?
• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff on a voltage axis, with nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd, and operating voltage]
Problem: margins = pessimism, overdesign, schedule delay

Statistical Timing Analysis (1)
• Δdjv: delay sensitivity of path pj to variation source zv
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
Flow: 28 itf files (27 variation sources + Ytyp) and the routed netlist go through RC extraction and STA to give Δdjv
$\Delta d_{jv} = \left[ d_j(Y_v) - d_j(Y_{typ}) \right] / 3$
Note: path delay includes gate and wire delays

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
$[\Delta d'_{j1}, \dots, \Delta d'_{j27}] = [\Delta d_{j1}, \dots, \Delta d_{j27}]\,\lambda$
Standard deviation of path delay:
$\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference.
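The two statistical timing slides can be combined into a small worked sketch: per-source sensitivities from corner delays, a Cholesky factor of the correlation matrix, and the resulting path sigma. This is a plain-Python illustration with hypothetical function names, assuming λ is the lower-triangular Cholesky factor and that the sensitivity vector multiplies it as a row vector.

```python
import math


def cholesky(sigma):
    """Lower-triangular factor lam with sigma = lam * lam^T (sigma must be positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            partial = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - partial)
            else:
                lam[i][k] = (sigma[i][k] - partial) / lam[k][k]
    return lam


def sensitivities(delays_at_corners, delay_typ):
    """Delta d_jv = [d_j(Y_v) - d_j(Y_typ)] / 3 for each variation source v."""
    return [(d - delay_typ) / 3.0 for d in delays_at_corners]


def path_sigma(sens, sigma):
    """Standard deviation of path delay from sensitivities and correlation matrix."""
    lam = cholesky(sigma)
    n = len(sens)
    # Row vector times lam: correlated[k] = sum over v of sens[v] * lam[v][k].
    correlated = [sum(sens[v] * lam[v][k] for v in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in correlated))
```

With independent sources the result reduces to the root-sum-square of the sensitivities; positive correlation between sources with same-sign sensitivities increases it.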
32ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
bull Heuristic obtain Vheur by averaging Vfinal of different cells
bull Heuristic use a ldquoflatrdquo Vheur to estimate BTI degradation
Obtain Vheur (average of standard cells)
Obtain derated library with VBTI = Vlib = Vheur
Signoff circuit with derated library
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
bull 4 designs x DC AC x derating methods)
Proposed method
Circuit signed off usingother derated libraries
ldquoKneerdquo point for balanced area and power tradeoff
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate aging
bull Consume more powerbull May fail to meet timing if
desired supply voltage gt Vmax
Pessimistic signoff corner bull Ovestimate aging andor
underestimate circuit performance
bull Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
• Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: Optimization of signoff modes
• Improve performance, power, or area
• Reduce overdesign
[Figure: Vdd vs. time, alternating between nominal (NOM) and overdrive (OD) modes with durations tnom and tOD]
[Figure: power of circuits w/ different overdrive modes; fnom = 800MHz, Vnom = 0.8V. Different overdrive modes: 26% power range. Fix fOD: still 14% power range]
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config: Vmin_est < Vmin_chip; some chips will fail
Default config:
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config:
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability:
• Compensate for difference between model and silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “202020” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1 to 9; annotations: highest invested guardband, least invested guardband, 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2. Each layer is placed at its own 3σ C corner; on the M1 C / M2 C plane the resulting homogeneous Cw corner lies outside the 3σ contour of the stack, and the gap is labeled pessimism]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

              M1           M2           M3           M4
           ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH
  M1 ΔW     1  0  0      γ  0  0      γ  0  0      0  0  0
     ΔT     0  1  0      0  γ  0      0  γ  0      0  0  0
     ΔH     0  0  1      0  0  γ      0  0  γ      0  0  0
  M2 ΔW     γ  0  0      1  0  0      γ  0  0      0  0  0
     ΔT     0  γ  0      0  1  0      0  γ  0      0  0  0
     ΔH     0  0  γ      0  0  1      0  0  γ      0  0  0
  M3 ΔW     γ  0  0      γ  0  0      1  0  0      0  0  0
     ΔT     0  γ  0      0  γ  0      0  1  0      0  0  0
     ΔH     0  0  γ      0  0  γ      0  0  1      0  0  0
  M4 ΔW     0  0  0      0  0  0      0  0  0      1  0  0
     ΔT     0  0  0      0  0  0      0  0  0      0  1  0
     ΔH     0  0  0      0  0  0      0  0  0      0  0  1
  = Σ

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
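The structure of Σ can be generated programmatically. The following is a minimal sketch, assuming (as the matrix on this slide suggests) that M1 to M3 share one process module and M4 belongs to another. The function name, the module labels "A" and "B", and the data layout are illustrative and not from the talk.

```python
def build_correlation_matrix(layers, modules, types=("W", "T", "H"), gamma=0.5):
    """Build the correlation matrix for per-layer variation sources.

    layers  : ordered list of layer names
    modules : dict mapping layer name -> process module label (hypothetical labels)
    types   : variation types per layer (width, thickness, height)
    gamma   : correlation between same-type sources in the same process module
    """
    labels = [(layer, t) for layer in layers for t in types]
    n = len(labels)
    sigma = [[0.0] * n for _ in range(n)]
    for a, (layer_a, type_a) in enumerate(labels):
        for b, (layer_b, type_b) in enumerate(labels):
            if a == b:
                sigma[a][b] = 1.0  # every source is fully correlated with itself
            elif type_a == type_b and modules[layer_a] == modules[layer_b]:
                sigma[a][b] = gamma  # same type, same module
            # otherwise the sources are treated as independent (0.0)
    return labels, sigma


layers = ["M1", "M2", "M3", "M4"]
modules = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}  # assumed grouping
labels, sigma = build_correlation_matrix(layers, modules)
```

With these inputs the result is a 12 x 12 symmetric matrix whose M4 rows and columns are zero off the diagonal, matching the layout shown on the slide.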
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the point (60, 0.92) is marked]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α plotted against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp). Paths are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2 on the M1 C / M2 C plane with the 3σ contour; non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge:
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure axes: Δdj(Yrcw) / dj(Ytyp) × 100% and 3σj / d(Ytyp) × 100%]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield?
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase:
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure axis: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design → timing analysis at BEOL corners (Ytyp, Ycw, Yrcw) → paths split into GTBC and GCBC → timing analysis using TBC and timing analysis using CBC → while violations remain, ECO using TBC or ECO using CBC → done when violation = 0]
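The grouping step of this flow can be sketched as follows. This is only an illustration under stated assumptions: Δdelay is taken here as the larger relative delay increase over the typical corner among the C-worst and RC-worst corners, and the function name, the dictionary layout, the example delays, and the threshold value are all hypothetical.

```python
def split_paths(path_delays, threshold):
    """Partition paths into (g_tbc, g_cbc) by normalized delay variation.

    path_delays : dict mapping path name -> {"typ": d, "cw": d, "rcw": d}
                  (delays at the typical, C-worst, and RC-worst corners)
    threshold   : paths whose delta-delay exceeds this go to g_tbc
    """
    g_tbc, g_cbc = [], []
    for name, d in path_delays.items():
        delta_cw = (d["cw"] - d["typ"]) / d["typ"]
        delta_rcw = (d["rcw"] - d["typ"]) / d["typ"]
        delta = max(delta_cw, delta_rcw)  # assumed: use the dominant corner
        if delta > threshold:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


# Made-up delays in ns, for illustration only.
example = {
    "p1": {"typ": 1.00, "cw": 1.02, "rcw": 1.10},
    "p2": {"typ": 1.00, "cw": 1.01, "rcw": 1.02},
}
g_tbc, g_cbc = split_paths(example, threshold=0.05)
```

Paths in the first list would then be fixed against the tightened corners, the rest against the conventional ones.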
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)

                       LEON3MP   NETCARD   SUPERBLUE12
  Clock period (ns)    18        20        31
  Gate count           232K      575K      1031K
  Utilization (%)      84        79        82
  Core area (mm2)      0.45      1.04      1.91
  Max transition (ps)  330       330       330

Correlation factor = 0.5

             α     Acw (%)   Arcw (%)
  TBC-0.5    0.5   43        73
  TBC-0.6    0.6   33        50
  TBC-0.7    0.7   30        34

Implement another NETCARD (clock period = 23ns) to obtain α, Acw, and Arcw
Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  (dj(Ytyp): nominal delay; ΔdR-M1: delay sensitivity to unit change in M1 resistance; ΔdC-M1: delay sensitivity to unit change in M1 capacitance)
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: scaling factor α for LEON3MP, NETCARD, and SUPERBLUE12, with the α > 0.5 region marked in each plot]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]
57ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

  Circuit   Frequency (GHz)
  C5315     1.38
  c7552     1.25
  AES       0.89
  MPEG2     1.05

Supply voltages
  Vmax          1.05V
  Vinit         0.90V
  Vheur1 (DC)   0.97V
  Vheur1 (AC)   0.95V
  Vheur2 (DC)   0.95V
  Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, VLIB, and Vdd throughout circuit lifetime
• Signoff flow:
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library
Vstep: AVS voltage step; Vfinal: converged voltage
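The loop structure of this reference flow can be sketched in a few lines. This is a toy illustration, not the flow used in the talk: the aging and delay model (`toy_delay`), its constants, the number of steps, and the target delay are all made up, and a real flow would call library characterization and STA where the sketch calls `delay_at`.

```python
def reference_signoff(v_init, v_max, v_step, steps, delay_at, target_delay):
    """Step through lifetime and raise Vdd by v_step whenever timing fails.

    delay_at(t, history) is a caller-supplied model returning the aged
    critical-path delay at step t, given the Vdd applied at each step so far.
    Returns the final voltage, or None if the maximum allowed voltage would
    be exceeded (the circuit must then be modified and the flow restarted).
    """
    vdd = v_init
    history = []
    for t in range(steps):
        history.append(vdd)
        while delay_at(t, history) > target_delay:
            vdd = round(vdd + v_step, 6)  # one AVS voltage step
            if vdd > v_max:
                return None
            history[-1] = vdd
    return vdd


def toy_delay(t, history):
    """Hypothetical model: threshold voltage drifts upward with time."""
    vth = 0.30 + 0.01 * (t + 1) ** 0.25
    return 1.0 / (history[-1] - vth)


v_final = reference_signoff(0.90, 1.05, 0.01, 10, toy_delay, 1.70)
```

Each pass of the inner loop stands in for one timing update, which is why the real flow is slow: the number of timing runs grows with the number of time steps and voltage adjustments.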
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups:
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

  Case       1      2      3      4       5    6       7
  Vlib (V)   Vinit  Vinit  Vmax   Vinit   NA   Vheur1  Vheur2
  VBTI (V)   Vinit  Vmax   Vmax   Vfinal  NA   Vheur1  Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop among circuit, Vfinal, and derated libraries (Vlib, VBTI)]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time over alternating ON/OFF phases [VattikondaWC06]]
Device aging (|ΔVth|) accumulates over time
[TCAS’14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd over lifetime approaching Vfinal; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)]
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner:
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach:
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

  VBTI for aging     Vlib for circuit performance estimation
  estimation         Min Vdd                               Max Vdd
  Min Vdd            Slowest circuit, less aging           Not applicable (optimistic)
  Max Vdd            Slowest circuit, worst-case aging     Faster circuit, worst-case aging
                     (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm2) for AES; data points are labeled by signoff setup number for three series: Non-EM Aware, After Fixing (Mishra), After Fixing (Black’s); regions annotated “Optimistic about AVS” and “Pessimistic about AVS”]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
  $\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
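The MTTF update on this slide, MTTF(i) = MTTF(i-1) × (VDD(i-1) / VDD(i))^2, can be applied step by step along a voltage schedule. The sketch below is illustrative only: the function names, the initial MTTF, and the voltage schedule are assumptions, not data from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One update step: MTTF scales with the squared ratio of old to new Vdd."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_over_schedule(mttf_init, vdd_schedule):
    """Return the MTTF after each step of an AVS voltage schedule."""
    mttf = mttf_init
    trace = [mttf]
    for v_prev, v_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, v_prev, v_new)
        trace.append(mttf)
    return trace


# Hypothetical schedule: Vdd raised from 0.90 V to 1.00 V in two steps.
trace = mttf_over_schedule(10.0, [0.90, 0.95, 1.00])
```

Because the per-step ratios telescope, the final value depends only on the first and last voltages: MTTF_final = MTTF_init × (V_first / V_last)^2.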
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year for AVS schedules S1 to S5 (design: DMA), and MTTF (year) for S1 to S5; annotation: 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
• Foundation of contract between design house and foundry
  • “Chip should work”: stack of models, margins, analyses
  • Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff, with components nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI; levels labeled signoff Vdd and operating voltage]
Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = \left[ d_j(Y_v) - d_j(Y_{typ}) \right] / 3$
• Assumptions:
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
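The sensitivity extraction, Δdjv = [dj(Yv) − dj(Ytyp)] / 3, amounts to one subtraction and one division per variation source. The sketch below assumes that each corner file skews a single source by 3σ, which would explain the division by 3; that reading, the function name, and the example delays are assumptions for illustration.

```python
def delay_sensitivities(d_typ, d_corners):
    """Per-sigma delay sensitivity of one path to each variation source.

    d_typ     : path delay at the typical corner
    d_corners : list of path delays, one per variation source, each obtained
                with only that source skewed (assumed here to be by 3 sigma)
    """
    return [(d_v - d_typ) / 3.0 for d_v in d_corners]


# Made-up delays in ns for a path with three variation sources.
sens = delay_sensitivities(1.00, [1.03, 0.97, 1.00])
```

For the 27 sources on the slide, `d_corners` would hold 27 STA results and the typical-corner run would supply `d_typ`, giving the 28 runs mentioned above.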
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^{T}$ (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
  $[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}] \, \lambda$
Standard deviation of path delay:
  $\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
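The two formulas on this slide, Δd' = Δd λ with Σ = λλ^T and σj = (Σ_v (Δd'_jv)^2)^0.5, can be checked on a small case. The sketch below uses a plain-Python Cholesky factorization to stay self-contained; the function names and the 2 x 2 example are illustrative, and a production flow would more likely use a numerical library.

```python
import math


def cholesky(sigma):
    """Return lower-triangular lam with sigma = lam * lam^T.

    sigma must be symmetric positive definite.
    """
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def path_sigma(delta_d, sigma):
    """Standard deviation of path delay from per-source sensitivities."""
    lam = cholesky(sigma)
    n = len(delta_d)
    # Row vector of sensitivities times lam gives the correlated sensitivities.
    d_prime = [sum(delta_d[i] * lam[i][v] for i in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


# Two sources with correlation 0.5 and unit sensitivities.
s_corr = path_sigma([1.0, 1.0], [[1.0, 0.5], [0.5, 1.0]])
# Independent sources reduce to a root-sum-of-squares.
s_ind = path_sigma([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]])
```

Since Δd' Δd'^T = Δd Σ Δd^T, the same σj could be computed directly from Σ; the factorization mirrors the formulation on the slide.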
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:
  Energy = Power / Throughput
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches:
• Open-loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor: process- and temperature-aware AVS, generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
[Figure axis: power]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure, four panels, each contrasting ideal scaling with observed non-ideality:
  DENSITY: transistor count [M] vs. MPU release date (source: [CPUDB])
  POWER: dynamic power (W) by year, 2009 to 2024 (source: [JeongK08])
  DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling (source: [ITRS])
  SUPPLY VOLTAGE: supply voltage (V) vs. MPU release date (source: [CPUDB])]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for the MUL and EXU designs, comparing brute-force, pure-margin, and CombOpt; the minimum achievable energy is marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone
[Figure: voltage-frequency plane showing the design cone of mode A, with modes A, B, and C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • Small constant number of runs is enough: scalable
Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search (sample (SP&R), refine power models)
[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V), with series 1st, 2nd, and real; design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
Closed-loop AVS: design-dependent replica, in-situ monitor, generic monitor
• Generic monitor does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin:
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit:
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring oscillator stages with 1-bit control pins selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows:
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO at small, medium, and large margin for the MUL and EXU designs; bars are split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs, hence reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for the MUL and EXU designs, comparing brute-force, pure-margin, and CombOpt; the minimum achievable energy is marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
bull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity-aware clock skew optimization
SkewOpt
33ISVLSI-2014 invited talk 140710
Proposed Library Characterization Flow
bull Heuristic obtain Vheur by averaging Vfinal of different cells
bull Heuristic use a ldquoflatrdquo Vheur to estimate BTI degradation
Obtain Vheur (average of standard cells)
Obtain derated library with VBTI = Vlib = Vheur
Signoff circuit with derated library
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
bull 4 designs x DC AC x derating methods)
Proposed method
Circuit signed off usingother derated libraries
ldquoKneerdquo point for balanced area and power tradeoff
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate aging
bull Consume more powerbull May fail to meet timing if
desired supply voltage gt Vmax
Pessimistic signoff corner bull Ovestimate aging andor
underestimate circuit performance
bull Large area overhead
35ISVLSI-2014 invited talk 140710
bull Signoff mode = (voltage frequency) pair
bull Multi-mode operation requires multi-mode signoff
bull Example nominal mode and overdrive mode
bull Selection of signoff modes affects area power
bull ASP-DAC 2013 Optimization of signoff modes
Improve performance power or area
Reduce overdesign
NOM
ODNOM
OD
time
Vdd
tnom tOD tnom tOD
Also Multi-Mode Signoff Choices Matter
12
Fix fOD still 14 power range
Power of circuits w different overdrive modes
Different overdrive modes 26 power range
fnom = 800MHz Vnom = 08V
36ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Benefits of tunability bull Compensate for difference
between model vs siliconbull Recover margin when variation is
reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
39ISVLSI-2014 invited talk 140710
Conclusionsbull Variability severely challenges IC value
bull In manufacturing process during operation across lifetime
bull Benefit of ldquonext noderdquo is increasingly hard to findbull Entire node is a ldquo202020rdquo value propositionbull 5-10 in PPA metrics is now substantial at leading edge
bull Variability is connected to tapeout IC properties by models margins tolerances used in signoff
bull Some takeaways from this talkbull Substantial benefit from tightening BEOL corners (= signoff)bull ldquoMinimum cost of resiliencerdquo is a rich optimization challengebull Chicken-egg loops in signoff definition can be brokenbull Holistic approaches will provide ldquoequivalent scalingrdquo that
extends the value trajectory of Moorersquos Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
1 2 3 4 5 6 7 8 91200
1300
1400
1500
1600
1700
030
032
034
036
Core Power (mW) PG Power (mW)
Implemetation
Core
Pow
er (m
W)
PG
Pow
er (m
W)
bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff
Highest invested guardband
Least invested guardband
14 power penalty
43ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Interconnect stack with M1 and M2
M1 C
M2 C
3σ Pessimism
Example worst-case capacitance corner Homogeneous
Cw corner
44ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Homogeneous Cw corner
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Pessimism
Example worst-case capacitance corner
When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers
45ISVLSI-2014 invited talk 140710
Correlation Matrixbull Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the process module γ 05
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Averaging uncorrelated variation smaller RC variation
47ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
Flowchart steps: routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; split paths into GTBC and GCBC; timing analysis using TBC with ECO using TBC until violation = 0; timing analysis using CBC with ECO using CBC until violation = 0; done
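The grouping step of the flow can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes that a path goes to Gtbc when its relative Δdelay exceeds a threshold at either conventional corner, with separate thresholds named `a_cw` and `a_rcw` here. The path tuples and threshold values are hypothetical.

```python
def split_paths(paths, a_cw, a_rcw):
    """Split critical paths into (G_tbc, G_cbc).

    paths: iterable of (name, d_typ, d_cw, d_rcw) delays.
    a_cw, a_rcw: relative delta-delay thresholds at C-worst and RC-worst.
    Assumption: exceeding either threshold places the path in G_tbc.
    """
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        delta_cw = (d_cw - d_typ) / d_typ     # relative delta-delay at C-worst
        delta_rcw = (d_rcw - d_typ) / d_typ   # relative delta-delay at RC-worst
        if delta_cw > a_cw or delta_rcw > a_rcw:
            g_tbc.append(name)   # large corner delta: sign off with tightened corners
        else:
            g_cbc.append(name)   # small corner delta: keep conventional corners
    return g_tbc, g_cbc


# Hypothetical delays in ns.
paths = [("p0", 1.00, 1.02, 1.12), ("p1", 1.00, 1.01, 1.02), ("p2", 2.00, 2.20, 2.05)]
g_tbc, g_cbc = split_paths(paths, a_cw=0.04, a_rcw=0.07)
print(g_tbc, g_cbc)
```

Paths with a small Δdelay stay with the conventional corners because, as discussed later in the talk, those are the paths whose α can be large.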
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Tightened corners (correlation factor = 0.5)

          α     Acw (%)   Arcw (%)
TBC-05    0.5   43        73
TBC-06    0.6   33        50
TBC-07    0.7   30        34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations due to ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
In the example, dj(Ytyp) is the nominal delay, ΔdR-M1 is the delay sensitivity to a unit change in M1 resistance, and ΔdC-M1 is the delay sensitivity to a unit change in M1 capacitance
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
Figure panels: LEON3MP, SUPERBLUE12, NETCARD (each marks the region α > 0.5)
• Similar trends in different designs
• Large α when Δd(Yrcw) / d(Ytyp) and Δd(Ycw) / d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns, respectively
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
Figures: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2
57ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging under an adaptive Vdd
• But slightly pessimistic
Figure: Vdd vs. time, with NBTI and PBTI stress; VBTI = Vlib ≈ Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep VBTI, VLIB and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step; Vfinal: converged voltage
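The loop structure of the reference flow can be illustrated with a toy model. Everything numeric below is an assumption made for the sketch: the alpha-power-law delay expression, the form and constants of `delta_vth`, and the voltages. A real flow would run STA with re-characterized libraries at each step instead of calling a closed-form delay function.

```python
import math


def delay(vdd, vth, a=1.3):
    """Toy alpha-power-law gate delay in arbitrary units (assumed model)."""
    return vdd / (vdd - vth) ** a


def delta_vth(t_years, vdd, A=0.004, B=2.0, n=0.2):
    """Toy BTI threshold shift: grows as t**n and with stress voltage (assumed form)."""
    return A * math.exp(B * vdd) * t_years ** n


def reference_avs(v_init, v_max, v_step, vth0, lifetime_years, steps):
    """Step through the lifetime, raising Vdd by v_step whenever aging breaks timing."""
    target = delay(v_init, vth0)   # timing is met with zero slack at t = 0
    vdd = v_init
    schedule = []
    for i in range(1, steps + 1):
        t = lifetime_years * i / steps
        # Aging depends on Vdd, so it is re-estimated after every voltage step.
        while delay(vdd, vth0 + delta_vth(t, vdd)) > target:
            vdd += v_step
            if vdd > v_max:
                return None, schedule   # the circuit would have to be modified
        schedule.append((t, round(vdd, 3)))
    return vdd, schedule


v_final, schedule = reference_avs(v_init=0.90, v_max=1.05, v_step=0.01,
                                  vth0=0.30, lifetime_years=10.0, steps=10)
for t, v in schedule:
    print(f"t = {t:4.1f} y  Vdd = {v:.2f} V")
```

Even in this toy setting, the cost of the approach is visible: every time step may need several timing evaluations, each of which stands in for a full STA run.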
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2
63ISVLSI-2014 invited talk 140710
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
Figure: loop between circuit, Vfinal, Vlib / VBTI and derated libraries
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
Figure: |Vgs| vs. time with alternating ON and OFF phases [VattikondaWC06]
Device aging (|ΔVth|) accumulates over time [TCAS'14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
Figure: Vdd vs. time approaching Vfinal; ≈70% of the Vdd increment occurs in 1 year (remaining 30% over 9 years)
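The "front-loaded" behavior is what a power-law time dependence predicts. As a hedged illustration, assume |ΔVth| grows as t^n. The exponents used below are illustrative values only and are not taken from the talk.

```python
def first_year_fraction(n, lifetime_years=10.0):
    """Fraction of end-of-life |dVth| already reached after 1 year if |dVth| ~ t**n."""
    return (1.0 / lifetime_years) ** n


# Illustrative time exponents (assumptions, not values from the talk).
for n in (1 / 6, 0.3):
    frac = first_year_fraction(n)
    print(f"n = {n:.3f}: {frac:.2f} of the 10-year shift occurs in year 1")
```

Under this assumption, an exponent near 0.3 gives roughly half of the 10-year shift in the first year, and a smaller exponent front-loads the aging even more.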
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
Setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations and aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?
Choices of Vlib (for circuit performance estimation) and VBTI (for aging estimation):
  • Vlib = min Vdd, VBTI = min Vdd: slowest circuit, less aging
  • Vlib = max Vdd, VBTI = min Vdd: not applicable (optimistic)
  • Vlib = max Vdd, VBTI = max Vdd: faster circuit, worst-case aging
  • Vlib = min Vdd, VBTI = max Vdd: slowest circuit, worst-case aging (too pessimistic)
With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
Figure (AES): power (mW) vs. area (μm2), with points labeled by setup number (1 to 8); annotated regions: "Optimistic about AVS" and "Pessimistic about AVS"
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
Figure: lifetime (year) and Vfinal (V) per implementation (1 to 9); annotations: 30% MTTF penalty, 200mV voltage compensation
• Assume no EM fix at signoff
• BTI degradation is checked at each step, and MTTF is updated as
  $\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
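The MTTF update rule, MTTF(i) = MTTF(i−1) × (VDD(i−1) / VDD(i))², can be applied step by step along an AVS voltage schedule. The sketch below does exactly that; the initial MTTF and the voltage schedule are hypothetical values chosen for illustration.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of the update: MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_after_schedule(mttf0, vdd_schedule):
    """Apply the update along a sequence of supply voltages."""
    mttf = mttf0
    for v_prev, v_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, v_prev, v_new)
    return mttf


# Hypothetical schedule: 200mV of total voltage compensation starting from 0.90V.
schedule = [0.90, 0.96, 1.00, 1.05, 1.10]
mttf0 = 10.0  # hypothetical initial MTTF in years
mttf = mttf_after_schedule(mttf0, schedule)
penalty = 1.0 - mttf / mttf0
print(f"MTTF = {mttf:.2f} years, penalty = {penalty:.0%}")
```

Because the per-step ratios telescope, only the first and last voltages matter under this rule: the final MTTF is MTTF(0) × (0.90 / 1.10)², whatever the intermediate steps are.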
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
Figures (DMA): VDD vs. year for schedules S1 to S5; MTTF (year) for schedules S1 to S5
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is "Signoff"?
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
Figure: "margin stack" for voltage signoff (voltage axis): nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI / NBTI, signoff Vdd; operating voltage
Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = \left[ d_j(Y_v) - d_j(Y_{typ}) \right] / 3$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist; RC extraction; STA; Δdjv
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^{T}$ (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
  $[\Delta d'_{j1} \; \ldots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \ldots \; \Delta d_{j27}] \, \lambda$
Standard deviation of path delay:
  $\sigma_j = \left( (\Delta d'_{j1})^{2} + \ldots + (\Delta d'_{j27})^{2} \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
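The computation of σj from per-source sensitivities and a correlation matrix can be sketched in plain Python. This is a small illustration with three variation sources instead of 27; the sensitivity values and the matrix are hypothetical, and the hand-written `cholesky` helper stands in for a library routine.

```python
import math


def cholesky(m):
    """Lower-triangular L with m = L * L^T (m must be symmetric positive definite)."""
    n = len(m)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(m[i][i] - s)
            else:
                L[i][j] = (m[i][j] - s) / L[j][j]
    return L


def path_sigma(sens, corr):
    """sigma_j = Euclidean norm of (sens * L), where corr = L * L^T."""
    L = cholesky(corr)
    n = len(sens)
    # Row vector times matrix: sensitivities with correlation folded in.
    rotated = [sum(sens[i] * L[i][v] for i in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in rotated))


gamma = 0.5
corr = [[1.0, gamma, 0.0],
        [gamma, 1.0, 0.0],
        [0.0, 0.0, 1.0]]          # sources 0 and 1 correlated, source 2 independent
sens = [0.010, 0.020, 0.005]      # hypothetical per-sigma delay sensitivities (ns)
sigma = path_sigma(sens, corr)
print(f"sigma_j = {sigma:.4f} ns")
```

The result equals the square root of sens · Σ · sensᵀ, which is a convenient cross-check on the decomposition.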
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; 15% reduction
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy: Energy = Power / Throughput
  • Throughput is computed from the error rate (ER), the clock period (T) and the number of recovery cycles (r) [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints: allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches
• Open-loop AVS
  • Freq and Vdd LUT: pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor and design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00]; design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor that measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
Figure axis: power
78ISVLSI-2014 invited talk 140710
Challenge Variability
Figure panels (each compares ideal scaling with non-ideality):
• DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
• POWER: dynamic power (W) vs. year (2009 to 2024). Source: [JeongK08]
• DRIVE CURRENT (μA/μm) vs. year for extended planar bulk, UTB FD, DG and ideal scaling. Source: [ITRS]
• SUPPLY VOLTAGE (volt) vs. MPU release date. Source: [CPUDB]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
Figures (MUL, EXU): energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; minimum achievable energy is marked
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
Global optimization flow: sample (SP&R); construct power models; estimate optimal signoff modes; adaptive search: sample (SP&R), refine power models
Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V); design: AES, f = 700MHz
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
Closed-loop AVS: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes / voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low
Figure: RO stages with 1-bit control pins selecting high resistance or low resistance
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
Figures (MUL, EXU): energy (mJ) for PM, BF and CO at small, medium and large margin, broken down into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation
Small / medium / large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
Figures (MUL, EXU): energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; minimum achievable energy is marked
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flowchart steps:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs
• SkewOpt: activity-aware clock skew optimization
• Energy < min energy? Save current solution
34ISVLSI-2014 invited talk 140710
Power vs Area for All Designs
• (4 designs) x (DC, AC) x (derating methods)
Figure annotations: proposed method; circuits signed off using other derated libraries; "knee" point for balanced area and power tradeoff
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
35ISVLSI-2014 invited talk 140710
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
  • Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power or area
  • Reduce overdesign
Figure: Vdd vs. time, alternating NOM and OD modes (durations tnom, tOD)
Figure: power of circuits with different overdrive modes (fnom = 800MHz, Vnom = 0.8V)
• Different overdrive modes: 26% power range
• Fix fOD: still 14% power range
37ISVLSI-2014 invited talk 140710
Also: Tunable Monitors, Less Margin
Aggressive config: Vmin_est < Vmin_chip; some chips will fail
Default config
• Low resistance passgates
• Guardband for worst-case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
Figure: core power (mW) and PG power (mW) per implementation (1 to 9); annotations: highest invested guardband, least invested guardband, 14% power penalty
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
Example: worst-case capacitance corner
Figure: capacitance distributions of layer M1 and layer M2 (C, −3σ to 3σ); interconnect stack with M1 and M2 (axes: M1 C, M2 C), showing the 3σ contour, the homogeneous Cw corner and its pessimism
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
            M1           M2           M3           M4
            ΔW  ΔT  ΔH   ΔW  ΔT  ΔH   ΔW  ΔT  ΔH   ΔW  ΔT  ΔH
M1  ΔW      1   0   0    γ   0   0    γ   0   0    0   0   0
    ΔT      0   1   0    0   γ   0    0   γ   0    0   0   0
    ΔH      0   0   1    0   0   γ    0   0   γ    0   0   0
M2  ΔW      γ   0   0    1   0   0    γ   0   0    0   0   0
    ΔT      0   γ   0    0   1   0    0   γ   0    0   0   0
    ΔH      0   0   γ    0   0   1    0   0   γ    0   0   0
M3  ΔW      γ   0   0    γ   0   0    1   0   0    0   0   0
    ΔT      0   γ   0    0   γ   0    0   1   0    0   0   0
    ΔH      0   0   γ    0   0   γ    0   0   1    0   0   0
M4  ΔW      0   0   0    0   0   0    0   0   0    1   0   0
    ΔT      0   0   0    0   0   0    0   0   0    0   1   0
    ΔH      0   0   0    0   0   0    0   0   0    0   0   1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point: 0.92 at 60%
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation: smaller RC variation
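The averaging effect can be made concrete with the standard variance formula for an average of equally weighted, equally correlated random variables. The sketch below is a simplification: it assumes a path spreads its wire evenly over n layers, each with the same standard deviation and the same pairwise correlation γ. Real paths have unequal wirelength ratios per layer, and the function name is made up for this example.

```python
import math


def sigma_of_average(n_layers, sigma_layer, gamma):
    """Std of the average of n equally weighted layer variations,
    each with std sigma_layer and pairwise correlation gamma."""
    return sigma_layer * math.sqrt((1.0 + (n_layers - 1) * gamma) / n_layers)


for gamma in (0.0, 0.5, 1.0):
    s = sigma_of_average(n_layers=4, sigma_layer=1.0, gamma=gamma)
    print(f"gamma = {gamma:.1f}: sigma = {s:.3f} x per-layer sigma")
```

Only the fully correlated case (γ = 1) keeps the full per-layer variation; any weaker correlation shrinks the combined variation, which is the source of pessimism in homogeneous corners.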
48ISVLSI-2014 invited talk 140710
Delay Variation
Figure: α per path vs. Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp), and α per path vs. Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp)
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
Gate count 232K 575K 1031KUtilization () 84 79 82
Core area (mm2) 045 104 191Max transition (ps) 330 330 330
Testcases for validation (45nm library with 8X wire resistivity)
αCorrelation factor = 05
Acw () Arcw ()
TBC-05 05 43 73
TBC-06 06 33 50
TBC-07 07 30 34
Implement another NETCARD (clock period = 23ns) to obtain α Acw and Arcw
Statistical models (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 05
54ISVLSI-2014 invited talk 140710
Further Analysis
bull Paths with small Δd(Yrcw) and Δd(Ycw) have large α
bull A path has small Δdelays the path is equally sensitive to R and C
bull Example dj = dj(Ytyp) + 05 ΔdR-M1 + 05 ΔdC-M1
bull For a given CBC = Ycw ΔdR-M1 is small but ΔdC-M1 is large delay variation of ΔdR-M1 and ΔdC-M1 are cancelled out Δd(Ycw) 0 lt σj
Nominal delay
Delay sensitivity to unit change in M1 resistance
Delay sensitivity to unit change in M1 capacitance
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
LEON3MP
SUPERBLUE12NETCARD
α gt 05α gt 05
α gt 05
bull Similar trends in different designs
bull Large α when Δd(Yrcw)d(Ytyp) and Δd(Ycw)d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
bull WNS and TNS are reduced by up to 120ps and 61ns
bull Timing violations reduces by 31 to 100
Correlation factor γ = 0 (variation sources are independent)
LEON SUPERBLUE NETCARD
-0180-0160-0140-0120-0100-0080-0060-0040-002000000020
CBC TBC-1 TBC-2
WN
S (n
s)
LEON SUPERBLUE NETCARD
-90-80-70-60-50-40-30-20-10
0CBC TBC-1 TBC-2
TNS
(ns)
LEON SUPERBLUE NETCARD0
200400600800
1000120014001600
CBC TBC-1 TBC-2
Tim
ing
viol
ation
s
57ISVLSI-2014 invited talk 140710
Heuristics 1
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
supply voltage gt Vmax
Pessimistic signoff corner bull Ovestimate aging andor
underestimate circuit performance
bull Large area overhead
Good corners
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner
for AVS4 Vbti = Vfinal Do not overestimate
aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
bull Timing signoff ensure circuit meets performance target under PVT variations amp aging
bull Conventional signoff approach bull Analyze circuit timing at worst-case cornersbull Fix timing violations re-run timing analysis
bull With BTI aging and AVS what is the Vdd of the worst-cast corner for timing analysis
Vlib for circuit performance estimation
Min Vdd Max Vdd
VBTI for aging
estimation
MinVdd
Not applicable (Optimistic)
Max Vdd
Slowest circuitLess aging
Faster circuitWorst-case aging
Slowest circuit Worst-case aging
Too pessimistic
With BTI aging and AVS the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
10000 12000 14000 16000 18000 20000 2200020
22
24
26
28
30
32
44
4
888
7776
66
555
3
33
2
22
11
1
Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)
Area (μm2)
Pow
er (m
W)
AES
Optimistic about AVS
Pessimistic about AVS
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
1 2 3 4 5 6 7 8 90
2
4
6
8
10
12
08
09
1
11
12Lifetime (year)
Implementation
Life
time
(yea
r)
Vfina
l (V)
Vfinal (V)
119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )
2
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
What is "Signoff"?
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff on an operating-voltage axis, between nominal Vdd and signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI]
Problem: margins = pessimism → overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv
Note: path delay includes gate and wire delays
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
$$[\Delta d'_{j1} \; \dots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \dots \; \Delta d_{j27}] \, \lambda$$
Standard deviation of path delay:
$$\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$$
Note: we use the delay variation from the statistical analysis as a reference
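The two steps above (decorrelate the sensitivities with the Cholesky factor, then take the root sum of squares) can be sketched in plain Python. This is an illustration only: the three-source example, its sensitivities and its correlation values are hypothetical, and the helper names are not from the talk.

```python
import math


def cholesky(matrix):
    """Return lower-triangular L with matrix = L * L^T (symmetric positive definite input)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def path_sigma(sensitivities, correlation):
    """sigma_j = || [dd_j1 ... dd_jn] * lambda ||, where correlation = lambda * lambda^T."""
    lam = cholesky(correlation)
    n = len(sensitivities)
    # Row vector times lambda gives the correlated sensitivities dd'_jv.
    correlated = [sum(sensitivities[k] * lam[k][v] for k in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in correlated))


# Hypothetical example: three variation sources, per-sigma sensitivities in ps.
delta_d = [4.0, 3.0, 1.0]
sigma_corr = [[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]]
sigma_j = path_sigma(delta_d, sigma_corr)
```

The result equals the square root of Δd Σ Δdᵀ, so with an identity correlation matrix it reduces to the plain root sum of squares of the sensitivities.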
Resilient Designs
• Detect and recover from timing errors
  • Ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality
  • E.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy
$$\mathrm{Energy} = \frac{\mathrm{Power}}{\mathrm{Throughput}}$$
Throughput is computed from the error rate ER [Kahng10], the clock period T, and the number of recovery cycles r
Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints
  • Allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches (diagram axis: power)
• Open-loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS: process- and temperature-aware AVS
  • Generic monitor: generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
Challenge: Variability
[Figure, four panels, each contrasting ideal scaling with observed non-ideality:
  DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
  POWER: dynamic power (W) by year, 2009 to 2024. Source: [JeongK08]
  DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) against ideal scaling, by year. Source: [ITRS]
  SUPPLY VOLTAGE: volt vs. MPU release date. Source: [CPUDB]]
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels: MUL, EXU; annotation: minimum achievable energy]
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design / overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within design cone → reduce overdesign
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small, constant number of runs is enough: scalable
Global optimization flow: Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Sample (SP&R) → Refine power models (adaptive search)
[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V); series: 1st, 2nd, real; design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
Classes of Closed-Loop AVS
Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low resistance]
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF and CO at small, medium and large margin; panels: MUL, EXU; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels: MUL, EXU; annotation: minimum achievable energy]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs
• SkewOpt: activity-aware clock skew optimization
• If energy < min energy, save current solution
Also: Multi-Mode Signoff Choices Matter
• Signoff mode = (voltage, frequency) pair
• Multi-mode operation requires multi-mode signoff
• Example: nominal mode and overdrive mode
• Selection of signoff modes affects area, power
• ASP-DAC 2013: optimization of signoff modes
  • Improve performance, power, or area
  • Reduce overdesign
[Figure: Vdd vs. time, alternating nominal (NOM) and overdrive (OD) modes with durations tnom and tOD]
[Figure: power of circuits with different overdrive modes; fnom = 800MHz, Vnom = 0.8V; annotations: different overdrive modes give a 26% power range; with fOD fixed, still a 14% power range]
Also: Tunable Monitors → Less Margin
Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Aggressive config
• Vmin_est < Vmin_chip
• Some chips will fail
Benefits of tunability
• Compensate for difference between model vs. silicon
• Recover margin when variation is reduced due to improved process
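The comparison between the three configurations can be expressed as a small selection routine. The sketch below is hypothetical: the function names, the per-chip Vmin numbers and the selection rule (smallest worst-case margin among configurations that never underestimate Vmin) are assumptions chosen for illustration, not the talk's calibration procedure.

```python
def classify_config(vmin_est, vmin_chip):
    """Label a monitor configuration by comparing its estimates with measured chip Vmin (volts)."""
    margins = [est - chip for est, chip in zip(vmin_est, vmin_chip)]
    if min(margins) < 0:
        return "aggressive", margins  # Vmin_est < Vmin_chip for some chip: it would fail
    return "safe", margins


def pick_config(configs, vmin_chip):
    """Among safe configurations, return (name, worst-case margin) with the smallest margin."""
    best_name, best_margin = None, None
    for name, vmin_est in configs.items():
        label, margins = classify_config(vmin_est, vmin_chip)
        if label != "safe":
            continue
        worst = max(margins)
        if best_margin is None or worst < best_margin:
            best_name, best_margin = name, worst
    return best_name, best_margin


# Hypothetical silicon measurements and monitor estimates for three chips.
vmin_chip = [0.700, 0.720, 0.710]
configs = {
    "default": [0.713, 0.733, 0.723],     # guardbanded estimate
    "optimized": [0.702, 0.721, 0.712],   # tuned close to silicon
    "aggressive": [0.695, 0.715, 0.705],  # underestimates Vmin
}
chosen, margin = pick_config(configs, vmin_chip)
```

With these made-up numbers the tuned configuration is selected, since it stays above every chip's Vmin while carrying the least margin.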
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
Thank You
Backup
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) vs. implementation (1 to 9), ranging from least to highest invested guardband; annotation: 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example, worst-case capacitance corner. Capacitance distributions of layer M1 and layer M2 are each marked at −3σ and 3σ; for the interconnect stack with M1 and M2, the homogeneous Cw corner places both M1 C and M2 C at 3σ, which introduces pessimism]
When variations in different layers are not fully correlated, pessimism of homogeneous corners increases with the number of layers
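The growth of pessimism with layer count can be checked with a short calculation. The sketch assumes that layer contributions add linearly and that layer variations are fully independent, which is the extreme case of "not fully correlated"; the function names and unit sigmas are hypothetical.

```python
import math


def homogeneous_corner_shift(layer_sigmas):
    """Total shift when every layer sits at its own +3 sigma simultaneously."""
    return sum(3.0 * s for s in layer_sigmas)


def statistical_3sigma_shift(layer_sigmas):
    """3 sigma of the total when layer variations are independent (assumption)."""
    return 3.0 * math.sqrt(sum(s * s for s in layer_sigmas))


def pessimism_ratio(layer_sigmas):
    """How much larger the homogeneous corner is than the statistical 3 sigma."""
    return homogeneous_corner_shift(layer_sigmas) / statistical_3sigma_shift(layer_sigmas)


# Hypothetical stacks with equal per-layer sigma (arbitrary capacitance units).
ratios = {n: pessimism_ratio([1.0] * n) for n in (1, 2, 4, 9)}
```

Under these assumptions the ratio is √n for n equal layers, so a nine-layer stack signed off at a homogeneous corner would carry three times the statistical 3σ shift.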
Correlation Matrix
• Let Σ be the correlation matrix for variation sources
Σ =
           M1          M2          M3          M4
           ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
  M1 ΔW    1  0  0     γ  0  0     γ  0  0     0  0  0
     ΔT    0  1  0     0  γ  0     0  γ  0     0  0  0
     ΔH    0  0  1     0  0  γ     0  0  γ     0  0  0
  M2 ΔW    γ  0  0     1  0  0     γ  0  0     0  0  0
     ΔT    0  γ  0     0  1  0     0  γ  0     0  0  0
     ΔH    0  0  γ     0  0  1     0  0  γ     0  0  0
  M3 ΔW    γ  0  0     γ  0  0     1  0  0     0  0  0
     ΔT    0  γ  0     0  γ  0     0  1  0     0  0  0
     ΔH    0  0  γ     0  0  γ     0  0  1     0  0  0
  M4 ΔW    0  0  0     0  0  0     0  0  0     1  0  0
     ΔT    0  0  0     0  0  0     0  0  0     0  1  0
     ΔH    0  0  0     0  0  0     0  0  0     0  0  1
• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
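The structure of the matrix above follows from two rules, so it can be generated rather than typed. In this sketch the grouping of M1 to M3 into one process module and M4 into another is inferred from the zero and γ pattern of the matrix; the module labels and function name are hypothetical.

```python
def build_correlation(layers, kinds, module_of, gamma):
    """Build the correlation matrix over (layer, variation type) sources.

    Entry is 1 on the diagonal, gamma for the same variation type on different
    layers of the same process module, and 0 otherwise.
    """
    sources = [(layer, kind) for layer in layers for kind in kinds]
    matrix = []
    for layer_a, kind_a in sources:
        row = []
        for layer_b, kind_b in sources:
            if (layer_a, kind_a) == (layer_b, kind_b):
                row.append(1.0)
            elif kind_a == kind_b and module_of[layer_a] == module_of[layer_b]:
                row.append(gamma)
            else:
                row.append(0.0)
        matrix.append(row)
    return sources, matrix


layers = ["M1", "M2", "M3", "M4"]
kinds = ["dW", "dT", "dH"]  # width, thickness, height variation
# Hypothetical module labels, inferred from the matrix pattern.
module_of = {"M1": "module_A", "M2": "module_A", "M3": "module_A", "M4": "module_B"}
sources, sigma = build_correlation(layers, kinds, module_of, gamma=0.5)
```

The same routine extends to the 27-source case mentioned elsewhere in the deck once the layer list and module grouping are known.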
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point: 0.92 at 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation
Delay Variation
[Figure: α plotted against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp); paths are marked as dominated by C-worst (Δdelay at C-worst > Δdelay at RC-worst) or dominated by RC-worst (Δdelay at RC-worst > Δdelay at C-worst)]
• Some paths have α > 1.0 → CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • Where the C-worst corner underestimates delay variations, the paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σj / d(Ytyp) × 100 vs. Δdj(Yrcw) / dj(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield
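A per-path check makes the challenge concrete. The sketch below assumes that α for a path is the ratio of its statistical 3σ delay variation to its Δdelay at the conventional corner (consistent with the two plotted quantities), and that scaling BEOL variation by a factor scales the corner Δdelay linearly; both are stated assumptions, and the function names and numbers are hypothetical.

```python
def alpha(three_sigma, delta_cbc):
    """Ratio of statistical 3-sigma delay variation to delta-delay at the conventional corner."""
    return three_sigma / delta_cbc


def covered_by_tbc(three_sigma, delta_cbc, alpha_tbc=0.5):
    """True when a corner scaled by alpha_tbc still bounds the statistical 3-sigma variation.

    Assumes delta-delay scales linearly with the BEOL variation scaling factor.
    """
    return alpha_tbc * delta_cbc >= three_sigma


# Hypothetical paths: (3-sigma variation in ps, delta-delay at CBC in ps).
path_a = (3.0, 10.0)  # alpha = 0.3
path_b = (7.0, 10.0)  # alpha = 0.7
safe_a = covered_by_tbc(*path_a)
safe_b = covered_by_tbc(*path_b)
```

Under these assumptions a path whose α exceeds the scaling factor would be under-margined at the tightened corner, so such paths need to stay at the conventional corner.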
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
Flow: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → split paths into GTBC and GCBC
• GTBC: timing analysis using TBC; ECO using TBC until violation = 0
• GCBC: timing analysis using CBC; ECO using CBC until violation = 0
• Done when neither group has violations
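The grouping step of the flow can be sketched as follows. The thresholds a_cw and a_rcw, the rule that exceeding either one places a path in the TBC group, and the example delays are assumptions made for illustration; the slide itself only states that Δdelay is compared with a threshold.

```python
def relative_delta(delay_corner, delay_typ):
    """Delta-delay at a corner as a percentage of the typical-corner delay."""
    return 100.0 * (delay_corner - delay_typ) / delay_typ


def split_paths(paths, a_cw, a_rcw):
    """Split paths into (g_tbc, g_cbc).

    paths: {name: (d_typ, d_cw, d_rcw)}, path delays at the typical, C-worst
    and RC-worst corners. a_cw and a_rcw are thresholds in percent.
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        large_cw = relative_delta(d_cw, d_typ) > a_cw
        large_rcw = relative_delta(d_rcw, d_typ) > a_rcw
        (g_tbc if (large_cw or large_rcw) else g_cbc).append(name)
    return g_tbc, g_cbc


# Hypothetical path delays in ps.
paths = {
    "p1": (1000.0, 1060.0, 1010.0),  # large delta at C-worst
    "p2": (1000.0, 1010.0, 1020.0),  # small delta at both corners
    "p3": (1000.0, 1005.0, 1090.0),  # large delta at RC-worst
}
g_tbc, g_cbc = split_paths(paths, a_cw=4.0, a_rcw=7.0)
```

Paths left in g_cbc keep the conventional corners, which matches the flow's intent of tightening margins only where the corner Δdelay is large enough to stay safe.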
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)
                        LEON3MP   NETCARD   SUPERBLUE12
  Clock period (ns)     1.8       2.0       3.1
  Gate count            232K      575K      1031K
  Utilization (%)       84        79        82
  Core area (mm²)       0.45      1.04      1.91
  Max transition (ps)   330       330       330
Tightened BEOL corners (correlation factor = 0.5)
             α      Acw (%)   Arcw (%)
  TBC-0.5    0.5    4.3       7.3
  TBC-0.6    0.6    3.3       5.0
  TBC-0.7    0.7    3.0       3.4
• Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
• Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
Scaling Factor Results
[Figure: three panels (LEON3MP, SUPERBLUE12, NETCARD), each marking the region with α > 0.5]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic
• VBTI = Vlib ≈ Vfinal
[Figure: Vdd vs. time, with NBTI and PBTI]
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Benchmark circuits
  Circuit   Frequency (GHz)
  C5315     1.38
  c7552     1.25
  AES       0.89
  MPEG2     1.05
Supply voltages
  Vmax          1.05V
  Vinit         0.90V
  Vheur1 (DC)   0.97V
  Vheur1 (AC)   0.95V
  Vheur2 (DC)   0.95V
  Vheur2 (AC)   0.93V
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step
Vfinal: converged voltage
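The loop structure of the reference flow can be sketched with placeholder models. Everything numeric here is an assumption: toy_delay and toy_aging are made-up stand-ins for STA with derated libraries and for a BTI model, the aging step treats the current Vdd as if it had been applied since t = 0, and the signature is hypothetical.

```python
def reference_signoff(v_init, v_max, v_step, t_final, n_steps,
                      delay_target, delay_model, aging_model):
    """Walk through the lifetime, raising Vdd by v_step whenever aged timing misses the target.

    delay_model(vdd, dvth) -> path delay; aging_model(vdd, t) -> threshold shift at time t.
    Returns (v_final, feasible); feasible is False when Vdd would exceed v_max,
    in which case the circuit must be modified and the flow restarted.
    """
    vdd, dvth = v_init, 0.0
    for i in range(1, n_steps + 1):
        t = t_final * i / n_steps
        dvth = max(dvth, aging_model(vdd, t))            # estimate aging at this time step
        while delay_model(vdd, dvth) > delay_target:     # update circuit timing and Vdd
            vdd += v_step
            if vdd > v_max:
                return vdd, False
    return vdd, True


def toy_delay(vdd, dvth, vth0=0.30):
    # Alpha-power-law-like delay in arbitrary units; illustrative only.
    return vdd / (vdd - vth0 - dvth) ** 1.3


def toy_aging(vdd, t_years):
    # Threshold shift grows with Vdd and sublinearly with time; illustrative only.
    return 0.03 * vdd * t_years ** (1.0 / 6.0)


target = toy_delay(0.90, 0.0)  # zero timing slack at t = 0
v_final, feasible = reference_signoff(0.90, 1.05, 0.01, 10.0, 10,
                                      target, toy_delay, toy_aging)
```

In a real flow each pass through the inner loop stands for another STA run against a library characterized at the new voltage, which is why the reference flow is slow.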
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
  Case       1       2       3      4        5    6        7
  Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
  VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop between circuit, derated libraries (Vlib, VBTI) and Vfinal]
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time over alternating ON and OFF phases [VattikondaWC06]]
Device aging (|ΔVth|) accumulates over time [TCAS'14]
Observation 1
[Chan11]
• BTI is a "front-loaded" phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
[Figure annotation: ≈70% of Vdd increment in 1 year (remaining 30% over 9 years), approaching Vfinal]
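The front-loaded behavior follows from any sublinear time dependence. The sketch below assumes a pure power law, ΔVth ∝ t^n, with n = 1/6 (a value commonly associated with reaction-diffusion style BTI models); the exponent, the absence of recovery and the function name are assumptions, not parameters from the talk.

```python
def aging_fraction(t_years, lifetime_years, exponent=1.0 / 6.0):
    """Fraction of the end-of-life threshold shift reached by time t, assuming dVth ~ t**exponent."""
    return (t_years / lifetime_years) ** exponent


# Share of the 10-year shift accumulated in the first year under the assumed exponent.
first_year = aging_fraction(1.0, 10.0)
half_life = aging_fraction(5.0, 10.0)
```

With this exponent roughly two thirds of the ten-year shift accrues in the first year; the exact fraction depends on the exponent and on recovery effects that the sketch ignores.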
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Good corners
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
37ISVLSI-2014 invited talk 140710
Also: Tunable Monitors, Less Margin
Aggressive config: Vmin_est < Vmin_chip. Some chips will fail.
Default config
• Low-resistance passgates
• Guardband for worst case
• Vmin_est > Vmin_chip
• 13mV margin
Optimized config
• Increase high-resistance passgates
• Vmin_est ≈ Vmin_chip
Benefits of tunability
• Compensate for difference between model and silicon
• Recover margin when variation is reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) across implementations 1-9; annotations: highest invested guardband, least invested guardband, 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• Tradeoff between the guardband invested in signoff and the resulting power penalty
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Panels: capacitance C of Layer M1 and Layer M2 (−3σ to 3σ); interconnect stack with M1 and M2 (M1 C vs. M2 C, 3σ contour). Labels: homogeneous Cw corner, pessimism]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
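A rough numerical illustration of this point is sketched below. It compares a homogeneous corner, which places every layer at its 3σ value at once, with the statistical 3σ of the summed contribution. The additive model, the equal per-layer sigma and the pairwise correlation `gamma` are simplifying assumptions made here for illustration; they are not taken from the talk.

```python
import math


def homogeneous_pessimism(n_layers, sigma=1.0, gamma=0.0):
    """Ratio of the homogeneous-corner deviation to the statistical 3-sigma deviation.

    Assumes each layer contributes an additive term with the same sigma and
    that every pair of layers has the same correlation gamma (hypothetical model).
    """
    # Homogeneous corner: all layers sit at +3 sigma simultaneously.
    corner = 3.0 * sigma * n_layers
    # Variance of a sum of n equally distributed terms with pairwise correlation gamma.
    variance = sigma ** 2 * (n_layers + n_layers * (n_layers - 1) * gamma)
    statistical = 3.0 * math.sqrt(variance)
    return corner / statistical


if __name__ == "__main__":
    for n in (1, 2, 4, 9):
        print(n, round(homogeneous_pessimism(n, gamma=0.5), 3))
```

Under these assumptions the ratio is 1 for fully correlated layers (`gamma=1.0`) and grows with the layer count whenever `gamma` is below 1.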
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
            M1            M2            M3            M4
        ΔW  ΔT  ΔH    ΔW  ΔT  ΔH    ΔW  ΔT  ΔH    ΔW  ΔT  ΔH
M1  ΔW   1   0   0     γ   0   0     γ   0   0     0   0   0
    ΔT   0   1   0     0   γ   0     0   γ   0     0   0   0
    ΔH   0   0   1     0   0   γ     0   0   γ     0   0   0
M2  ΔW   γ   0   0     1   0   0     γ   0   0     0   0   0
    ΔT   0   γ   0     0   1   0     0   γ   0     0   0   0
    ΔH   0   0   γ     0   0   1     0   0   γ     0   0   0
M3  ΔW   γ   0   0     γ   0   0     1   0   0     0   0   0
    ΔT   0   γ   0     0   γ   0     0   1   0     0   0   0
    ΔH   0   0   γ     0   0   γ     0   0   1     0   0   0
M4  ΔW   0   0   0     0   0   0     0   0   0     1   0   0
    ΔT   0   0   0     0   0   0     0   0   0     0   1   0
    ΔH   0   0   0     0   0   0     0   0   0     0   0   1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); marked point: 0.92 at 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation gives smaller RC variation
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: two scatter plots of α. One against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp); the other against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
Plot annotations:
• C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
Summary:
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
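The notion of a dominant corner can be sketched as follows. The slides do not spell out a formula for α; the code assumes it is the ratio of the statistical 3σ delay variation to the Δdelay at the dominant conventional corner, which is consistent with the axes plotted elsewhere in the talk but remains an assumption. The function names and the example numbers are hypothetical.

```python
def dominant_corner(d_typ, d_cw, d_rcw):
    """Return (corner name, relative delta-delay) for the corner with the larger delay increase."""
    delta_cw = (d_cw - d_typ) / d_typ
    delta_rcw = (d_rcw - d_typ) / d_typ
    if delta_cw > delta_rcw:
        return "C-worst", delta_cw
    return "RC-worst", delta_rcw


def alpha(sigma_j, d_typ, d_cw, d_rcw):
    """Assumed definition: alpha = 3 * sigma_j / (delay increase at the dominant corner)."""
    _, delta = dominant_corner(d_typ, d_cw, d_rcw)
    if delta <= 0.0:
        raise ValueError("dominant corner shows no delay increase; alpha is undefined")
    return 3.0 * sigma_j / (delta * d_typ)


if __name__ == "__main__":
    # Illustrative numbers only (delays in ns).
    print(dominant_corner(1.00, 1.04, 1.10))
    print(alpha(0.015, 1.00, 1.04, 1.10))
```

With this reading, α below 1.0 means the conventional corner already covers the 3σ variation of the path, and α above 1.0 means it underestimates it.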
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2 (M1 C vs. M2 C, 3σ contour); non-homogeneous corner with M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σj / d(Ytyp) × 100 vs. Δdj(Yrcw) / dj(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart nodes: routed design; timing analysis at BEOL corners (Ytyp, Ycw, Yrcw); path groups GTBC and GCBC; ECO using CBC; ECO using TBC; timing analysis using TBC (violation = 0); timing analysis using CBC (violation = 0); done]
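The path-grouping step of this flow can be sketched in a few lines. The slides only state that a path joins Gtbc when its Δdelay exceeds a threshold; the use of separate percentage thresholds for the C-worst and RC-worst corners, the "either corner" rule, and all names and numbers below are illustrative assumptions rather than the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class PathDelay:
    name: str
    d_typ: float  # delay at the typical corner (ns)
    d_cw: float   # delay at the C-worst corner (ns)
    d_rcw: float  # delay at the RC-worst corner (ns)


def split_paths(paths, a_cw, a_rcw):
    """Split paths into (g_tbc, g_cbc) using percentage thresholds on relative delta-delay."""
    g_tbc, g_cbc = [], []
    for p in paths:
        delta_cw = 100.0 * (p.d_cw - p.d_typ) / p.d_typ
        delta_rcw = 100.0 * (p.d_rcw - p.d_typ) / p.d_typ
        # Assumed rule: a large delta-delay at either corner qualifies the path
        # for signoff at tightened BEOL corners.
        if delta_cw > a_cw or delta_rcw > a_rcw:
            g_tbc.append(p.name)
        else:
            g_cbc.append(p.name)
    return g_tbc, g_cbc


if __name__ == "__main__":
    paths = [
        PathDelay("p1", 1.00, 1.02, 1.03),
        PathDelay("p2", 1.00, 1.05, 1.02),
        PathDelay("p3", 2.00, 2.04, 2.20),
    ]
    # Threshold values are placeholders, not the values used in the talk.
    print(split_paths(paths, a_cw=4.0, a_rcw=7.0))
```

Paths left in the second group would continue to be fixed and signed off at the conventional BEOL corners.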
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Tightened corners (correlation factor = 0.5)

          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays: the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: scaling factor α for LEON3MP, NETCARD and SUPERBLUE12; the region α > 0.5 is marked in each panel]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: three bar charts comparing CBC, TBC-1 and TBC-2 on LEON, SUPERBLUE and NETCARD: WNS (ns), TNS (ns), and number of timing violations]
57ISVLSI-2014 invited talk 140710
Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging at an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, VLIB and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step
Vfinal: converged voltage
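The loop structure of this reference flow can be sketched as below. In the real flow each iteration involves library characterization and an STA run; here those are replaced by two closed-form placeholder models (`delay` and `delta_vth`) whose functional form and coefficients are purely hypothetical and chosen only to make the sketch runnable.

```python
def delay(vdd, dvth, vth0=0.3, a=1.3):
    """Placeholder alpha-power-law delay model (arbitrary units); not from the talk."""
    return vdd / (vdd - vth0 - dvth) ** a


def delta_vth(vdd, t_years, coeff=0.02, k=2.0, n=0.16):
    """Placeholder BTI model: threshold shift grows with stress voltage and time."""
    return coeff * vdd ** k * t_years ** n


def reference_flow(v_init, v_max, v_step, t_final, dt):
    """Step through lifetime, raising Vdd by v_step whenever aged timing misses the target.

    Returns (v_final, schedule); v_final is None when Vdd would exceed v_max,
    which corresponds to "modify circuit and start over" in the flow.
    """
    target = delay(v_init, 0.0)  # assume zero timing slack at t = 0
    vdd, t = v_init, 0.0
    schedule = [(t, vdd)]
    while t < t_final:
        t = min(t + dt, t_final)
        dvth = delta_vth(vdd, t)            # estimate aging at this time step
        while delay(vdd, dvth) > target:    # update circuit timing and Vdd
            vdd += v_step
            if vdd > v_max:
                return None, schedule
            dvth = delta_vth(vdd, t)        # keep VBTI consistent with the new Vdd
        schedule.append((t, vdd))
    return vdd, schedule


if __name__ == "__main__":
    v_final, schedule = reference_flow(0.90, 1.05, 0.01, 10.0, 1.0)
    print(v_final)
    print(schedule)
```

Even in this toy form, every time step re-evaluates timing after each voltage increment, which hints at why the full flow needs many STA runs.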
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2
63ISVLSI-2014 invited talk 140710
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Diagram: loop linking Circuit, Vfinal, Vlib / VBTI and Derated Libraries]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON and OFF intervals [VattikondaWC06]]
Device aging (|ΔVth|) accumulates over time [TCAS'14]
65ISVLSI-2014 invited talk 140710
Observation 1 [Chan11]
• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
[Figure annotations: ≈70% of Vdd increment in 1 year (remaining 30% over 9 years); Vfinal]
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
Setups:
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Vlib: Vdd for circuit performance estimation; VBTI: Vdd for aging estimation

                   Vlib = Min Vdd                        Vlib = Max Vdd
VBTI = Min Vdd     Slowest circuit, less aging           Not applicable (optimistic)
VBTI = Max Vdd     Slowest circuit, worst-case aging     Faster circuit, worst-case aging
                   (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm2) for AES; points labeled by implementation number (1-8) for three series: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks); annotations: optimistic about AVS, pessimistic about AVS]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) across implementations 1-9; annotations: 30% MTTF penalty, 200mV voltage compensation]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
  $\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
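The MTTF update rule on this slide can be applied step by step over an AVS voltage schedule, as in the short sketch below. The function name, the initial MTTF and the example voltages are illustrative assumptions; only the update rule itself comes from the slide.

```python
def update_mttf(mttf_initial, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a voltage schedule.

    Returns the list of MTTF values, one per entry of vdd_schedule.
    """
    mttf = mttf_initial
    history = [mttf]
    for v_prev, v_next in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (v_prev / v_next) ** 2
        history.append(mttf)
    return history


if __name__ == "__main__":
    # Hypothetical schedule: Vdd raised in 50 mV steps from 0.90 V to 1.05 V.
    print(update_mttf(10.0, [0.90, 0.95, 1.00, 1.05]))
```

Because the per-step factors telescope, the final MTTF depends only on the first and last voltages: in this example it ends at (0.90 / 1.05)^2 = 36/49 of the initial value.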
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD (0.90-1.04) vs. year (0-12) for schedules S1-S5 on DMA; MTTF (year) for S1-S5; annotation: 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is "Signoff"?
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff along a voltage axis, between nominal Vdd and signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI; operating voltage marked]
Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = [\, d_j(Y_v) - d_j(Y_{typ}) \,] / 3$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^{T}$ (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
  $[\Delta d'_{j1} \; \dots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \dots \; \Delta d_{j27}] \, \lambda$
Standard deviation of path delay:
  $\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
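The computation on this slide can be written out directly. The sketch below uses a small pure-Python Cholesky factorization and a 3-source example (one variation type on three correlated layers with γ = 0.5) instead of the full 27 x 27 matrix; the sensitivity values are made up for illustration.

```python
import math


def cholesky(matrix):
    """Return lower-triangular L with L * L^T == matrix (matrix must be positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def path_sigma(sensitivities, correlation):
    """sigma_j = norm of (row vector of sensitivities) * lambda, with Sigma = lambda * lambda^T."""
    lam = cholesky(correlation)
    n = len(sensitivities)
    # Correlated sensitivities: dprime[c] = sum over r of d[r] * lam[r][c].
    dprime = [sum(sensitivities[r] * lam[r][c] for r in range(n)) for c in range(n)]
    return math.sqrt(sum(x * x for x in dprime))


if __name__ == "__main__":
    gamma = 0.5
    corr = [[1.0, gamma, gamma],
            [gamma, 1.0, gamma],
            [gamma, gamma, 1.0]]
    # Hypothetical per-source delay sensitivities (ps).
    print(path_sigma([1.0, 2.0, 2.0], corr))
```

Since the squared result equals Δd Σ Δd^T, the example gives sqrt(17) with γ = 0.5 and 3.0 when the sources are independent.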
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:
  $Energy = \frac{Power}{Throughput}$
  where Throughput is a function of the error rate ER [Kahng10], the number of recovery cycles r, and the clock period T
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
[Diagram: AVS classes and AVS approaches, arranged along a power axis]
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor, design-dependent replica: process- and temperature-aware AVS. Generic on-chip monitor [Burd00]; design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
78ISVLSI-2014 invited talk 140710
Challenge: Variability
[Figure: four panels comparing ideal scaling with observed non-ideality. DENSITY: transistor count [M] vs. MPU release date (source: [CPUDB]). POWER: dynamic power (W), 2009-2024 (source: [JeongK08]). DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006-2016 (source: [ITRS]). SUPPLY VOLTAGE: volt vs. MPU release date (source: [CPUDB])]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, on MUL and EXU; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: frequency vs. voltage plane with modes A, B, C and the design cone of mode A; regions: negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
[Flow (global optimization): Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Sample (SP&R) → Refine power models; the last steps form the adaptive search]
[Figure: power estimation of adaptive search. Power (mW) vs. signoff voltage (V); series: 1st, 2nd, real. Design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
Closed-loop AVS monitor classes: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low series resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: stacked-bar energy (mJ) for PM, BF and CO at small, medium and large margin, on MUL and EXU; components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off between timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt, on MUL and EXU; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart nodes: initial placement (all FFs = error-tolerant FFs); SEOpt (margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs); SkewOpt (activity-aware clock skew optimization); check "energy < min energy"; save current solution]
37ISVLSI-2014 invited talk 140710
Also Tunable Monitors Less Margin
Aggressive config Vmin_est lt Vmin_chip Some chips will fail
Default config
bull Low resistance passgates
bull Guardband for worst-case
bull Vmin_est gt Vmin_chip
bull 13mV margin
Optimized configbull Increase high
resistance passgatesbull Vmin_est asymp Vmin_chip
Benefits of tunability bull Compensate for difference
between model vs siliconbull Recover margin when variation is
reduced due to improved process
38ISVLSI-2014 invited talk 140710
Outlinebull Introductionbull Modeling of IC Variabilitybull Margining of IC Variabilitybull Tolerance of IC Variabilitybull Conclusions
39ISVLSI-2014 invited talk 140710
Conclusionsbull Variability severely challenges IC value
bull In manufacturing process during operation across lifetime
bull Benefit of ldquonext noderdquo is increasingly hard to findbull Entire node is a ldquo202020rdquo value propositionbull 5-10 in PPA metrics is now substantial at leading edge
bull Variability is connected to tapeout IC properties by models margins tolerances used in signoff
bull Some takeaways from this talkbull Substantial benefit from tightening BEOL corners (= signoff)bull ldquoMinimum cost of resiliencerdquo is a rich optimization challengebull Chicken-egg loops in signoff definition can be brokenbull Holistic approaches will provide ldquoequivalent scalingrdquo that
extends the value trajectory of Moorersquos Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
1 2 3 4 5 6 7 8 91200
1300
1400
1500
1600
1700
030
032
034
036
Core Power (mW) PG Power (mW)
Implemetation
Core
Pow
er (m
W)
PG
Pow
er (m
W)
bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff
Highest invested guardband
Least invested guardband
14 power penalty
43ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Interconnect stack with M1 and M2
M1 C
M2 C
3σ Pessimism
Example worst-case capacitance corner Homogeneous
Cw corner
44ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Homogeneous Cw corner
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Pessimism
Example worst-case capacitance corner
When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers
45ISVLSI-2014 invited talk 140710
Correlation Matrixbull Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the process module γ 05
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Averaging uncorrelated variation smaller RC variation
47ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
Gate count 232K 575K 1031KUtilization () 84 79 82
Core area (mm2) 045 104 191Max transition (ps) 330 330 330
Testcases for validation (45nm library with 8X wire resistivity)
αCorrelation factor = 05
Acw () Arcw ()
TBC-05 05 43 73
TBC-06 06 33 50
TBC-07 07 30 34
Implement another NETCARD (clock period = 23ns) to obtain α Acw and Arcw
Statistical models (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 05
54ISVLSI-2014 invited talk 140710
Further Analysis
bull Paths with small Δd(Yrcw) and Δd(Ycw) have large α
bull A path has small Δdelays the path is equally sensitive to R and C
bull Example dj = dj(Ytyp) + 05 ΔdR-M1 + 05 ΔdC-M1
bull For a given CBC = Ycw ΔdR-M1 is small but ΔdC-M1 is large delay variation of ΔdR-M1 and ΔdC-M1 are cancelled out Δd(Ycw) 0 lt σj
Nominal delay
Delay sensitivity to unit change in M1 resistance
Delay sensitivity to unit change in M1 capacitance
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
LEON3MP
SUPERBLUE12NETCARD
α gt 05α gt 05
α gt 05
bull Similar trends in different designs
bull Large α when Δd(Yrcw)d(Ytyp) and Δd(Ycw)d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
bull WNS and TNS are reduced by up to 120ps and 61ns
bull Timing violations reduces by 31 to 100
Correlation factor γ = 0 (variation sources are independent)
LEON SUPERBLUE NETCARD
-0180-0160-0140-0120-0100-0080-0060-0040-002000000020
CBC TBC-1 TBC-2
WN
S (n
s)
LEON SUPERBLUE NETCARD
-90-80-70-60-50-40-30-20-10
0CBC TBC-1 TBC-2
TNS
(ns)
LEON SUPERBLUE NETCARD0
200400600800
1000120014001600
CBC TBC-1 TBC-2
Tim
ing
viol
ation
s
57ISVLSI-2014 invited talk 140710
Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging under an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
• Gates along critical path
• Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
• Assume timing slack = 0
[Figure annotation: 10mV]
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05
Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep VBTI, VLIB and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs
and library
Vstep: AVS voltage step; Vfinal: converged voltage
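The reference flow above can be sketched as a simple loop. This is a minimal sketch under stated assumptions: `delay_at` is a hypothetical toy delay and aging model (not the BTI or library models used in the talk), its constants are made up, and aging is evaluated at the current Vdd rather than accumulated over the voltage history. In a real flow each delay evaluation would be an STA run against a re-characterized library.

```python
def delay_at(vdd, stress_v, t_years):
    """Hypothetical critical-path delay (ns): faster at higher Vdd, slower with aging."""
    fresh = 1.0 / vdd                                  # toy voltage dependence
    aging = 0.05 * stress_v * t_years ** (1.0 / 6.0)   # toy front-loaded aging term
    return fresh * (1.0 + aging)


def reference_signoff(v_init, v_max, v_step, target_delay, t_final, dt=1.0):
    """Step through the lifetime and raise Vdd by v_step whenever timing fails.

    Returns the converged Vdd at t_final, or None when the required Vdd
    exceeds v_max (the circuit must then be modified and the flow restarted).
    """
    vdd = v_init
    t = 0.0
    while t <= t_final:
        # Use the same voltage for aging and performance so that
        # VBTI, Vlib and Vdd stay consistent at every time step.
        while delay_at(vdd, vdd, t) > target_delay:
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None
        t += dt
    return vdd
```

With the toy model, `reference_signoff(0.90, 1.05, 0.01, 1.12, 10.0)` converges to a voltage between Vinit and Vmax, while a tighter target such as 0.9 ns returns `None`, which corresponds to the "modify circuit and start over" branch.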
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α=0
  7. Proposed method with α=0.03
Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2
63ISVLSI-2014 invited talk 140710
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Diagram: loop among Circuit, Vfinal, Vlib / VBTI, and Derated Libraries]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)
|ΔVth| is partially recovered when device is off (relaxed)
NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time over alternating ON / OFF phases] [VattikondaWC06]
Device aging (|ΔVth|) accumulates over time [TCAS’14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
≈70% of Vdd increment in 1 year (remaining 30% over 9 years)
[Figure annotation: Vfinal]
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Good corners
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α=0
7. Proposed method with α=0.03
67ISVLSI-2014 invited talk 140710
Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?
Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.
                 Vlib = Min Vdd                                         Vlib = Max Vdd
VBTI = Min Vdd   Slowest circuit, less aging                            Not applicable (optimistic)
VBTI = Max Vdd   Slowest circuit, worst-case aging (too pessimistic)    Faster circuit, worst-case aging
With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Plot: power (mW) vs. area (μm2) for AES, with points labeled by setup number; legend entries: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks)]
Optimistic about AVS
Pessimistic about AVS
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Plot: lifetime (year) and Vfinal (V) for implementations 1 to 9]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
  $MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
30% MTTF penalty
200mV voltage compensation
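The MTTF update above can be applied step by step over an AVS voltage schedule. The following is a minimal sketch; the function names and the example schedule are illustrative assumptions, not values from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_now):
    """One update step: MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2


def mttf_over_schedule(mttf0, vdd_schedule):
    """Apply the update for every consecutive pair of voltages in the schedule."""
    mttf = mttf0
    for v_prev, v_now in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, v_prev, v_now)
    return mttf


# Hypothetical schedule: Vdd raised from 1.0 V to 1.2 V in two steps.
remaining = mttf_over_schedule(10.0, [1.0, 1.1, 1.2])
```

Because the per-step ratios telescope, only the first and last voltages matter: the result equals `mttf0 * (V_first / V_last)**2`. For an assumed starting voltage of 1.0 V, a 200mV increase gives a factor of (1.0/1.2)^2 ≈ 0.69, which is in line with the roughly 30% MTTF penalty quoted on the slide.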
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Plots for DMA: VDD vs. year (0 to 12) for schedules S1 to S5, and MTTF (year) for S1 to S5]
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is “Signoff”?
• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Diagram: “margin stack” for voltage signoff along a voltage axis: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI / NBTI, signoff Vdd; operating voltage marked]
Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = [d_j(Y_v) - d_j(Y_{typ})] / 3$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^{T}$ (Note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation
$[\Delta d'_{j1} \ldots \Delta d'_{j27}] = [\Delta d_{j1} \ldots \Delta d_{j27}] \lambda$
Standard deviation of path delay
$\sigma_j = ((\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
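The two statistical timing analysis slides above can be combined into one small calculation. The sketch below is illustrative only: it uses a hand-written Cholesky factorization to stay dependency-free, and the function names and the two-source example data are assumptions rather than anything from the talk.

```python
import math


def cholesky(sigma):
    """Return lower-triangular lam such that sigma = lam * lam^T."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def sensitivities(d_corner, d_typ):
    """Per-sigma sensitivities: delta_d[v] = (d(Y_v) - d(Y_typ)) / 3."""
    return [(d - d_typ) / 3.0 for d in d_corner]


def path_sigma(delta_d, sigma):
    """Standard deviation of path delay for correlated variation sources."""
    lam = cholesky(sigma)
    n = len(delta_d)
    # Row vector times lam: delta_d'[c] = sum over r of delta_d[r] * lam[r][c]
    corr = [sum(delta_d[r] * lam[r][c] for r in range(n)) for c in range(n)]
    return math.sqrt(sum(x * x for x in corr))


# Hypothetical path: typical delay 100 ps, two 3-sigma corner delays,
# and a correlation of 0.5 between the two variation sources.
dd = sensitivities([103.0, 106.0], 100.0)
sj = path_sigma(dd, [[1.0, 0.5], [0.5, 1.0]])
```

The result equals the square root of Δd Σ Δdᵀ, so for the example above the variance is 1 + 4 + 2 × 0.5 × 1 × 2 = 7. With an identity Σ the calculation reduces to the usual root-sum-square of the sensitivities.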
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Plot: energy (mJ) vs. supply voltage (V), 0.84 to 1.00 V, for conventional design and resilient design]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
15% reduction
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy: Energy = Power / Throughput, where Throughput is a function of the error rate ER [Kahng10], the clock period T, and the number of recovery cycles r
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches (diagram axis: Power)
• Open-Loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-Loop AVS
  • Generic monitor: process and temperature-aware AVS, generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error Tolerance
  • Error Detection System: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Four plots, each comparing ideal scaling with non-ideality:
 DENSITY: transistor count [M] vs. MPU release date, 1998 to 2011. Source: [CPUDB]
 POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
 DRIVE CURRENT: Extended Planar Bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
 SUPPLY VOLTAGE: volts vs. MPU release date, 1995 to 2016. Source: [CPUDB]]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Plots for MUL and EXU: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone
[Figure: voltage vs. frequency plane showing the design cone of mode A and modes A, B, C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
• Avoid circuit implementation at each mode
• A small constant number of runs is enough: scalable
[Flow: global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R) → refine power models]
[Plot: power estimation of adaptive search, power (mW) vs. signoff voltage (V), series 1st, 2nd and real; design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
Closed-Loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Generic monitor does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Circuit diagram: stages with 1-bit control pins selecting high resistance or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Bar charts for MUL and EXU: energy (mJ) of PM, BF and CO at small, medium and large margin, broken down into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Plots for MUL and EXU: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow: initial placement (all FFs = error-tolerant FFs) → SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs → SkewOpt: activity-aware clock skew optimization → if energy < min energy, save current solution]
38ISVLSI-2014 invited talk 140710
Outline
• Introduction
• Modeling of IC Variability
• Margining of IC Variability
• Tolerance of IC Variability
• Conclusions
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of “next node” is increasingly hard to find
  • Entire node is a “20/20/20” value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • “Minimum cost of resilience” is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide “equivalent scaling” that extends the value trajectory of Moore’s Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Plot: core power (mW) and PG power (mW) for implementations 1 to 9; highest invested guardband and least invested guardband marked]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
14% power penalty
43ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Layers M1 and M2 each have a capacitance distribution marked at −3σ and 3σ; the interconnect stack with M1 and M2 combines M1 C and M2 C into the homogeneous Cw corner, and the gap between that corner and the stack's own 3σ point is pessimism]
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Layers M1 and M2 each have a capacitance distribution marked at −3σ and 3σ; the interconnect stack with M1 and M2 combines M1 C and M2 C into the homogeneous Cw corner, and the gap between that corner and the stack's own 3σ point is pessimism]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
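The matrix above follows a simple rule that can be generated programmatically. The sketch below is illustrative: the function name is hypothetical, and the grouping of M1 to M3 into one process module with M4 in a separate module is inferred from the zero and γ entries of the matrix shown on the slide.

```python
def build_correlation(modules, kinds, gamma):
    """Build the correlation matrix for (layer, variation type) sources.

    modules: list of lists of layer names that share a process module.
    kinds:   variation types per layer, e.g. width, thickness, height.
    gamma:   correlation between same-type sources in the same module.
    """
    layers = [layer for module in modules for layer in module]
    module_of = {layer: idx for idx, module in enumerate(modules) for layer in module}
    sources = [(layer, kind) for layer in layers for kind in kinds]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for a, (layer_a, kind_a) in enumerate(sources):
        for b, (layer_b, kind_b) in enumerate(sources):
            if a == b:
                sigma[a][b] = 1.0
            elif kind_a == kind_b and module_of[layer_a] == module_of[layer_b]:
                sigma[a][b] = gamma
            # Different types or different modules stay at 0 (independent).
    return sources, sigma


sources, sigma = build_correlation([["M1", "M2", "M3"], ["M4"]], ["dW", "dT", "dH"], 0.5)
```

With these inputs the first row of `sigma` reproduces the M1 ΔW row of the slide (1 0 0 γ 0 0 γ 0 0 0 0 0), and the M4 rows contain only the diagonal 1.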
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Plot: cumulative probability vs. max wirelength ratio across all layers (%); cumulative probability reaches 0.92 at 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation gives smaller RC variation
47ISVLSI-2014 invited talk 140710
Delay Variation
[Scatter plots: α vs. Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and α vs. Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 == Cw (3σ) and M2 == Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Plot: 3σj / d(Ytyp) × 100 vs. Δdj(Yrcw) / dj(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation to preserve parametric yield?
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Plot axis: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → paths split into GTBC and GCBC → ECO and timing analysis using TBC and using CBC, each looped until violation = 0 → done]
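The grouping step of the flow above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the path data and the threshold values are hypothetical, and placing a path in Gtbc when its relative Δdelay exceeds the threshold at either corner is one reading of "Δdelay is larger than a threshold", not a rule stated on the slide.

```python
def split_paths(paths, a_cw, a_rcw):
    """Split critical paths into G_tbc and G_cbc.

    paths: dict mapping path name -> (d_typ, d_cw, d_rcw), delays in ns
           at the typical, C-worst and RC-worst BEOL corners.
    a_cw, a_rcw: thresholds on relative delta-delay, in percent of d_typ.
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        rel_cw = 100.0 * (d_cw - d_typ) / d_typ
        rel_rcw = 100.0 * (d_rcw - d_typ) / d_typ
        # Large delta-delay: path can be signed off with tightened corners.
        if rel_cw > a_cw or rel_rcw > a_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


# Hypothetical paths and thresholds, for illustration only.
example = {"p1": (1.00, 1.10, 1.02), "p2": (1.00, 1.01, 1.02)}
g_tbc, g_cbc = split_paths(example, a_cw=4.0, a_rcw=7.0)
```

Paths left in Gcbc are the ones with small Δdelay at both corners, which the talk associates with large α, so they remain signed off at the conventional corners.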
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
Gate count 232K 575K 1031KUtilization () 84 79 82
Core area (mm2) 045 104 191Max transition (ps) 330 330 330
Testcases for validation (45nm library with 8X wire resistivity)
αCorrelation factor = 05
Acw () Arcw ()
TBC-05 05 43 73
TBC-06 06 33 50
TBC-07 07 30 34
Implement another NETCARD (clock period = 23ns) to obtain α Acw and Arcw
Statistical models (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 05
54ISVLSI-2014 invited talk 140710
Further Analysis
bull Paths with small Δd(Yrcw) and Δd(Ycw) have large α
bull A path has small Δdelays the path is equally sensitive to R and C
bull Example dj = dj(Ytyp) + 05 ΔdR-M1 + 05 ΔdC-M1
bull For a given CBC = Ycw ΔdR-M1 is small but ΔdC-M1 is large delay variation of ΔdR-M1 and ΔdC-M1 are cancelled out Δd(Ycw) 0 lt σj
Nominal delay
Delay sensitivity to unit change in M1 resistance
Delay sensitivity to unit change in M1 capacitance
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
LEON3MP
SUPERBLUE12NETCARD
α gt 05α gt 05
α gt 05
bull Similar trends in different designs
bull Large α when Δd(Yrcw)d(Ytyp) and Δd(Ycw)d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
bull WNS and TNS are reduced by up to 120ps and 61ns
bull Timing violations reduces by 31 to 100
Correlation factor γ = 0 (variation sources are independent)
LEON SUPERBLUE NETCARD
-0180-0160-0140-0120-0100-0080-0060-0040-002000000020
CBC TBC-1 TBC-2
WN
S (n
s)
LEON SUPERBLUE NETCARD
-90-80-70-60-50-40-30-20-10
0CBC TBC-1 TBC-2
TNS
(ns)
LEON SUPERBLUE NETCARD0
200400600800
1000120014001600
CBC TBC-1 TBC-2
Tim
ing
viol
ation
s
57ISVLSI-2014 invited talk 140710
Heuristics 1
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
supply voltage gt Vmax
Pessimistic signoff corner bull Ovestimate aging andor
underestimate circuit performance
bull Large area overhead
Good corners
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner
for AVS4 Vbti = Vfinal Do not overestimate
aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
bull Timing signoff ensure circuit meets performance target under PVT variations amp aging
bull Conventional signoff approach bull Analyze circuit timing at worst-case cornersbull Fix timing violations re-run timing analysis
bull With BTI aging and AVS what is the Vdd of the worst-cast corner for timing analysis
Vlib for circuit performance estimation
Min Vdd Max Vdd
VBTI for aging
estimation
MinVdd
Not applicable (Optimistic)
Max Vdd
Slowest circuitLess aging
Faster circuitWorst-case aging
Slowest circuit Worst-case aging
Too pessimistic
With BTI aging and AVS the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
10000 12000 14000 16000 18000 20000 2200020
22
24
26
28
30
32
44
4
888
7776
66
555
3
33
2
22
11
1
Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)
Area (μm2)
Pow
er (m
W)
AES
Optimistic about AVS
Pessimistic about AVS
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
1 2 3 4 5 6 7 8 90
2
4
6
8
10
12
08
09
1
11
12Lifetime (year)
Implementation
Life
time
(yea
r)
Vfina
l (V)
Vfinal (V)
119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )
2
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation
[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ
Standard deviation of path delay
σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05
Note we use the delay variation from the statistical analysis as a reference
74ISVLSI-2014 invited talk 140710
Resilient Designs
bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)
bull Trade off design robustness vs design quality Eg enable margin reduction
bull Improve performance (ie timing speculation)
084 088 092 096 10030
34
38
42
46
50
54
58
62conventional design
reilient Design
Supply voltage (V)
En
erg
y (
mJ
)
Conventional design Worst-case signoff No Vdd downscaling
Resilient design Typical-case signoff Vdd downscaling reduced energy
15 reduction
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
bull Given RTL design throughput requirement and error-tolerant registers
bull Objective implement design to minimize energy bull Estimation of design energy
119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Question Which endpoint to be optimized
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
Open-Loop AVS
Closed-Loop AVSP
ow
er
Freq amp Vdd LUT
Post-silicon characterization
AVS Pre-characterize LUT [Martin02]
Process-aware AVSPost-silicon characterization [Tschanz03]
Generic monitor
Design dependent replica
In-situmonitor
Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]
In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]
Error Detection System
Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]
Error Tolerance
AVS
approachesAVS classes
77
78ISVLSI-2014 invited talk 140710
Challenge Variability
1998 2000 2003 2006 2008 20111
10
MPU Release Date
Tran
sisto
r Cou
nt [M
]
Source [CPUDB]
DENSITY
IdealNon-ideality
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0
100
200
300
400
500
600
700
0
02
04
06
08
1
12Dynamic Power (W)
POWER
Source [JeongK08]
IdealNon-ideality
2006 2008 2010 2012 2014 20161000
10000
100000
Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling
DRIVE CURRENT
Ideal
Source [ITRS]
Non-ideality
1995 2000 2005 2011 20160
05
1
15
2
25
3
MPU Release Date
Volt
SUPPLY VOLTAGE
Source [CPUDB]
Ideal
Non-ideality
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU, with the minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage-frequency plane showing the design cone of mode A and modes A, B, and C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone → reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough → scalable
Global optimization flow: sample (SP&R); construct power models; estimate optimal signoff modes; adaptive search: sample (SP&R), refine power models
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V) with curves "1st", "2nd", and "real"]
• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iteration
• real: power from real implemented circuits
Design: AES, f = 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO on MUL and EXU at small, medium, and large margins, split into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on MUL and EXU, with the minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart elements: initial placement (all FFs = error-tolerant FFs); check "Energy < min energy"; save current solution; SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization]
39ISVLSI-2014 invited talk 140710
Conclusions
• Variability severely challenges IC value
  • In manufacturing process, during operation, across lifetime
• Benefit of "next node" is increasingly hard to find
  • Entire node is a "20/20/20" value proposition
  • 5-10% in PPA metrics is now substantial at leading edge
• Variability is connected to tapeout IC properties by models, margins, tolerances used in signoff
• Some takeaways from this talk
  • Substantial benefit from tightening BEOL corners (= signoff)
  • "Minimum cost of resilience" is a rich optimization challenge
  • Chicken-egg loops in signoff definition can be broken
  • Holistic approaches will provide "equivalent scaling" that extends the value trajectory of Moore's Law
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1 to 9, annotated "highest invested guardband", "least invested guardband", and "14% power penalty"]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Capacitance distributions of layers M1 and M2, each marked at C-3σ and 3σ, are combined into an interconnect stack with M1 and M2; the homogeneous Cw corner lies beyond the 3σ point of the stack, and the gap is labeled pessimism]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
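The growth of pessimism with layer count can be illustrated with a short derivation. This is only a sketch under simplifying assumptions that are not stated in the talk: N layers contribute equally to the stack capacitance, each with standard deviation σ, and every pair of layers has the same correlation ρ.

```latex
\begin{align*}
C_{\mathrm{hom}} - C_{\mathrm{typ}} &= \sum_{k=1}^{N} 3\sigma = 3N\sigma
  && \text{(every layer at its own } 3\sigma \text{ corner)} \\
3\sigma_{\mathrm{stack}} &= 3\sqrt{N\sigma^{2}\,[\,1 + (N-1)\rho\,]}
  && \text{(statistical } 3\sigma \text{ of the sum)} \\
\frac{C_{\mathrm{hom}} - C_{\mathrm{typ}}}{3\sigma_{\mathrm{stack}}}
  &= \sqrt{\frac{N}{1 + (N-1)\rho}}
\end{align*}
```

Under these assumptions the ratio is 1 for fully correlated layers (ρ = 1) and grows as √N for independent layers (ρ = 0), so the homogeneous corner overshoots more as layers are added.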
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
              M1          M2          M3          M4
           ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1   ΔW     1  0  0     γ  0  0     γ  0  0     0  0  0
     ΔT     0  1  0     0  γ  0     0  γ  0     0  0  0
     ΔH     0  0  1     0  0  γ     0  0  γ     0  0  0
M2   ΔW     γ  0  0     1  0  0     γ  0  0     0  0  0
     ΔT     0  γ  0     0  1  0     0  γ  0     0  0  0
     ΔH     0  0  γ     0  0  1     0  0  γ     0  0  0
M3   ΔW     γ  0  0     γ  0  0     1  0  0     0  0  0
     ΔT     0  γ  0     0  γ  0     0  1  0     0  0  0
     ΔH     0  0  γ     0  0  γ     0  0  1     0  0  0
M4   ΔW     0  0  0     0  0  0     0  0  0     1  0  0
     ΔT     0  0  0     0  0  0     0  0  0     0  1  0
     ΔH     0  0  0     0  0  0     0  0  0     0  0  1

Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
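The matrix above can be generated programmatically. The following is a minimal sketch, assuming (as the matrix suggests) that M1 to M3 share one process module and M4 sits in its own module; the names `LAYERS`, `KINDS`, `MODULE`, and `correlation_matrix` are illustrative and not from the talk.

```python
# Illustrative sketch: build the correlation matrix for the 12 variation sources.
LAYERS = ["M1", "M2", "M3", "M4"]
KINDS = ["W", "T", "H"]  # stands for the ΔW, ΔT, ΔH sources of each layer

# Assumed grouping of layers into process modules (inferred from the matrix).
MODULE = {"M1": 0, "M2": 0, "M3": 0, "M4": 1}


def correlation_matrix(gamma=0.5):
    """Return (sources, sigma) where sigma[a][b] is the correlation of sources a and b."""
    sources = [(layer, kind) for layer in LAYERS for kind in KINDS]
    sigma = []
    for layer_a, kind_a in sources:
        row = []
        for layer_b, kind_b in sources:
            if (layer_a, kind_a) == (layer_b, kind_b):
                row.append(1.0)  # a source is fully correlated with itself
            elif kind_a == kind_b and MODULE[layer_a] == MODULE[layer_b]:
                row.append(gamma)  # same variation type, same process module
            else:
                row.append(0.0)  # otherwise independent
        sigma.append(row)
    return sources, sigma


if __name__ == "__main__":
    _, sigma = correlation_matrix(0.5)
    for row in sigma:
        print(" ".join(f"{value:3.1f}" for value in row))
```

Setting `gamma=0.0` gives the identity matrix, which corresponds to the fully independent case.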
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%), marked at 60% and 0.92]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: two plots of α, one against Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp), one against Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0 → a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: plot of Δdj(Yrcw) / dj(Ytyp) × 100 and 3σj / d(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst, and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → paths split into GTBC and GCBC → timing analysis using TBC and timing analysis using CBC, each followed by ECO (using TBC or CBC) until violation = 0 → done]
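The grouping step of this flow can be sketched in a few lines. This is an illustration only: it assumes a single relative threshold applied to the larger of the C-worst and RC-worst Δdelays, whereas the talk may use separate thresholds per corner. The function name, data layout, and numbers are hypothetical.

```python
def partition_paths(delays, threshold):
    """Split paths into (g_tbc, g_cbc) by relative delay variation.

    delays maps a path name to (d_typ, d_cw, d_rcw): the path delay at the
    typical, C-worst, and RC-worst BEOL corners. A path goes to g_tbc when
    its relative Δdelay at the dominant corner exceeds the threshold.
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in delays.items():
        # Relative Δdelay at whichever corner gives the larger delay.
        delta = (max(d_cw, d_rcw) - d_typ) / d_typ
        if delta > threshold:
            g_tbc.append(name)  # sign off with tightened BEOL corners
        else:
            g_cbc.append(name)  # keep conventional BEOL corners
    return g_tbc, g_cbc


if __name__ == "__main__":
    # Hypothetical path delays in ns.
    example = {"p1": (1.00, 1.02, 1.10), "p2": (1.00, 1.01, 1.03)}
    print(partition_paths(example, threshold=0.05))
```

Only the paths returned in the first list would then be fixed against the tightened corners; the rest stay on the conventional signoff path.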
53ISVLSI-2014 invited talk 140710
Experiment Setup

Testcases for validation (45nm library with 8X wire resistivity)
                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5
           α     Acw (%)   Arcw (%)
TBC-0.5    0.5   43        73
TBC-0.6    0.6   33        50
TBC-0.7    0.7   30        34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw, and Arcw
Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays: the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  where dj(Ytyp) is the nominal delay, ΔdR-M1 is the delay sensitivity to a unit change in M1 resistance, and ΔdC-M1 is the delay sensitivity to a unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out → Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: α plots for LEON3MP, NETCARD, and SUPERBLUE12, each with the α > 0.5 region marked]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]
57ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time for NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, VLIB, and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis but very slow: many STA runs
and library
Vstep: AVS voltage step; Vfinal: converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2
63ISVLSI-2014 invited talk 140710
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop connecting circuit, derated libraries (Vlib, VBTI), and Vfinal]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON and OFF intervals [VattikondaWC06]]
Device aging (|ΔVth|) accumulates over time [TCAS'14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd approaching Vfinal; ≈70% of the Vdd increment occurs in 1 year (remaining 30% over 9 years)]
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consume more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimate aging and/or underestimate circuit performance
• Large area overhead
Good corners
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Columns: Vlib for circuit performance estimation. Rows: VBTI for aging estimation.
                  Vlib = Min Vdd                                        Vlib = Max Vdd
VBTI = Min Vdd    Slowest circuit, less aging                           Not applicable (optimistic)
VBTI = Max Vdd    Slowest circuit, worst-case aging (too pessimistic)   Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm2) for AES, points labeled by setup number, ranging from "optimistic about AVS" to "pessimistic about AVS"; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks)]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
  $\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9, annotated "30% MTTF penalty" and "200mV voltage compensation"]
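The MTTF update rule above can be applied step by step along an AVS voltage schedule. The following is a minimal sketch; the function name and the example schedule are hypothetical and only illustrate the recurrence.

```python
def update_mttf(mttf_initial, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule.

    vdd_schedule lists the supply voltage at each AVS step, starting with the
    voltage for which mttf_initial was computed. Returns the MTTF at each step.
    """
    mttf = [mttf_initial]
    for v_prev, v_cur in zip(vdd_schedule, vdd_schedule[1:]):
        # Raising Vdd (v_cur > v_prev) shrinks the MTTF quadratically.
        mttf.append(mttf[-1] * (v_prev / v_cur) ** 2)
    return mttf


if __name__ == "__main__":
    # Hypothetical schedule: Vdd raised in 50mV steps to compensate BTI aging.
    for vdd, years in zip([0.90, 0.95, 1.00], update_mttf(10.0, [0.90, 0.95, 1.00])):
        print(f"Vdd = {vdd:.2f} V -> MTTF = {years:.2f} years")
```

Because the per-step ratios telescope, the final MTTF depends only on the first and last voltages of the schedule: MTTF(n) = MTTF(0) × (VDD(0) / VDD(n))².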
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year (0 to 12) for schedules S1 to S5, and MTTF (year) for S1 to S5, design DMA; annotated "12 years MTTF penalty"]
71ISVLSI-2014 invited talk 140710
What is "Signoff"?
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
["Margin stack" for voltage signoff, along the voltage axis: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd; operating voltage marked]
Problem: margins = pessimism → overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = [\, d_j(Y_v) - d_j(Y_{typ}) \,] / 3$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^{T}$ (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
  $[\Delta d'_{j1} \;\ldots\; \Delta d'_{j27}] = [\Delta d_{j1} \;\ldots\; \Delta d_{j27}]\,\lambda$
Standard deviation of path delay:
  $\sigma_j = \left( (\Delta d'_{j1})^{2} + \ldots + (\Delta d'_{j27})^{2} \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
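The computation on this slide can be sketched with a small pure-Python Cholesky factorization. This is an illustrative sketch, not the authors' implementation; the function names are assumptions, and the correlation matrix is assumed to be symmetric positive definite.

```python
import math


def cholesky(sigma):
    """Return lower-triangular lam such that sigma = lam * lam^T."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(sigma[i][i] - partial)
            else:
                lam[i][j] = (sigma[i][j] - partial) / lam[j][j]
    return lam


def path_sigma(delta_d, sigma):
    """Standard deviation of a path delay from per-source sensitivities.

    delta_d[v] is the delay sensitivity of the path to variation source v;
    sigma is the correlation matrix of the variation sources.
    """
    lam = cholesky(sigma)
    n = len(delta_d)
    # Row vector times matrix: correlated[k] = sum_i delta_d[i] * lam[i][k]
    correlated = [sum(delta_d[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in correlated))


if __name__ == "__main__":
    # Two hypothetical sources with correlation 0.5 and sensitivities 3 and 4 (ps).
    print(path_sigma([3.0, 4.0], [[1.0, 0.5], [0.5, 1.0]]))
```

The result equals the square root of Δd Σ Δdᵀ, so for independent sources it reduces to the root-sum-square of the sensitivities.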
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors → ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotated "15% reduction"]
• Conventional design: worst-case signoff → no Vdd downscaling
• Resilient design: typical-case signoff → Vdd downscaling, reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy: Energy = Power / Throughput
  Throughput is a function of the error rate ER [Kahng10], the clock period T, and the number of recovery cycles r
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Question Which endpoint to be optimized
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
Open-Loop AVS
Closed-Loop AVSP
ow
er
Freq amp Vdd LUT
Post-silicon characterization
AVS Pre-characterize LUT [Martin02]
Process-aware AVSPost-silicon characterization [Tschanz03]
Generic monitor
Design dependent replica
In-situmonitor
Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]
In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]
Error Detection System
Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]
Error Tolerance
AVS
approachesAVS classes
77
78ISVLSI-2014 invited talk 140710
Challenge Variability
1998 2000 2003 2006 2008 20111
10
MPU Release Date
Tran
sisto
r Cou
nt [M
]
Source [CPUDB]
DENSITY
IdealNon-ideality
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0
100
200
300
400
500
600
700
0
02
04
06
08
1
12Dynamic Power (W)
POWER
Source [JeongK08]
IdealNon-ideality
2006 2008 2010 2012 2014 20161000
10000
100000
Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling
DRIVE CURRENT
Ideal
Source [ITRS]
Non-ideality
1995 2000 2005 2011 20160
05
1
15
2
25
3
MPU Release Date
Volt
SUPPLY VOLTAGE
Source [CPUDB]
Ideal
Non-ideality
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
80ISVLSI-2014 invited talk 140710
Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for
circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly
threshold voltages)bull One mode is outside of the design cone of the other
failed design overdesignbull Mode A has positive timing slacks with respect to mode B
mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other
bull Modes are in each othersrsquo design cone
Voltage
Frequency
A
Negative Slacks = failed design
Positive Slacks = overdesign
B
C
Design Cone of mode A
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline search for signoff modes within design cone reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method Global Optimizationbull Iteratively sample and refine power models
bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable
Sample (SPampR)
Construct power models
Estimate optimal signoff modes
Sample (SPampR)
Refine power models
Adaptive search
Global optimization flow
09 10 11 1214
15
16
17
18
19
201st 2nd real
Signoff Voltage (v)
Po
wer
(m
W)
Power estimation of adaptive search
bull Ovals indicate sample pointsbull 1st 2nd power from power models at first
second iterationbull real power from real implemented circuits
Design AESf 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
bull Critical path may be difficult to identify (IP from 3rd party)
bull Calibrating monitors at multiple modesvoltages requires long test time
Closed-Loop AVS
Design-dependent replica
In-situmonitor
Generic monitor
bull Does not capture design-specific performance variation
82
This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
bull Identified two circuit knobs to tune Vmin
bull Series resistancebull Cell types (INV NAND NOR)
bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low
1 bit 1 bit 1 bit Control pins
High resistance
Low resistance
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reductionbull Reference flows
bull Pure-margin (PM) conventional methodology w only margin insertionbull Brute-force (BF) insert error-tolerant FFs at timing-critical endpoints
bull Proposed method (CO) achieves up to 20 energy reduction compared to reference methods
bull Resilience benefits increase with safety margin
PM BF CO PM BF CO PM BF CO25
30
35
40
45
50
55 Energy penalty of throughput degradationEnergy penalty of additional circuitsEnergy wo resilience
En
erg
y (
mJ
)
Large marginMedium marginSmall margin
MUL
PM BF CO PM BF CO PM BF CO25
27
29
31
33
35
En
erg
y (
mJ
)
EXU
Large marginMedium marginSmall margin
Smallmediumlarge margin safety margin = 51015 of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVSbull AVS (Adaptive Voltage Scaling) allows lower supply voltage for
resilient designs reduced powerbull We trade off between timing-error penalty vs reduced power at a
lower supply voltagebull Average 18 energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
bull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity-aware clock skew optimization
SkewOpt
40ISVLSI-2014 invited talk 140710
Thank You
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
1 2 3 4 5 6 7 8 91200
1300
1400
1500
1600
1700
030
032
034
036
Core Power (mW) PG Power (mW)
Implemetation
Core
Pow
er (m
W)
PG
Pow
er (m
W)
bull Core power increases due to elevated voltage bull PG power increases due to both elevated voltage and mesh degradationbull A tradeoff between invested guardband in signoff
Highest invested guardband
Least invested guardband
14 power penalty
43ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Interconnect stack with M1 and M2
M1 C
M2 C
3σ Pessimism
Example worst-case capacitance corner Homogeneous
Cw corner
44ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Homogeneous Cw corner
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Pessimism
Example worst-case capacitance corner
When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers
45ISVLSI-2014 invited talk 140710
Correlation Matrixbull Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the process module γ 05
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Averaging uncorrelated variation smaller RC variation
47ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; paths split into GTBC and GCBC; each group iterates timing analysis and ECO (using TBC or using CBC) until violation = 0; done]
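The grouping step of this flow can be sketched as follows. This is a minimal illustration, assuming Δdelay is measured relative to the typical-corner delay and taken at the slower of the two conventional corners; the path data and the 5% threshold are hypothetical.

```python
def split_paths(paths, threshold):
    """Split paths into (g_tbc, g_cbc) by relative delta-delay.

    paths: list of (name, d_typ, d_cw, d_rcw) tuples, delays in ns.
    threshold: relative delta-delay above which a path goes to g_tbc
               (a hypothetical tuning knob, not a value from the talk).
    """
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        # Relative delta-delay at the dominant (slower) conventional corner.
        delta = (max(d_cw, d_rcw) - d_typ) / d_typ
        if delta > threshold:
            g_tbc.append(name)  # fixed using tightened BEOL corners
        else:
            g_cbc.append(name)  # kept at conventional BEOL corners
    return g_tbc, g_cbc


paths = [
    ("p0", 1.00, 1.02, 1.03),
    ("p1", 1.00, 1.05, 1.12),
    ("p2", 2.00, 2.30, 2.10),
]
g_tbc, g_cbc = split_paths(paths, threshold=0.05)
print(g_tbc, g_cbc)
```

Paths with a large Δdelay end up in Gtbc, which matches the later observation that large α occurs when the Δdelays are small.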
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     18        20        31
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm²)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Tightened corners (Acw and Arcw at correlation factor = 0.5)

            α     Acw (%)   Arcw (%)
TBC-0.5     0.5   43        73
TBC-0.6     0.6   33        50
TBC-0.7     0.7   30        34

• Implement another NETCARD (clock period = 23ns) to obtain α, Acw and Arcw
• Statistical models: (1) no correlation, and (2) variation sources of the same kind in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1 term: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1 term: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ~ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Plots for LEON3MP, SUPERBLUE12, and NETCARD, each marking the region α > 0.5]
• Similar trends in different designs
• Large α when Δd(Yrcw) / d(Ytyp) and Δd(Ycw) / d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Bar charts for LEON, SUPERBLUE, and NETCARD comparing CBC, TBC-1, and TBC-2: WNS (ns), TNS (ns), and number of timing violations]
57ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging under an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI]
VBTI = Vlib ≈ Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Plot annotation: 10mV]
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step
Vfinal: converged voltage
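The reference flow above can be sketched as a simple time-stepping loop. Everything numeric here is an assumption made for illustration: the alpha-power-law delay model, the power-law BTI model, and all constants are hypothetical stand-ins for the STA runs and derated-library characterization of the real flow.

```python
def path_delay(vdd, dvth, d0=1.0, vth0=0.3, a=1.3):
    """Toy critical-path delay (alpha-power-law style, hypothetical constants)."""
    return d0 * vdd / (vdd - vth0 - dvth) ** a


def bti_shift(vdd, years, k=0.02, n=1.0 / 6.0):
    """Toy BTI model: |dVth| grows as a power law in time and scales with Vdd."""
    return k * vdd * years ** n


def reference_signoff(t_clk, v_init=0.90, v_max=1.05, v_step=0.01,
                      t_final_years=10, steps_per_year=4):
    """Step through the lifetime; return (Vfinal, history) or (None, history)."""
    vdd = v_init
    history = []
    for step in range(1, t_final_years * steps_per_year + 1):
        years = step / steps_per_year
        dvth = bti_shift(vdd, years)              # estimate aging at this time step
        while path_delay(vdd, dvth) > t_clk:      # AVS raises Vdd until timing is met
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None, history              # modify circuit and start over
            dvth = bti_shift(vdd, years)          # a higher Vdd also ages faster
        history.append((years, vdd))
    return vdd, history


# Zero timing slack at t = 0: the clock period equals the fresh delay at Vinit.
t_clk = path_delay(0.90, 0.0)
v_final, history = reference_signoff(t_clk)
print(v_final, history[3])  # converged voltage and the state after one year
```

In the real flow each iteration needs a timing update, which is why the slide calls this approach very slow.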
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   NA   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  NA   Vheur1  Vheur2
63ISVLSI-2014 invited talk 140710
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Diagram: loop between circuit, Vfinal, Vlib / VBTI, and derated libraries]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON / OFF phases] [VattikondaWC06]
Device aging (|ΔVth|) accumulates over time [TCAS'14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
[Plot annotation: ≈70% of Vdd increment in 1 year (remaining 30% over 9 years); Vfinal marked]
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
Setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?
Choice of Vlib (for circuit performance estimation) and VBTI (for aging estimation):
• Vlib = Min Vdd, VBTI = Min Vdd: slowest circuit, less aging
• Vlib = Max Vdd, VBTI = Min Vdd: not applicable (optimistic)
• Vlib = Min Vdd, VBTI = Max Vdd: slowest circuit, worst-case aging (too pessimistic)
• Vlib = Max Vdd, VBTI = Max Vdd: faster circuit, worst-case aging
With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Scatter plot for AES: power (mW) vs. area (μm²), points labeled by setup number; regions annotated "Optimistic about AVS" and "Pessimistic about AVS". Legend text: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks)]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Chart: lifetime (year) and Vfinal (V) for implementations 1 to 9]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
  $\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
30% MTTF penalty
200mV voltage compensation
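As a worked illustration of the MTTF update rule on this slide, the sketch below applies it step by step along a Vdd schedule. The starting MTTF and the voltage schedule are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One update step: MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_after_schedule(mttf0, vdd_schedule):
    """Apply the update along a list of Vdd values (first entry = initial Vdd)."""
    mttf = mttf0
    for vdd_prev, vdd_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, vdd_prev, vdd_new)
    return mttf


# Hypothetical case: 10-year MTTF at 1.00 V, AVS raises Vdd by 200 mV in two steps.
mttf_end = mttf_after_schedule(10.0, [1.00, 1.10, 1.20])
penalty = 1.0 - mttf_end / 10.0
print(round(mttf_end, 3), round(penalty, 3))
```

Because the per-step ratios telescope, the result under this rule depends only on the first and last voltages of the schedule.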
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Charts for DMA: VDD vs. year (0 to 12) for schedules S1 to S5, and MTTF (year) for S1 to S5]
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is "Signoff"?
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff on a voltage axis, with levels: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI / NBTI, signoff Vdd, operating voltage]
Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist; RC extraction; STA; Δdjv]
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^{T}$ (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
  $[\Delta d'_{j1} \; \dots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \dots \; \Delta d_{j27}] \, \lambda$
Standard deviation of path delay:
  $\sigma_j = \left( (\Delta d'_{j1})^{2} + \dots + (\Delta d'_{j27})^{2} \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
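The two statistical-timing slides can be tied together in a small sketch: per-source sensitivities from delays at 3σ-skewed corners, a Cholesky factor of the correlation matrix, and the resulting path sigma. The three-source example and its delay values are hypothetical (the talk uses 27 sources), and the Cholesky routine is a plain textbook implementation rather than the tool flow used by the authors.

```python
import math


def cholesky(a):
    """Lower-triangular L with a = L * L^T, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sensitivities(d_typ, d_at_3sigma):
    """Delta d_jv = (d_j(Y_v) - d_j(Y_typ)) / 3 for each variation source v."""
    return [(d - d_typ) / 3.0 for d in d_at_3sigma]


def path_sigma(sens, corr):
    """sigma_j = norm of (sens * lambda), where corr = lambda * lambda^T."""
    lam = cholesky(corr)
    n = len(sens)
    # Row vector times matrix: correlated sensitivities d'_j.
    d_prime = [sum(sens[i] * lam[i][j] for i in range(n)) for j in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


# Hypothetical example: sources 0 and 1 correlated with gamma = 0.5, source 2 independent.
corr = [[1.0, 0.5, 0.0],
        [0.5, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
sens = sensitivities(1.0, [1.03, 1.06, 1.03])
sigma_j = path_sigma(sens, corr)
print(sigma_j)
```

The result equals the square root of the quadratic form sens · Σ · sensᵀ, which is a convenient cross-check for any implementation of this step.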
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation under dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Plot: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:
  $\mathrm{Energy} = \frac{\mathrm{Power}}{\mathrm{Throughput}}$
  Throughput = 1 − ER T + 1 − ER r × T [fraction layout lost in extraction]
  where ER = error rate [Kahng10], T = clock period, r = recovery cycles
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches (ordered along a power axis)
• Open-Loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-Loop AVS (process and temperature-aware AVS)
  • Generic monitor: generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error Tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Four charts, each contrasting ideal scaling with non-ideality:
 DENSITY: transistor count [M] vs. MPU release date, 1998 to 2011. Source: [CPUDB]
 POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
 DRIVE CURRENT (μA/μm): Extended Planar Bulk, UTB FD, DG, and ideal scaling, 2006 to 2016. Source: [ITRS]
 SUPPLY VOLTAGE: volts vs. MPU release date, 1995 to 2016. Source: [CPUDB]]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Plots for MUL and EXU: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt, with the minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: frequency vs. voltage plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
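A toy check of mode dominance might look like the sketch below. It rests on two labeled assumptions: "A dominates B" is read as "a circuit that just meets timing at mode A has positive slack at mode B", and delay scales with voltage through a hypothetical alpha-power-law curve. With a single scaling curve the design cone collapses to a line, so a tolerance stands in for its width.

```python
def delay_scale(v, vth=0.3, a=1.3):
    """Hypothetical voltage-to-delay scaling (alpha-power-law style)."""
    return v / (v - vth) ** a


def slack_at(signoff, target):
    """Worst slack (ns) at `target` for a circuit with zero slack at `signoff`.

    Modes are (voltage in V, frequency in GHz) tuples.
    """
    v_s, f_s = signoff
    v_t, f_t = target
    critical_delay = (1.0 / f_s) * delay_scale(v_t) / delay_scale(v_s)
    return 1.0 / f_t - critical_delay


def classify(mode_a, mode_b, tol=0.01):
    """Classify the relation between two modes under the assumptions above."""
    s_ab = slack_at(mode_a, mode_b)
    if s_ab > tol:
        return "A dominates B"         # signing off at A overdesigns for B
    if s_ab < -tol:
        return "B dominates A"         # signing off at A fails at B
    return "equivalent dominance"      # each mode lies in the other's design cone


mode_a = (0.9, 1.0)   # hypothetical nominal mode
mode_b = (1.1, 1.1)   # hypothetical overdrive mode
print(classify(mode_a, mode_b))
```

Under this reading, signing off at a pair of modes that classify as anything other than equivalent dominance leaves slack on the table at the dominated mode.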
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
[Global optimization flow: sample (SP&R); construct power models; estimate optimal signoff modes; adaptive search: sample (SP&R), refine power models]
[Plot: power estimation of adaptive search, power (mW) vs. signoff voltage (V), series "1st", "2nd", "real". Design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first and second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
[Diagram: Closed-Loop AVS classes: generic monitor, design-dependent replica, in-situ monitor]
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Circuit diagram: ring oscillator stages with 1-bit control pins selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Bar charts for MUL and EXU: energy (mJ) of PM, BF, and CO at small, medium, and large margin, broken down into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs, hence reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Plots for MUL and EXU: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt, with the minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart elements: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs w/ normal FFs; SkewOpt: activity-aware clock skew optimization; check energy < min energy; save current solution]
41ISVLSI-2014 invited talk 140710
Backup
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Chart: core power (mW) and PG power (mW) for implementations 1 to 9, annotated with the highest and the least invested guardband]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• A tradeoff between invested guardband in signoff
14% power penalty
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Layers M1 and M2 each have a C distribution with a 3σ corner (labeled C-3σ); for the interconnect stack with M1 and M2, the homogeneous Cw corner built from M1 C and M2 C lies beyond the stack's 3σ point (pessimism)]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
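The growth of pessimism with layer count can be illustrated numerically. The sketch assumes the simplest case, which the slide does not spell out: each layer contributes an independent, normally distributed delay or capacitance shift, so the statistical 3σ of the stack is a root-sum-square while the homogeneous corner adds every layer's 3σ linearly. The per-layer sigmas are hypothetical.

```python
import math


def homogeneous_corner_shift(layer_sigmas):
    """Every layer is pushed to its own 3-sigma corner at the same time."""
    return sum(3.0 * s for s in layer_sigmas)


def statistical_3sigma_shift(layer_sigmas):
    """3-sigma of the sum of independent per-layer contributions."""
    return 3.0 * math.sqrt(sum(s * s for s in layer_sigmas))


def pessimism_ratio(layer_sigmas):
    """How much the homogeneous corner overstates the statistical 3-sigma."""
    return homogeneous_corner_shift(layer_sigmas) / statistical_3sigma_shift(layer_sigmas)


# Equal, uncorrelated layers: the ratio grows as sqrt(number of layers).
for n_layers in (1, 2, 4, 9):
    print(n_layers, round(pessimism_ratio([1.0] * n_layers), 3))
```

With fully correlated layers the two quantities coincide, so the ratio is an upper bound on the pessimism in this simplified picture.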
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

            M1          M2          M3          M4
            ΔW  ΔT  ΔH  ΔW  ΔT  ΔH  ΔW  ΔT  ΔH  ΔW  ΔT  ΔH
Σ = M1  ΔW  1   0   0   γ   0   0   γ   0   0   0   0   0
        ΔT  0   1   0   0   γ   0   0   γ   0   0   0   0
        ΔH  0   0   1   0   0   γ   0   0   γ   0   0   0
    M2  ΔW  γ   0   0   1   0   0   γ   0   0   0   0   0
        ΔT  0   γ   0   0   1   0   0   γ   0   0   0   0
        ΔH  0   0   γ   0   0   1   0   0   γ   0   0   0
    M3  ΔW  γ   0   0   γ   0   0   1   0   0   0   0   0
        ΔT  0   γ   0   0   γ   0   0   1   0   0   0   0
        ΔH  0   0   γ   0   0   γ   0   0   1   0   0   0
    M4  ΔW  0   0   0   0   0   0   0   0   0   1   0   0
        ΔT  0   0   0   0   0   0   0   0   0   0   1   0
        ΔH  0   0   0   0   0   0   0   0   0   0   0   1

• Correlation for variation sources with the same variation type and in the same process module: γ = 0.5
• Variation sources in different process modules are independent
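The structure of this matrix can be generated programmatically. The sketch below is an illustration only: the grouping of M1 to M3 into one process module and M4 into another is inferred from the matrix entries, and the module labels "A" and "B" are hypothetical names.

```python
def build_correlation_matrix(layers, var_types, module_of, gamma):
    """Return (sources, sigma) for variation sources ordered layer by layer.

    Off-diagonal entries are gamma when two sources share the variation type
    and the process module, and 0 otherwise.
    """
    sources = [(layer, vt) for layer in layers for vt in var_types]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (layer_i, type_i) in enumerate(sources):
        for j, (layer_j, type_j) in enumerate(sources):
            if i == j:
                sigma[i][j] = 1.0
            elif type_i == type_j and module_of[layer_i] == module_of[layer_j]:
                sigma[i][j] = gamma
    return sources, sigma


layers = ["M1", "M2", "M3", "M4"]
var_types = ["dW", "dT", "dH"]  # width, thickness, height variations
module_of = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}  # hypothetical module labels
sources, sigma = build_correlation_matrix(layers, var_types, module_of, gamma=0.5)
for row in sigma:
    print(" ".join(f"{x:.1f}" for x in row))
```

Setting gamma to 0 reproduces the fully independent model, the other statistical model mentioned in the experiment setup.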
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Plot: cumulative probability vs. max wirelength ratio across all layers (%), marking cumulative probability 0.92 at 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variations gives smaller RC variation
48ISVLSI-2014 invited talk 140710
Delay Variation
[Two plots labeled α. Axis labels: Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp); Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
Gate count 232K 575K 1031KUtilization () 84 79 82
Core area (mm2) 045 104 191Max transition (ps) 330 330 330
Testcases for validation (45nm library with 8X wire resistivity)
αCorrelation factor = 05
Acw () Arcw ()
TBC-05 05 43 73
TBC-06 06 33 50
TBC-07 07 30 34
Implement another NETCARD (clock period = 23ns) to obtain α Acw and Arcw
Statistical models (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 05
54ISVLSI-2014 invited talk 140710
Further Analysis
bull Paths with small Δd(Yrcw) and Δd(Ycw) have large α
bull A path has small Δdelays the path is equally sensitive to R and C
bull Example dj = dj(Ytyp) + 05 ΔdR-M1 + 05 ΔdC-M1
bull For a given CBC = Ycw ΔdR-M1 is small but ΔdC-M1 is large delay variation of ΔdR-M1 and ΔdC-M1 are cancelled out Δd(Ycw) 0 lt σj
Nominal delay
Delay sensitivity to unit change in M1 resistance
Delay sensitivity to unit change in M1 capacitance
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
LEON3MP
SUPERBLUE12NETCARD
α gt 05α gt 05
α gt 05
bull Similar trends in different designs
bull Large α when Δd(Yrcw)d(Ytyp) and Δd(Ycw)d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
bull WNS and TNS are reduced by up to 120ps and 61ns
bull Timing violations reduces by 31 to 100
Correlation factor γ = 0 (variation sources are independent)
LEON SUPERBLUE NETCARD
-0180-0160-0140-0120-0100-0080-0060-0040-002000000020
CBC TBC-1 TBC-2
WN
S (n
s)
LEON SUPERBLUE NETCARD
-90-80-70-60-50-40-30-20-10
0CBC TBC-1 TBC-2
TNS
(ns)
LEON SUPERBLUE NETCARD0
200400600800
1000120014001600
CBC TBC-1 TBC-2
Tim
ing
viol
ation
s
57ISVLSI-2014 invited talk 140710
Heuristics 1
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
supply voltage gt Vmax
Pessimistic signoff corner bull Ovestimate aging andor
underestimate circuit performance
bull Large area overhead
Good corners
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner
for AVS4 Vbti = Vfinal Do not overestimate
aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
bull Timing signoff ensure circuit meets performance target under PVT variations amp aging
bull Conventional signoff approach bull Analyze circuit timing at worst-case cornersbull Fix timing violations re-run timing analysis
bull With BTI aging and AVS what is the Vdd of the worst-cast corner for timing analysis
Vlib for circuit performance estimation
Min Vdd Max Vdd
VBTI for aging
estimation
MinVdd
Not applicable (Optimistic)
Max Vdd
Slowest circuitLess aging
Faster circuitWorst-case aging
Slowest circuit Worst-case aging
Too pessimistic
With BTI aging and AVS the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
10000 12000 14000 16000 18000 20000 2200020
22
24
26
28
30
32
44
4
888
7776
66
555
3
33
2
22
11
1
Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)
Area (μm2)
Pow
er (m
W)
AES
Optimistic about AVS
Pessimistic about AVS
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
1 2 3 4 5 6 7 8 90
2
4
6
8
10
12
08
09
1
11
12Lifetime (year)
Implementation
Life
time
(yea
r)
Vfina
l (V)
Vfinal (V)
119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )
2
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation
[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ
Standard deviation of path delay
σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05
Note we use the delay variation from the statistical analysis as a reference
74ISVLSI-2014 invited talk 140710
Resilient Designs
bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)
bull Trade off design robustness vs design quality Eg enable margin reduction
bull Improve performance (ie timing speculation)
084 088 092 096 10030
34
38
42
46
50
54
58
62conventional design
reilient Design
Supply voltage (V)
En
erg
y (
mJ
)
Conventional design Worst-case signoff No Vdd downscaling
Resilient design Typical-case signoff Vdd downscaling reduced energy
15 reduction
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
bull Given RTL design throughput requirement and error-tolerant registers
bull Objective implement design to minimize energy bull Estimation of design energy
119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Question Which endpoint to be optimized
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches (diagram axis: Power)
• Open-loop AVS
  • Freq & Vdd LUT: AVS with a pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor: process- and temperature-aware AVS with a generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor that measures actual critical paths [Hartman06, Fick10]
• Error detection system (error tolerance): error detection and correction system, Vdd scaling until an error occurs [Das06, Tschanz10]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure, four panels, each comparing ideal scaling with the observed non-ideality:
 DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
 POWER: dynamic power (W) vs. year, 2009 to 2024. Source: [JeongK08]
 DRIVE CURRENT: drive current (μA/μm) vs. year for extended planar bulk, UTB FD, DG, and ideal scaling. Source: [ITRS]
 SUPPLY VOLTAGE: supply voltage (V) vs. MPU release date. Source: [CPUDB]]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off the timing-error penalty against reduced power at a lower supply voltage
• The proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on the MUL and EXU designs, with the minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage vs. frequency plane with modes A, B, and C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
Global optimization flow: Sample (SP&R) → Construct power models → Estimate optimal signoff modes → adaptive search: Sample (SP&R) → Refine power models
[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V), with series 1st, 2nd, and real]
• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iterations
• real: power from real implemented circuits
Design: AES, f = 700 MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Generic monitor does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring-oscillator stages, each with a 1-bit control pin that selects high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO for the MUL and EXU designs at small, medium, and large margin; each bar is split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows a lower supply voltage for resilient designs, which reduces power
• We trade off the timing-error penalty against reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt on the MUL and EXU designs, with the minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow:
Initial placement (all FFs = error-tolerant FFs)
Energy < min energy? If so, save current solution
SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs
SkewOpt: activity-aware clock skew optimization
42ISVLSI-2014 invited talk 140710
Power Penalty to Fix EM with AVS
[Figure: core power (mW) and PG power (mW) for implementations 1 to 9, annotated from highest invested guardband to least invested guardband; 14% power penalty]
• Core power increases due to elevated voltage
• PG power increases due to both elevated voltage and mesh degradation
• There is a tradeoff with the guardband invested at signoff
44ISVLSI-2014 invited talk 140710
Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner. Layers M1 and M2 each have a capacitance distribution spanning −3σ to 3σ; the homogeneous Cw corner for the interconnect stack puts both M1 C and M2 C at 3σ, and the excess over the 3σ point of the stack is marked as pessimism]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers
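The growth of pessimism with the number of layers can be illustrated numerically. The sketch below is a toy model under stated assumptions that are not from the slides: the stack variation is a plain sum of per-layer contributions, every layer has the same standard deviation, and all layer pairs share one correlation factor `gamma`. The function names are hypothetical.

```python
import math

def homogeneous_3sigma(sigmas):
    # Homogeneous corner: every layer sits at its own 3-sigma value simultaneously.
    return sum(3.0 * s for s in sigmas)

def statistical_3sigma(sigmas, gamma):
    # 3-sigma of the summed variation when every layer pair has correlation gamma.
    var = sum(s * s for s in sigmas)
    n = len(sigmas)
    for i in range(n):
        for j in range(i + 1, n):
            var += 2.0 * gamma * sigmas[i] * sigmas[j]
    return 3.0 * math.sqrt(var)

def pessimism_ratio(n_layers, gamma, sigma=1.0):
    # Ratio > 1 means the homogeneous corner overstates the 3-sigma variation.
    sigmas = [sigma] * n_layers
    return homogeneous_3sigma(sigmas) / statistical_3sigma(sigmas, gamma)

if __name__ == "__main__":
    for n in (1, 2, 4, 9):
        print(n, round(pessimism_ratio(n, 0.0), 3), round(pessimism_ratio(n, 0.5), 3))
```

In this toy model the ratio is exactly 1 for fully correlated layers (`gamma = 1`) and grows as the square root of the layer count for independent layers, which is the trend the slide describes.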
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation between variation sources of the same variation type in the same process module: γ = 0.5
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of their wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the curve is marked at 0.92 for a ratio of 60]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variations gives a smaller RC variation
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: two scatter plots of α, one against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and one against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; non-homogeneous corner with M1 = Cw (3σ) and M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in the itf with α = 0.5
[Figure axes: Δdj(Yrcw) / dj(Ytyp) × 100 and 3σj / d(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield
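A per-path check of this kind can be sketched as follows. The definition of α used here, the statistical 3σj divided by Δdelay at the dominant conventional corner, is an assumption inferred from the plot axes and from the statement that α is defined at the dominant corner. The linear scaling of Δdelay under a tightened corner, and the function names, are also assumptions made for illustration.

```python
def scaling_factor(sigma_j, delta_cw, delta_rcw):
    """Assumed alpha for one path: 3*sigma_j over Δdelay at the dominant corner."""
    dominant = max(delta_cw, delta_rcw)
    if dominant <= 0:
        raise ValueError("path shows no positive delay increase at either corner")
    return 3.0 * sigma_j / dominant

def safe_with_tbc(alpha, alpha_tbc):
    # Assumption: a corner tightened by alpha_tbc scales Δdelay linearly,
    # so it still covers 3*sigma_j whenever alpha <= alpha_tbc.
    return alpha <= alpha_tbc

if __name__ == "__main__":
    # Hypothetical path: sigma_j = 10 ps, Δdelay = 40 ps at Ycw and 60 ps at Yrcw.
    a = scaling_factor(10.0, 40.0, 60.0)
    print(a, safe_with_tbc(a, 0.5))
```

Paths that fail such a check are the ones for which a tightened corner would underestimate delay variation, which is the yield risk the slide raises.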
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
Flow:
Routed design
Timing analysis at BEOL corners Ytyp, Ycw, Yrcw; split paths into Gtbc and Gcbc
Gtbc: timing analysis using TBC; ECO using TBC until violation = 0
Gcbc: timing analysis using CBC; ECO using CBC until violation = 0
Done
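The grouping step of the flow can be sketched as below. This is a minimal illustration, not the authors' implementation: the `Path` record, the use of relative Δdelay at the dominant corner, and the threshold value are all hypothetical choices, since the slide states only that a path goes into Gtbc when its Δdelay exceeds a threshold.

```python
from dataclasses import dataclass

@dataclass
class Path:
    name: str
    d_typ: float   # path delay at Ytyp (ns)
    d_cw: float    # path delay at Ycw (ns)
    d_rcw: float   # path delay at Yrcw (ns)

def rel_delta(path):
    # Relative Δdelay at the dominant conventional corner (assumed metric).
    return (max(path.d_cw, path.d_rcw) - path.d_typ) / path.d_typ

def partition(paths, threshold):
    # Paths with large Δdelay go to Gtbc; the rest stay in Gcbc.
    g_tbc = [p for p in paths if rel_delta(p) > threshold]
    g_cbc = [p for p in paths if rel_delta(p) <= threshold]
    return g_tbc, g_cbc

if __name__ == "__main__":
    paths = [Path("p1", 1.00, 1.08, 1.10), Path("p2", 1.00, 1.01, 1.02)]
    g_tbc, g_cbc = partition(paths, threshold=0.05)  # hypothetical threshold
    print([p.name for p in g_tbc], [p.name for p in g_cbc])
```

Only the Gtbc list would then be handed to timing analysis and ECO with the tightened corners, while Gcbc continues to use the conventional ones.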
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)
                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm²)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Tightened BEOL corners (correlation factor = 0.5)
          α     Acw (%)   Arcw (%)
TBC-0.5   0.5   43        73
TBC-0.6   0.6   33        50
TBC-0.7   0.7   30        34

Implement another NETCARD (clock period = 2.3 ns) to obtain α, Acw, and Arcw
Statistical models: (1) no correlation, and (2) variation sources of the same kind in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  (annotations, in order: nominal delay, delay sensitivity to unit change in M1 resistance, delay sensitivity to unit change in M1 capacitance)
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: scaling factor α for LEON3MP, NETCARD, and SUPERBLUE12, with the region α > 0.5 marked in each panel]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120 ps and 61 ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]
57ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time under NBTI and PBTI]
VBTI = Vlib ≈ Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05 V
Vinit         0.90 V
Vheur1 (DC)   0.97 V
Vheur1 (AC)   0.95 V
Vheur2 (DC)   0.95 V
Vheur2 (AC)   0.93 V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, VLIB, and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs
and library
Vstep: AVS voltage step
Vfinal: converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
  • Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5     6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    N/A   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   N/A   Vheur1   Vheur2
63ISVLSI-2014 invited talk 140710
"Chicken and Egg" Loop

• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop between Circuit, Derated Libraries (Vlib, VBTI), and Vfinal]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time over alternating ON and OFF intervals]
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCAS'14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
[Figure annotations: ≈70% of the Vdd increment in 1 year (remaining 30% over 9 years); Vfinal]
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if the desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
Setups:
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?
Signoff corner choices (Vlib for circuit performance estimation, VBTI for aging estimation):
• Vlib = min Vdd, VBTI = min Vdd: slowest circuit, less aging
• Vlib = max Vdd, VBTI = max Vdd: faster circuit, worst-case aging
• Vlib = min Vdd, VBTI = max Vdd: slowest circuit, worst-case aging (too pessimistic)
• Vlib = max Vdd, VBTI = min Vdd: not applicable (optimistic)
With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm²) for AES; points are labeled by case number 1 to 8, with series Non-EM Aware, After Fixing (Mishra), and After Fixing (Black's); annotations mark the cases that are optimistic about AVS and pessimistic about AVS]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200 mV voltage compensation]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left(\dfrac{V_{DD}(i-1)}{V_{DD}(i)}\right)^{2}$
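The step-wise update, in which each MTTF value is the previous one multiplied by the squared ratio of the previous Vdd to the new Vdd, can be traced along a Vdd schedule as in the sketch below. The schedule and starting MTTF are hypothetical values chosen for illustration, not data from the talk.

```python
def mttf_schedule(mttf_initial, vdd_steps):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule."""
    mttf = [mttf_initial]
    for v_prev, v_next in zip(vdd_steps, vdd_steps[1:]):
        mttf.append(mttf[-1] * (v_prev / v_next) ** 2)
    return mttf

if __name__ == "__main__":
    vdd = [0.90, 0.95, 1.00]          # hypothetical AVS schedule (V)
    print(mttf_schedule(10.0, vdd))   # hypothetical 10-year starting MTTF
```

Because the per-step ratios telescope, the final MTTF under this update depends only on the first and last Vdd values, not on the intermediate steps.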
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year for schedules S1 to S5, and MTTF (year) for S1 to S5, on the DMA design]
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is "Signoff"?

• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff on a voltage axis, listing nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI, and signoff Vdd, with the operating voltage marked]
Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{j,v} = [\,d_j(Y_v) - d_j(Y_{typ})\,] / 3$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• $\Sigma = \lambda\lambda^{T}$ (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation
$[\Delta d'_{j,1}, \ldots, \Delta d'_{j,27}] = [\Delta d_{j,1}, \ldots, \Delta d_{j,27}]\,\lambda$
Standard deviation of path delay
$\sigma_j = \left((\Delta d'_{j,1})^2 + \cdots + (\Delta d'_{j,27})^2\right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
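The computation on these two slides can be sketched with NumPy. For brevity the example uses the 12-source correlation matrix from the Correlation Matrix slide (four layers, three variation types, M1 to M3 correlated with γ, M4 independent) rather than the full 27 sources. The function names and the delay numbers are hypothetical, and the division by 3 assumes each corner file skews one source by 3σ.

```python
import numpy as np

def correlation_matrix(n_layers=4, n_types=3, correlated_layers=3, gamma=0.5):
    # Sources ordered layer-major: (M1 ΔW, M1 ΔT, M1 ΔH, M2 ΔW, ...).
    n = n_layers * n_types
    sigma = np.eye(n)
    for a in range(correlated_layers):
        for b in range(correlated_layers):
            if a != b:
                for t in range(n_types):
                    # Same variation type on two layers of the same process module.
                    sigma[a * n_types + t, b * n_types + t] = gamma
    return sigma

def sensitivities(d_corner, d_typ):
    # Δd_jv = [d_j(Y_v) - d_j(Y_typ)] / 3, one entry per variation source v.
    return (np.asarray(d_corner, dtype=float) - d_typ) / 3.0

def path_sigma(delta_d, sigma):
    lam = np.linalg.cholesky(sigma)   # lower-triangular, sigma = lam @ lam.T
    delta_d_corr = delta_d @ lam      # delay sensitivities with correlation
    return float(np.sqrt(np.sum(delta_d_corr ** 2)))

if __name__ == "__main__":
    S = correlation_matrix()
    # Hypothetical path: 1.000 ns at Ytyp, 1.003 ns at each single-source corner.
    dd = sensitivities(np.full(12, 1.003), 1.0)
    print(path_sigma(dd, S), path_sigma(dd, np.eye(12)))
```

Since the squared norm of Δd·λ equals Δd·Σ·Δdᵀ, the result is the path-delay standard deviation under the assumed correlation; passing an identity matrix gives the uncorrelated (γ = 0) case.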
43ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Interconnect stack with M1 and M2
M1 C
M2 C
3σ Pessimism
Example worst-case capacitance corner Homogeneous
Cw corner
44ISVLSI-2014 invited talk 140710
Homogeneous Cornersbull (1) Define RC corners of each layer separatelybull (2) Use corners from each layer to construct a
homogeneous corner for an interconnect stack
Interconnect stack with M1 and M2
M1 C
M2 C
3σ
Homogeneous Cw corner
C-3σ
Layer M2
3σ
C-3σ
Layer M1
3σ
Pessimism
Example worst-case capacitance corner
When variations in different layers are not fully correlated pessimism of homogeneous corners increase with layers
45ISVLSI-2014 invited talk 140710
Correlation Matrixbull Let Σ be the correlation matrix for variation sources
M1 M2 M3 M4
ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH ΔW ΔT ΔH
M1 ΔW 1 0 0 γ 0 0 γ 0 0 0 0 0
ΔT 0 1 0 0 γ 0 0 γ 0 0 0 0
ΔH 0 0 1 0 0 γ 0 0 γ 0 0 0
M2 ΔW γ 0 0 1 0 0 γ 0 0 0 0 0
ΔT 0 γ 0 0 1 0 0 γ 0 0 0 0
ΔH 0 0 γ 0 0 1 0 0 γ 0 0 0
M3 ΔW γ 0 0 γ 0 0 1 0 0 0 0 0
ΔT 0 γ 0 0 γ 0 0 1 0 0 0 0
ΔH 0 0 γ 0 0 γ 0 0 1 0 0 0
M4 ΔW 0 0 0 0 0 0 0 0 0 1 0 0
ΔT 0 0 0 0 0 0 0 0 0 0 1 0
ΔH 0 0 0 0 0 0 0 0 0 0 0 1
= Σ
Correlation for variation sources with the same variation type and in the process module γ 05
Variation sources in different process modules are independent
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Averaging uncorrelated variation smaller RC variation
47ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done

Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)
                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330
Tightened BEOL corners (correlation factor = 0.5)
            α     Acw (%)   Arcw (%)
TBC-0.5     0.5   43        73
TBC-0.6     0.6   33        50
TBC-0.7     0.7   30        34
• Implement another NETCARD (clock period = 2.3 ns) to obtain α, Acw, and Arcw
• Statistical models: (1) no correlation, and (2) variation sources of the same kind in the same process module have correlation factor = 0.5

Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • Term annotations, in order: nominal delay; delay sensitivity to unit change in M1 resistance; delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations due to ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

Scaling Factor Results
[Figure: three panels, LEON3MP, SUPERBLUE12, and NETCARD, each with a region marked α > 0.5]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120 ps and 61 ns, respectively
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]

Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging at an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd versus time, with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]

Vfinal Estimation
• Problem: Vfinal is not available at the early design stage (the design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along the critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • A DC or AC stress model is sufficient

Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use the average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]

Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05
Supply voltages
Vmax          1.05 V
Vinit         0.90 V
Vheur1 (DC)   0.97 V
Vheur1 (AC)   0.95 V
Vheur2 (DC)   0.95 V
Vheur2 (AC)   0.93 V

A Reference Signoff Flow
• Basic idea: keep VBTI, VLIB, and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library
Vstep: AVS voltage step; Vfinal: converged voltage
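The loop structure of this reference flow can be sketched as follows. This is only a toy illustration under loud assumptions: `delay_fn` and `aging_fn` stand in for STA with derated libraries and for a BTI model, and their formulas and all numeric values here are invented for the example. The toy aging model even ignores Vdd, which a real BTI model would not. What the sketch does take from the slide is the sequence: estimate aging at each time step, raise Vdd in steps of Vstep until timing is met, repeat until tfinal, and give up if the required voltage exceeds the maximum.

```python
def reference_signoff(v_init, v_max, v_step, n_steps, dt_years,
                      delay_fn, aging_fn, t_clk):
    """Return the converged Vfinal, or None if Vdd would exceed v_max."""
    vdd = v_init
    dvth = 0.0
    for _ in range(n_steps):
        # Estimate aging accumulated during this time step at the current Vdd.
        dvth = aging_fn(dvth, vdd, dt_years)
        # Update circuit timing; raise Vdd by one AVS step until timing is met.
        while delay_fn(vdd, dvth) > t_clk:
            vdd += v_step
            if vdd > v_max:
                return None  # caller must modify the circuit and start over
    return vdd


# Toy stand-ins (assumptions, not models from the talk).
def toy_delay(vdd, dvth, vth0=0.3):
    """Critical-path delay (ns) shrinking with gate overdrive."""
    return 1.0 / (vdd - vth0 - dvth)


def toy_aging(dvth, vdd, dt_years):
    """Adds 5 mV of |dVth| per year; a real model would depend on vdd."""
    return dvth + 0.005 * dt_years


v_final = reference_signoff(v_init=0.90, v_max=1.05, v_step=0.01,
                            n_steps=10, dt_years=1.0,
                            delay_fn=toy_delay, aging_fn=toy_aging, t_clk=1.7)
print("Vfinal:", v_final)
```

Each pass through the inner loop corresponds to one timing analysis, which is why the slide calls this reference flow very slow.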

Experiment Setup
• Characterize different derated libraries
• Evaluate the impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2

“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop connecting Circuit, Vfinal, Vlib / VBTI, and Derated Libraries]

Bias Temperature Instability (BTI)
• |ΔVth| increases when the device is on (stressed)
• |ΔVth| is partially recovered when the device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| versus time with alternating ON and OFF phases] [VattikondaWC06]
Device aging (|ΔVth|) accumulates over time [TCAS’14]

Observation 1
[Chan11]
• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
[Figure annotation: ≈70% of the Vdd increment occurs in 1 year (remaining 30% over 9 years); Vfinal]
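A short arithmetic sketch shows why a front-loaded shape is plausible. It assumes a power-law aging model, |ΔVth| ∝ t^n, which is a common simplification and not something stated on the slide; the function name and the exponent value 0.3 are likewise assumptions chosen for illustration. Under that model, the fraction of end-of-life aging reached after the first year of a 10-year lifetime is 10^(-n).

```python
def fraction_aged(t_years, lifetime_years, n):
    """Fraction of end-of-life |dVth| reached by time t, assuming |dVth| ~ t**n."""
    return (t_years / lifetime_years) ** n


n_assumed = 0.3  # hypothetical time exponent, not taken from the talk
first_year = fraction_aged(1.0, 10.0, n_assumed)
print(f"fraction of aging in year 1 of 10: {first_year:.2f}")
```

With this assumed exponent, roughly half of the total aging falls in the first year, which is the kind of behavior the observation describes.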

Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if the desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
Legend
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Problem: Signoff Corner Definition
• Timing signoff: ensure the circuit meets its performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?
Vlib is used for circuit performance estimation; VBTI is used for aging estimation
                  Vlib = Min Vdd                                          Vlib = Max Vdd
VBTI = Min Vdd    Slowest circuit, less aging                             Not applicable (optimistic)
VBTI = Max Vdd    Slowest circuit, worst-case aging (too pessimistic)     Faster circuit, worst-case aging
With BTI aging and AVS, the worst-case voltage corner is not obvious

AVS Signoff Corner Selection
[Figure: power (mW) versus area (μm2) for AES, with data points numbered 1 to 8 and regions marked “Optimistic about AVS” and “Pessimistic about AVS”; legend entries: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks)]

AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$
[Figure: lifetime (year) and Vfinal (V) versus implementation 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]
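The MTTF update rule on this slide, MTTF(i) = MTTF(i-1) × (VDD(i-1) / VDD(i))^2, can be applied step by step over an AVS voltage schedule. The sketch below is a minimal illustration; the function names, the initial MTTF of 10 years, and the voltage schedule are made-up values, not data from the talk.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_now):
    """One step of the slide's rule: MTTF scales with (Vdd_prev / Vdd_now)**2."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2


def mttf_over_schedule(mttf0, vdd_schedule):
    """Apply the update across consecutive Vdd values; returns MTTF after each step."""
    history = [mttf0]
    for v_prev, v_now in zip(vdd_schedule, vdd_schedule[1:]):
        history.append(update_mttf(history[-1], v_prev, v_now))
    return history


# Hypothetical schedule: AVS raises Vdd from 0.90 V to 1.00 V in two steps.
mttf_history = mttf_over_schedule(10.0, [0.90, 0.95, 1.00])
print([round(m, 3) for m in mttf_history])
```

Because the per-step ratios telescope, the final value depends only on the first and last voltages: MTTF(n) = MTTF(0) × (VDD(0) / VDD(n))^2.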

EM Impact on AVS Scheduling
[Figure: VDD versus year for AVS schedules S1 to S5, and MTTF (year) for S1 to S5; design: DMA; annotation: 12 years MTTF penalty]

What is “Signoff”?
• Foundation of the contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff on a voltage axis, from nominal Vdd to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI; operating voltage marked]
Problem: margins = pessimism → overdesign, schedule delay

Statistical Timing Analysis (1)
• Δdjv: delay sensitivity of path pj to variation source zv
$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: routed netlist and 28 itf files (27 variation sources + Ytyp) → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays

Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation
$$[\Delta d'_{j1} \;\dots\; \Delta d'_{j27}] = [\Delta d_{j1} \;\dots\; \Delta d_{j27}]\,\lambda$$
Standard deviation of path delay
$$\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$$
Note: we use the delay variation from the statistical analysis as a reference
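The two statistical timing analysis slides can be put together in a small pure-Python sketch: per-source sensitivities from corner delays, a Cholesky factor λ of the correlation matrix Σ, the correlated sensitivities Δd' = Δd·λ, and σj as their root-sum-square. The function names and the two-source example values are assumptions for illustration; the talk uses 27 variation sources and delays from RC extraction plus STA.

```python
import math


def cholesky(sigma):
    """Lower-triangular lam with sigma = lam * lam^T (sigma must be positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def sensitivities(d_typ, d_at_sources):
    """Delta d_jv = (d_j(Y_v) - d_j(Y_typ)) / 3 for each variation source v."""
    return [(d - d_typ) / 3.0 for d in d_at_sources]


def path_sigma(delta_d, sigma):
    """sigma_j = norm of the row vector delta_d multiplied by lam."""
    lam = cholesky(sigma)
    n = len(delta_d)
    d_prime = [sum(delta_d[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


# Hypothetical two-source example with correlation factor 0.5.
corr = [[1.0, 0.5],
        [0.5, 1.0]]
dd = sensitivities(d_typ=1.00, d_at_sources=[1.03, 1.03])
print("sigma_j (ns):", path_sigma(dd, corr))
```

Since Δd·λ·λ^T·Δd^T = Δd·Σ·Δd^T, the result equals the square root of the quadratic form in Σ, so positive correlation between sources of the same sign increases σj relative to the independent case.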

Resilient Designs
• Detect and recover from timing errors: ensure correct operation under dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) versus supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]
• Conventional design: worst-case signoff, no Vdd downscaling
• Resilient design: typical-case signoff, Vdd downscaling → reduced energy

Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement the design to minimize energy
• Estimation of design energy
$$\mathrm{Energy} = \frac{\mathrm{Power}}{\mathrm{Throughput}}$$
Throughput = 1 − ER T + 1 − ER r × T
where r = recovery cycles, T = clock period, ER = error rate [Kahng10]

Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

Process-Aware Vdd Scaling (PVS)
AVS classes and approaches
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor and design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor that measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system; Vdd scaling until an error occurs [Das06, Tschanz10]
[Figure axis: power]

Challenge: Variability
[Figure: four scaling trends, each comparing ideal scaling with non-ideality. DENSITY: transistor count [M] versus MPU release date, source [CPUDB]. POWER: dynamic power (W) for 2009 to 2024, source [JeongK08]. DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) versus ideal scaling, 2006 to 2016, source [ITRS]. SUPPLY VOLTAGE: volts versus MPU release date, source [CPUDB].]

Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) versus supply voltage (V) for MUL and EXU; curves: brute-force, pure-margin, CombOpt; minimum achievable energy marked]

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cones
[Figure: voltage versus frequency plane showing the design cone of mode A and modes A, B, and C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone → reduce overdesign

Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoids circuit implementation at each mode
  • A small constant number of runs is enough: scalable
[Flow diagram, global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R) → refine power models]
[Figure: power estimation of adaptive search, power (mW) versus signoff voltage (V); series: 1st, 2nd, real. Design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iterations
• real: power from real implemented circuits

Classes of Closed-Loop AVS
• Design-dependent replica, in-situ monitor
  • Critical path may be difficult to identify (IP from 3rd party)
  • Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor
  • Does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune the series resistance of each stage to high or low
[Figure: RO stages with 1-bit control pins selecting high or low resistance]

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at small, medium, and large margins; components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows a lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) versus supply voltage (V) for MUL and EXU; curves: brute-force, pure-margin, CombOpt; minimum achievable energy marked]

Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow diagram: initial placement (all FFs = error-tolerant FFs) → SEOpt (margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs) → SkewOpt (activity-aware clock skew optimization) → if energy < min energy, save current solution]

Homogeneous Corners
• (1) Define RC corners of each layer separately
• (2) Use corners from each layer to construct a homogeneous corner for an interconnect stack
[Figure: example worst-case capacitance corner for an interconnect stack with M1 and M2; per-layer C distributions with 3σ points (C-3σ) for layers M1 and M2; homogeneous Cw corner in the M1 C / M2 C plane, with pessimism marked]
When variations in different layers are not fully correlated, the pessimism of homogeneous corners increases with the number of layers

Correlation Matrix
• Let Σ be the correlation matrix for variation sources
Σ =
          M1           M2           M3           M4
          ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH     ΔW ΔT ΔH
M1  ΔW    1  0  0      γ  0  0      γ  0  0      0  0  0
    ΔT    0  1  0      0  γ  0      0  γ  0      0  0  0
    ΔH    0  0  1      0  0  γ      0  0  γ      0  0  0
M2  ΔW    γ  0  0      1  0  0      γ  0  0      0  0  0
    ΔT    0  γ  0      0  1  0      0  γ  0      0  0  0
    ΔH    0  0  γ      0  0  1      0  0  γ      0  0  0
M3  ΔW    γ  0  0      γ  0  0      1  0  0      0  0  0
    ΔT    0  γ  0      0  γ  0      0  1  0      0  0  0
    ΔH    0  0  γ      0  0  γ      0  0  1      0  0  0
M4  ΔW    0  0  0      0  0  0      0  0  0      1  0  0
    ΔT    0  0  0      0  0  0      0  0  0      0  1  0
    ΔH    0  0  0      0  0  0      0  0  0      0  0  1
• Variation sources with the same variation type in the same process module have correlation γ = 0.5
• Variation sources in different process modules are independent
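The structure of this correlation matrix follows two rules, so it can be generated rather than typed in. The sketch below is a minimal illustration: the function name and the module labels "A" and "B" are hypothetical, and the grouping of M1 to M3 into one process module with M4 in another is read off the matrix shown on the slide.

```python
def build_correlation(layers, var_types, module_of, gamma):
    """Return (sources, sigma) where sigma[i][j] follows the slide's two rules.

    - same variation type, same process module, different layer: gamma
    - everything else off-diagonal: 0 (independent); diagonal: 1
    """
    sources = [(layer, vt) for layer in layers for vt in var_types]
    n = len(sources)
    sigma = [[0.0] * n for _ in range(n)]
    for i, (layer_i, type_i) in enumerate(sources):
        for j, (layer_j, type_j) in enumerate(sources):
            if i == j:
                sigma[i][j] = 1.0
            elif type_i == type_j and module_of[layer_i] == module_of[layer_j]:
                sigma[i][j] = gamma
    return sources, sigma


layers = ["M1", "M2", "M3", "M4"]
var_types = ["dW", "dT", "dH"]  # width, thickness, height variations
module_of = {"M1": "A", "M2": "A", "M3": "A", "M4": "B"}  # hypothetical module labels
sources, sigma = build_correlation(layers, var_types, module_of, gamma=0.5)

for (layer, vt), row in zip(sources, sigma):
    print(f"{layer} {vt}:", " ".join(f"{x:.1f}" for x in row))
```

The same function extends to a full metal stack by adding layers and assigning each one to its process module.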

Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability versus max wirelength ratio across all layers (%); marked point: 0.92 at 60%]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variations → smaller RC variation

Delay Variation
[Figure: two panels of α versus Δdelay. Δdelay at C-worst: [d(Ycw) − d(Ytyp)] / d(Ytyp). Δdelay at RC-worst: [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
• Where the C-worst corner underestimates delay variations, these paths are dominated by the RC-worst corner
• α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive either to R or to C
• Using only RC-worst or only C-worst will underestimate delay variations
• Need both RC-worst and C-worst corners to cover process variations
• In the following discussion, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
Gate count 232K 575K 1031KUtilization () 84 79 82
Core area (mm2) 045 104 191Max transition (ps) 330 330 330
Testcases for validation (45nm library with 8X wire resistivity)
αCorrelation factor = 05
Acw () Arcw ()
TBC-05 05 43 73
TBC-06 06 33 50
TBC-07 07 30 34
Implement another NETCARD (clock period = 23ns) to obtain α Acw and Arcw
Statistical models (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 05
54ISVLSI-2014 invited talk 140710
Further Analysis
bull Paths with small Δd(Yrcw) and Δd(Ycw) have large α
bull A path has small Δdelays the path is equally sensitive to R and C
bull Example dj = dj(Ytyp) + 05 ΔdR-M1 + 05 ΔdC-M1
bull For a given CBC = Ycw ΔdR-M1 is small but ΔdC-M1 is large delay variation of ΔdR-M1 and ΔdC-M1 are cancelled out Δd(Ycw) 0 lt σj
Nominal delay
Delay sensitivity to unit change in M1 resistance
Delay sensitivity to unit change in M1 capacitance
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
LEON3MP
SUPERBLUE12NETCARD
α gt 05α gt 05
α gt 05
bull Similar trends in different designs
bull Large α when Δd(Yrcw)d(Ytyp) and Δd(Ycw)d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
bull WNS and TNS are reduced by up to 120ps and 61ns
bull Timing violations reduces by 31 to 100
Correlation factor γ = 0 (variation sources are independent)
LEON SUPERBLUE NETCARD
-0180-0160-0140-0120-0100-0080-0060-0040-002000000020
CBC TBC-1 TBC-2
WN
S (n
s)
LEON SUPERBLUE NETCARD
-90-80-70-60-50-40-30-20-10
0CBC TBC-1 TBC-2
TNS
(ns)
LEON SUPERBLUE NETCARD0
200400600800
1000120014001600
CBC TBC-1 TBC-2
Tim
ing
viol
ation
s
57ISVLSI-2014 invited talk 140710
Heuristics 1
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
supply voltage gt Vmax
Pessimistic signoff corner bull Ovestimate aging andor
underestimate circuit performance
bull Large area overhead
Good corners
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner
for AVS4 Vbti = Vfinal Do not overestimate
aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
bull Timing signoff ensure circuit meets performance target under PVT variations amp aging
bull Conventional signoff approach bull Analyze circuit timing at worst-case cornersbull Fix timing violations re-run timing analysis
bull With BTI aging and AVS what is the Vdd of the worst-cast corner for timing analysis
Vlib for circuit performance estimation
Min Vdd Max Vdd
VBTI for aging
estimation
MinVdd
Not applicable (Optimistic)
Max Vdd
Slowest circuitLess aging
Faster circuitWorst-case aging
Slowest circuit Worst-case aging
Too pessimistic
With BTI aging and AVS the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
10000 12000 14000 16000 18000 20000 2200020
22
24
26
28
30
32
44
4
888
7776
66
555
3
33
2
22
11
1
Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)
Area (μm2)
Pow
er (m
W)
AES
Optimistic about AVS
Pessimistic about AVS
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
1 2 3 4 5 6 7 8 90
2
4
6
8
10
12
08
09
1
11
12Lifetime (year)
Implementation
Life
time
(yea
r)
Vfina
l (V)
Vfinal (V)
119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )
2
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation
[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ
Standard deviation of path delay
σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05
Note we use the delay variation from the statistical analysis as a reference
74ISVLSI-2014 invited talk 140710
Resilient Designs
bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)
bull Trade off design robustness vs design quality Eg enable margin reduction
bull Improve performance (ie timing speculation)
084 088 092 096 10030
34
38
42
46
50
54
58
62conventional design
reilient Design
Supply voltage (V)
En
erg
y (
mJ
)
Conventional design Worst-case signoff No Vdd downscaling
Resilient design Typical-case signoff Vdd downscaling reduced energy
15 reduction
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
bull Given RTL design throughput requirement and error-tolerant registers
bull Objective implement design to minimize energy bull Estimation of design energy
119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Question Which endpoint to be optimized
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
Open-Loop AVS
Closed-Loop AVSP
ow
er
Freq amp Vdd LUT
Post-silicon characterization
AVS Pre-characterize LUT [Martin02]
Process-aware AVSPost-silicon characterization [Tschanz03]
Generic monitor
Design dependent replica
In-situmonitor
Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]
In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]
Error Detection System
Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]
Error Tolerance
AVS
approachesAVS classes
77
78ISVLSI-2014 invited talk 140710
Challenge Variability
1998 2000 2003 2006 2008 20111
10
MPU Release Date
Tran
sisto
r Cou
nt [M
]
Source [CPUDB]
DENSITY
IdealNon-ideality
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0
100
200
300
400
500
600
700
0
02
04
06
08
1
12Dynamic Power (W)
POWER
Source [JeongK08]
IdealNon-ideality
2006 2008 2010 2012 2014 20161000
10000
100000
Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling
DRIVE CURRENT
Ideal
Source [ITRS]
Non-ideality
1995 2000 2005 2011 20160
05
1
15
2
25
3
MPU Release Date
Volt
SUPPLY VOLTAGE
Source [CPUDB]
Ideal
Non-ideality
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
Global optimization flow: Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Sample (SP&R) → Refine power models (adaptive search)
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); curves 1st, 2nd, real. Design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
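The sample / model / estimate / refine loop can be sketched as follows. This is a minimal illustration only: the quadratic power model, the function names and the `toy_power` stand-in for an SP&R run are assumptions, not the power models used in the talk.

```python
def fit_quadratic(p0, p1, p2):
    """Coefficients (a, b, c) of the parabola a*v^2 + b*v + c through three points."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    d01 = (y1 - y0) / (x1 - x0)          # first divided differences
    d12 = (y2 - y1) / (x2 - x1)
    a = (d12 - d01) / (x2 - x0)          # second divided difference
    b = d01 - a * (x0 + x1)
    c = y0 - a * x0 * x0 - b * x0
    return a, b, c


def adaptive_search(run_spr, v_lo, v_hi, n_iter=2, tol=1e-6):
    """Sample, fit a power model, estimate the best voltage, sample again."""
    samples = {v: run_spr(v) for v in (v_lo, 0.5 * (v_lo + v_hi), v_hi)}
    for _ in range(n_iter):
        best_v = min(samples, key=samples.get)
        # Refit the model on the three samples closest to the current best.
        nearest = sorted(sorted(samples, key=lambda v: abs(v - best_v))[:3])
        a, b, _ = fit_quadratic(*[(v, samples[v]) for v in nearest])
        if a <= 0:                        # model not convex: keep best sample
            break
        v_est = min(max(-b / (2 * a), v_lo), v_hi)
        if any(abs(v_est - v) < tol for v in samples):
            break                         # estimate already sampled: converged
        samples[v_est] = run_spr(v_est)   # one more (expensive) implementation
    best_v = min(samples, key=samples.get)
    return best_v, samples[best_v]


def toy_power(v):
    """Hypothetical stand-in for a real SP&R run; minimum power at 1.0 V."""
    return 15.0 + 20.0 * (v - 1.0) ** 2


best_v, best_p = adaptive_search(toy_power, 0.9, 1.2)
```

Each call to `run_spr` stands for one full implementation run, so the loop spends only a handful of runs instead of implementing the circuit at every candidate mode.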
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
Closed-Loop AVS
Design-dependent replica
In-situ monitor
Generic monitor
• Does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low
[Figure: RO stages with 1-bit control pins selecting high resistance or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: stacked energy (mJ) bars for PM, BF and CO at large, medium and small margin; panels MUL and EXU; components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow diagram]
Initial placement (all FFs = error-tolerant FFs)
SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
SkewOpt: activity-aware clock skew optimization
If energy < min energy: save current solution
45ISVLSI-2014 invited talk 140710
Correlation Matrix
• Let Σ be the correlation matrix for variation sources

Σ =
            M1          M2          M3          M4
          ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH    ΔW ΔT ΔH
M1  ΔW     1  0  0     γ  0  0     γ  0  0     0  0  0
    ΔT     0  1  0     0  γ  0     0  γ  0     0  0  0
    ΔH     0  0  1     0  0  γ     0  0  γ     0  0  0
M2  ΔW     γ  0  0     1  0  0     γ  0  0     0  0  0
    ΔT     0  γ  0     0  1  0     0  γ  0     0  0  0
    ΔH     0  0  γ     0  0  1     0  0  γ     0  0  0
M3  ΔW     γ  0  0     γ  0  0     1  0  0     0  0  0
    ΔT     0  γ  0     0  γ  0     0  1  0     0  0  0
    ΔH     0  0  γ     0  0  γ     0  0  1     0  0  0
M4  ΔW     0  0  0     0  0  0     0  0  0     1  0  0
    ΔT     0  0  0     0  0  0     0  0  0     0  1  0
    ΔH     0  0  0     0  0  0     0  0  0     0  0  1

Correlation for variation sources of the same variation type and in the same process module: γ = 0.5
Variation sources in different process modules are independent
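A matrix with this structure can be generated programmatically. The sketch below is illustrative: the module labels "A" and "B" and the function name are assumptions, chosen so that M1 to M3 share one process module and M4 sits in another, as the matrix above suggests.

```python
def correlation_matrix(layer_modules, types=("dW", "dT", "dH"), gamma=0.5):
    """Build the correlation matrix for per-layer variation sources.

    layer_modules[i] is the (hypothetical) process-module label of layer i.
    Sources of the same type on different layers of the same module get
    correlation gamma; everything else off the diagonal is independent (0).
    """
    n_t = len(types)
    n = len(layer_modules) * n_t
    sigma = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for a, mod_a in enumerate(layer_modules):
        for b, mod_b in enumerate(layer_modules):
            if a != b and mod_a == mod_b:
                for t in range(n_t):
                    sigma[a * n_t + t][b * n_t + t] = gamma
    return sigma


# M1, M2, M3 in module "A"; M4 in module "B" (labels are illustrative).
sigma = correlation_matrix(["A", "A", "A", "B"])
```

With three variation types per layer, four layers give a 12 x 12 matrix; the 27 x 27 case mentioned later in the talk corresponds to nine layers.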
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
• 92% of paths have < 60% of wirelength on any single layer
[Figure: cumulative probability vs. max wirelength ratio across all layers (%); the 0.92 point at 60% is marked]
• Variations in different layers are not fully correlated
• Averaging uncorrelated variation → smaller RC variation
47ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: α per path, plotted against Δdelay at C-worst, [d(Ycw) − d(Ytyp)] / d(Ytyp), and against Δdelay at RC-worst, [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0 → a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
α < 1.0: delay variations are covered by the RC-worst corner
Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
48ISVLSI-2014 invited talk 140710
Delay Variation
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; axes M1 C and M2 C with the 3σ contour. Non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: axes Δdj(Yrcw) / dj(Ytyp) x 100 and 3σj / d(Ytyp) x 100]
Challenge: how to avoid underestimating delay variation to preserve parametric yield?
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm2, 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flow: routed design → timing analysis at BEOL corners Ytyp, Ycw, Yrcw → paths split into GTBC and GCBC → timing analysis using TBC (GTBC) and using CBC (GCBC) → ECO using TBC / ECO using CBC until violation = 0 → done]
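The path-grouping step of this flow can be sketched as below. The function name, the percent thresholds `a_cw` and `a_rcw`, and the rule that exceeding either threshold is enough are assumptions made for illustration; the slides only state that a path joins Gtbc when its Δdelay exceeds a threshold.

```python
def split_paths(paths, a_cw, a_rcw):
    """Partition critical paths into G_tbc and G_cbc.

    paths maps a path name to (d_typ, d_cw, d_rcw): the path delay at the
    typical, C-worst and RC-worst corners. a_cw and a_rcw are hypothetical
    thresholds on the relative delay increase, in percent.
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        dd_cw = 100.0 * (d_cw - d_typ) / d_typ     # Δdelay at C-worst (%)
        dd_rcw = 100.0 * (d_rcw - d_typ) / d_typ   # Δdelay at RC-worst (%)
        if dd_cw > a_cw or dd_rcw > a_rcw:
            g_tbc.append(name)    # large Δdelay: sign off with tightened corners
        else:
            g_cbc.append(name)    # small Δdelay: keep conventional corners
    return g_tbc, g_cbc


# Illustrative delays in ps; not measured data.
paths = {
    "p1": (1000.0, 1060.0, 1020.0),
    "p2": (1000.0, 1010.0, 1015.0),
    "p3": (1000.0, 1005.0, 1090.0),
}
g_tbc, g_cbc = split_paths(paths, a_cw=4.0, a_rcw=7.0)
```

Paths in `g_cbc` stay on the conventional corners because a small Δdelay at the corner is exactly the case where the corner may underestimate the statistical variation.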
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5

           α     Acw (%)   Arcw (%)
TBC-0.5    0.5   43        73
TBC-0.6    0.6   33        50
TBC-0.7    0.7   30        34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays: the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  Terms, in order: nominal delay; delay sensitivity to unit change in M1 resistance; delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out: Δd(Ycw) ~ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: scaling factor α for LEON3MP, SUPERBLUE12 and NETCARD, with the α > 0.5 region marked in each panel]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]
57ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, Vlib and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step; Vfinal: converged voltage
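The loop structure of this reference flow can be sketched as below. Both device models are toy placeholders invented for illustration (they are not the BTI or delay models used in the talk), and the aging at each step is simplified to the shift that would result from the current Vdd applied over the whole elapsed time.

```python
def toy_dvth(t_years, vdd):
    """Hypothetical BTI threshold shift (V): front-loaded in time, worse at high Vdd."""
    return 0.05 * vdd ** 3 * (t_years / 10.0) ** (1.0 / 6.0)


def toy_delay(vdd, dvth, vth0=0.3):
    """Hypothetical alpha-power-law style path delay, in arbitrary units."""
    return vdd / (vdd - vth0 - dvth) ** 1.3


def reference_signoff(v_init, v_max, v_step, t_final, target_delay):
    """Step through the lifetime, raising Vdd by v_step whenever timing fails."""
    vdd, schedule = v_init, []
    for t in range(1, t_final + 1):
        # Estimate aging at this time step, then update timing and Vdd.
        while toy_delay(vdd, toy_dvth(t, vdd)) > target_delay:
            vdd += v_step
            if vdd > v_max + 1e-9:
                raise ValueError("Vfinal exceeds Vmax: modify circuit and start over")
        schedule.append(vdd)
    return schedule


# Timing target: the fresh circuit at Vinit = 0.90 V just meets timing.
target = toy_delay(0.90, 0.0)
schedule = reference_signoff(0.90, 1.05, 0.01, 10, target)
v_final = schedule[-1]
```

In a real flow every evaluation inside the `while` loop is an STA run against a re-characterized library, which is why the reference flow is slow.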
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
  • Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2
63ISVLSI-2014 invited talk 140710
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop linking circuit, derated libraries (Vlib, VBTI) and Vfinal]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)
|ΔVth| is partially recovered when device is off (relaxed)
NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON and OFF phases [VattikondaWC06]]
Device aging (|ΔVth|) accumulates over time [TCAS'14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd approaching Vfinal; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)]
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.

                  Vlib = Min Vdd                       Vlib = Max Vdd
VBTI = Min Vdd    Slowest circuit, less aging          Not applicable (optimistic)
VBTI = Max Vdd    Slowest circuit, worst-case aging    Faster circuit, worst-case aging
                  (too pessimistic)

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm2) for AES, with points labeled 1 to 8; legend: non-EM aware, after fixing (Mishra), after fixing (Black's); regions marked "Optimistic about AVS" and "Pessimistic about AVS"]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step, and MTTF is updated as
  $MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
[Figure: lifetime (year) and Vfinal (V) per implementation 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]
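The MTTF update rule can be applied step by step along a Vdd schedule, as in this small sketch. The function name and the example schedule are illustrative assumptions; only the update rule itself comes from the slide.

```python
def update_mttf(mttf0, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule."""
    mttf = mttf0
    for v_prev, v_cur in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (v_prev / v_cur) ** 2
    return mttf


# Illustrative schedule: AVS raises Vdd from 0.9 V to 1.1 V in two steps.
mttf_end = update_mttf(10.0, [0.9, 1.0, 1.1])
```

Because the per-step ratios telescope, the result depends only on the first and last voltages: here `mttf_end` equals `10.0 * (0.9 / 1.1) ** 2`.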
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year (0 to 12) for schedules S1 to S5, and MTTF (year) for S1 to S5; design DMA; annotation: 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff on a voltage axis, between nominal Vdd and signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI; operating voltage marked]
Problem: margins = pessimism → overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = [\, d_j(Y_v) - d_j(Y_{typ}) \,] / 3$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^{T}$ (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation
$[\Delta d'_{j1} \ldots \Delta d'_{j27}] = [\Delta d_{j1} \ldots \Delta d_{j27}] \, \lambda$
Standard deviation of path delay
$\sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
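The two statistical timing steps (per-source sensitivities from corner runs, then the correlated standard deviation) can be sketched in plain Python. The function names and the three-source example are assumptions for illustration; a real run would use the 27 sources and STA-derived delays.

```python
import math


def sensitivities(d_corners, d_typ):
    """Δd_jv = [d_j(Y_v) - d_j(Y_typ)] / 3 for each 3-sigma corner run."""
    return [(d - d_typ) / 3.0 for d in d_corners]


def cholesky(a):
    """Lower-triangular lam with a = lam * lam^T (a symmetric positive definite)."""
    n = len(a)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(a[i][i] - s)
            else:
                lam[i][j] = (a[i][j] - s) / lam[j][j]
    return lam


def path_sigma(delta_d, sigma):
    """Standard deviation of path delay with correlated variation sources."""
    lam = cholesky(sigma)
    n = len(delta_d)
    # Row vector times lam: sensitivities with correlation folded in.
    corr = [sum(delta_d[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in corr))


# Illustrative example: typical delay 100 ps, three variation sources.
delta_d = sensitivities([103.0, 106.0, 100.0], 100.0)   # [1.0, 2.0, 0.0]
sigma = [[1.0, 0.5, 0.0],
         [0.5, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
sigma_j = path_sigma(delta_d, sigma)
```

The result is the square root of the quadratic form of `delta_d` with `sigma`, so with an identity correlation matrix it reduces to the root-sum-square of the sensitivities.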
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling → reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy
Energy = Power / Throughput
Throughput = 1 − ER T + 1 − ER r × T
r: recovery cycles; T: clock period; ER: error rate [Kahng10]
46ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths (2)
bull 92 of paths have lt 60 of wirelength on any single layer
Max wirelength ratio across all layers ()
Cum
ulati
ve p
roba
bilit
y
092
60
bull Variations in different layers are not fully correlated
bull Averaging uncorrelated variation smaller RC variation
47ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
Gate count 232K 575K 1031KUtilization () 84 79 82
Core area (mm2) 045 104 191Max transition (ps) 330 330 330
Testcases for validation (45nm library with 8X wire resistivity)
αCorrelation factor = 05
Acw () Arcw ()
TBC-05 05 43 73
TBC-06 06 33 50
TBC-07 07 30 34
Implement another NETCARD (clock period = 23ns) to obtain α Acw and Arcw
Statistical models (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 05
54ISVLSI-2014 invited talk 140710
Further Analysis
bull Paths with small Δd(Yrcw) and Δd(Ycw) have large α
bull A path has small Δdelays the path is equally sensitive to R and C
bull Example dj = dj(Ytyp) + 05 ΔdR-M1 + 05 ΔdC-M1
bull For a given CBC = Ycw ΔdR-M1 is small but ΔdC-M1 is large delay variation of ΔdR-M1 and ΔdC-M1 are cancelled out Δd(Ycw) 0 lt σj
Nominal delay
Delay sensitivity to unit change in M1 resistance
Delay sensitivity to unit change in M1 capacitance
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
LEON3MP
SUPERBLUE12NETCARD
α gt 05α gt 05
α gt 05
bull Similar trends in different designs
bull Large α when Δd(Yrcw)d(Ytyp) and Δd(Ycw)d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
bull WNS and TNS are reduced by up to 120ps and 61ns
bull Timing violations reduces by 31 to 100
Correlation factor γ = 0 (variation sources are independent)
LEON SUPERBLUE NETCARD
-0180-0160-0140-0120-0100-0080-0060-0040-002000000020
CBC TBC-1 TBC-2
WN
S (n
s)
LEON SUPERBLUE NETCARD
-90-80-70-60-50-40-30-20-10
0CBC TBC-1 TBC-2
TNS
(ns)
LEON SUPERBLUE NETCARD0
200400600800
1000120014001600
CBC TBC-1 TBC-2
Tim
ing
viol
ation
s
57ISVLSI-2014 invited talk 140710
Heuristics 1
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Diagram: loop among Circuit, Derated Libraries (Vlib, VBTI), and Vfinal]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)
|ΔVth| is partially recovered when device is off (relaxed)
NBTI: PMOS; PBTI: NMOS
Device aging (|ΔVth|) accumulates over time
[Figure: |Vgs| vs. time with alternating ON and OFF intervals [VattikondaWC06] [TCAS'14]]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a "front-loaded" phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
  • ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)
[Figure: Vdd over lifetime, approaching Vfinal]
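To see why aging is front-loaded, a short sketch helps. It assumes that the threshold shift follows a power law in stress time, |ΔVth| ∝ t^n; this model form and the exponents tried below are assumptions for illustration and are not stated on the slide.

```python
def fraction_of_aging(t_years, lifetime_years, exponent):
    """Fraction of end-of-life |dVth| reached at time t, assuming |dVth| ~ t**exponent."""
    return (t_years / lifetime_years) ** exponent


# Hypothetical time exponents; smaller exponents are more front-loaded.
for n in (1.0 / 6.0, 0.25, 0.30):
    frac = fraction_of_aging(1.0, 10.0, n)
    print(f"n = {n:.3f}: {frac:.0%} of 10-year aging occurs in year 1")
```

Under this assumed model, any exponent well below 1 places a large share of the total shift in the first year, which matches the qualitative observation above.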
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.
                  Vlib = Min Vdd                                         Vlib = Max Vdd
VBTI = Min Vdd    Slowest circuit, less aging                            Not applicable (optimistic)
VBTI = Max Vdd    Slowest circuit, worst-case aging (too pessimistic)    Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm²) for AES, with data points labeled by setup number; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); regions annotated "Optimistic about AVS" and "Pessimistic about AVS"]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
$$ MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2} $$
[Figure: lifetime (year) and Vfinal (V) per implementation; annotations: 30% MTTF penalty, 200mV voltage compensation]
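The MTTF update rule on this slide scales lifetime by the squared ratio of consecutive supply voltages. A minimal sketch follows; the starting MTTF and the voltage schedule are hypothetical values chosen for illustration, not data from the talk.

```python
def update_mttf(mttf_initial_years, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (Vdd(i-1) / Vdd(i))**2 along a Vdd schedule."""
    mttf = mttf_initial_years
    history = [mttf]
    for v_prev, v_next in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (v_prev / v_next) ** 2
        history.append(mttf)
    return history


# Hypothetical AVS schedule: Vdd rises from 0.90 V to 1.10 V in 50 mV steps.
schedule = [0.90, 0.95, 1.00, 1.05, 1.10]
history = update_mttf(10.0, schedule)
penalty = 1.0 - history[-1] / history[0]
print(f"final MTTF = {history[-1]:.2f} years, penalty = {penalty:.1%}")
```

Because the per-step ratios telescope, the final MTTF depends only on the first and last voltages: a 200 mV increase from 0.90 V costs roughly one third of the initial MTTF in this example.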
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year for AVS schedules S1 to S5 (design: DMA), and MTTF (year) for S1 to S5; annotation: 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is "Signoff"?
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Diagram: "margin stack" for voltage signoff on a voltage axis, between nominal Vdd and signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI, NBTI; operating voltage marked]
Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
$$ \Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3} $$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist, then RC extraction, then STA, then Δdjv
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
$$ [\Delta d'_{j1} \; \ldots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \ldots \; \Delta d_{j27}] \, \lambda $$
Standard deviation of path delay:
$$ \sigma_j = \left( (\Delta d'_{j1})^2 + \ldots + (\Delta d'_{j27})^2 \right)^{0.5} $$
Note: we use the delay variation from the statistical analysis as a reference
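The two statistical timing slides combine into a short calculation: per-source sensitivities are divided by 3, rotated by the Cholesky factor of the correlation matrix, and summed in quadrature. The sketch below uses a hand-written Cholesky routine and a hypothetical three-source example (the talk uses 27 sources); all numeric values are made up for illustration.

```python
import math


def cholesky(matrix):
    """Return lower-triangular L with matrix = L * L^T (symmetric positive definite input)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lower[i][m] * lower[k][m] for m in range(k))
            if i == k:
                lower[i][k] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][k] = (matrix[i][k] - s) / lower[k][k]
    return lower


def sensitivity(delay_at_corner, delay_typ):
    """Delta d_jv = (d_j(Y_v) - d_j(Y_typ)) / 3."""
    return (delay_at_corner - delay_typ) / 3.0


def path_sigma(sensitivities, correlation):
    """sigma_j = norm of the row vector [delta d_jv] multiplied by the Cholesky factor."""
    lam = cholesky(correlation)
    n = len(sensitivities)
    correlated = [sum(sensitivities[u] * lam[u][v] for u in range(n)) for v in range(n)]
    return math.sqrt(sum(d * d for d in correlated))


# Hypothetical example: sources 0 and 1 share a correlation factor of 0.5.
corr = [[1.0, 0.5, 0.0],
        [0.5, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
sens = [sensitivity(109.0, 100.0), sensitivity(112.0, 100.0), sensitivity(100.0, 100.0)]
print("sigma_j =", path_sigma(sens, corr))
```

Since λλ^T equals the correlation matrix, the squared result equals the quadratic form of the sensitivities with that matrix, which gives a convenient sanity check.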
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; annotation: 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:
$$ Energy = \frac{Power}{Throughput} $$
where Throughput depends on the clock period (T), the error rate (ER) [Kahng10], and the number of recovery cycles (r)
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS: process- and temperature-aware AVS
  • Generic monitor: generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction, Vdd scaling until error occurs [Das06, Tschanz10]
[Diagram axis: power]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure: four panels, each contrasting ideal scaling with non-ideality]
• DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
• POWER: dynamic power (W) vs. year. Source: [JeongK08]
• DRIVE CURRENT: (μA/μm) for extended planar bulk, UTB FD, DG, and ideal scaling. Source: [ITRS]
• SUPPLY VOLTAGE: volt vs. MPU release date. Source: [CPUDB]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels: MUL, EXU; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: frequency vs. voltage with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
Global optimization flow: sample (SP&R), construct power models, estimate optimal signoff modes; adaptive search: sample (SP&R), refine power models
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low
[Diagram: RO stages, each with a 1-bit control pin selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO at small, medium, and large margin; panels: MUL, EXU; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels: MUL, EXU; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart elements]
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• Check: energy < min energy? Save current solution
48ISVLSI-2014 invited talk 140710
Delay Variation
[Figure: scatter plots of α vs. Δdelay; x-axes: Δdelay at C-worst = [d(Ycw) − d(Ytyp)] / d(Ytyp) and Δdelay at RC-worst = [d(Yrcw) − d(Ytyp)] / d(Ytyp)]
• Some paths have α > 1.0: a CBC can underestimate delay variations
• But these paths have larger delays at the other corner
  • C-worst corner underestimates delay variations, but these paths are dominated by the RC-worst corner
  • α < 1.0: delay variations are covered by the RC-worst corner
• Dominated by C-worst: Δdelay at C-worst > Δdelay at RC-worst
• Dominated by RC-worst: Δdelay at RC-worst > Δdelay at C-worst
• Paths are more sensitive to R or to C
• Using RC-worst or C-worst only will underestimate delay variations
• Need both RC- and C-worst corners to cover process variations
• In the following discussions, α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
• Each layer can have different skewed variations
[Figure: interconnect stack with M1 and M2; M1 C and M2 C distributions with 3σ marked; non-homogeneous corner: M1 = Cw (3σ), M2 = Ctyp]
• Less pessimism with non-homogeneous corners
• Challenge
  • Many feasible combinations
  • A corner can only cover certain paths
  • How to choose the best combinations?
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in itf with α = 0.5
[Figure: 3σj / d(Ytyp) × 100 vs. Δdj(Yrcw) / dj(Ytyp) × 100]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure axis: wirelength ratio (%)]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst, and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart elements]
• Routed design
• Timing analysis at BEOL corners Ytyp, Ycw, Yrcw
• Split paths into GTBC and GCBC
• GTBC: timing analysis using TBC, ECO using TBC until violation = 0
• GCBC: timing analysis using CBC, ECO using CBC until violation = 0
• Done
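The grouping step of the proposed flow can be sketched in a few lines. Only the rule "put path j in Gtbc if Δdelay is larger than a threshold" comes from the slide; taking Δdelay at the dominant (larger) of the two corners, the threshold value, and the path data are assumptions made here for illustration.

```python
def delta_delay(d_corner, d_typ):
    """Relative delay variation at a corner: [d(Y_corner) - d(Y_typ)] / d(Y_typ)."""
    return (d_corner - d_typ) / d_typ


def group_paths(paths, threshold):
    """Split paths into (g_tbc, g_cbc) by their delta delay at the dominant corner.

    Each path is a tuple (name, d_typ, d_cw, d_rcw) with delays in the same unit.
    """
    g_tbc, g_cbc = [], []
    for name, d_typ, d_cw, d_rcw in paths:
        dominant = max(delta_delay(d_cw, d_typ), delta_delay(d_rcw, d_typ))
        if dominant > threshold:
            g_tbc.append(name)   # candidates for signoff with tightened BEOL corners
        else:
            g_cbc.append(name)   # keep conventional BEOL corners
    return g_tbc, g_cbc


# Hypothetical path delays in ns and a hypothetical 4% threshold.
paths = [("p1", 1.00, 1.10, 1.05), ("p2", 1.00, 1.01, 1.02)]
g_tbc, g_cbc = group_paths(paths, threshold=0.04)
print("G_tbc:", g_tbc, "G_cbc:", g_cbc)
```

Paths with small Δdelay stay on conventional corners, consistent with the observation elsewhere in the talk that such paths tend to have large α.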
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm²)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Correlation factor = 0.5
           α     Acw (%)   Arcw (%)
TBC-0.5    0.5   43        73
TBC-0.6    0.6   33        50
TBC-0.7    0.7   30        34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw, and Arcw
Statistical models: (1) no correlation, and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: scaling factor α for LEON3MP, SUPERBLUE12, and NETCARD, each with the α > 0.5 region marked]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]
57ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI; VBTI = Vlib ≈ Vfinal]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
supply voltage gt Vmax
Pessimistic signoff corner bull Ovestimate aging andor
underestimate circuit performance
bull Large area overhead
Good corners
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner
for AVS4 Vbti = Vfinal Do not overestimate
aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
bull Timing signoff ensure circuit meets performance target under PVT variations amp aging
bull Conventional signoff approach bull Analyze circuit timing at worst-case cornersbull Fix timing violations re-run timing analysis
bull With BTI aging and AVS what is the Vdd of the worst-cast corner for timing analysis
Vlib for circuit performance estimation
Min Vdd Max Vdd
VBTI for aging
estimation
MinVdd
Not applicable (Optimistic)
Max Vdd
Slowest circuitLess aging
Faster circuitWorst-case aging
Slowest circuit Worst-case aging
Too pessimistic
With BTI aging and AVS the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
10000 12000 14000 16000 18000 20000 2200020
22
24
26
28
30
32
44
4
888
7776
66
555
3
33
2
22
11
1
Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)
Area (μm2)
Pow
er (m
W)
AES
Optimistic about AVS
Pessimistic about AVS
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
1 2 3 4 5 6 7 8 90
2
4
6
8
10
12
08
09
1
11
12Lifetime (year)
Implementation
Life
time
(yea
r)
Vfina
l (V)
Vfinal (V)
119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )
2
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation
[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ
Standard deviation of path delay
σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05
Note we use the delay variation from the statistical analysis as a reference
74ISVLSI-2014 invited talk 140710
Resilient Designs
bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)
bull Trade off design robustness vs design quality Eg enable margin reduction
bull Improve performance (ie timing speculation)
084 088 092 096 10030
34
38
42
46
50
54
58
62conventional design
reilient Design
Supply voltage (V)
En
erg
y (
mJ
)
Conventional design Worst-case signoff No Vdd downscaling
Resilient design Typical-case signoff Vdd downscaling reduced energy
15 reduction
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
bull Given RTL design throughput requirement and error-tolerant registers
bull Objective implement design to minimize energy bull Estimation of design energy
119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Question Which endpoint to be optimized
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
Open-Loop AVS
Closed-Loop AVSP
ow
er
Freq amp Vdd LUT
Post-silicon characterization
AVS Pre-characterize LUT [Martin02]
Process-aware AVSPost-silicon characterization [Tschanz03]
Generic monitor
Design dependent replica
In-situmonitor
Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]
In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]
Error Detection System
Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]
Error Tolerance
AVS
approachesAVS classes
77
78ISVLSI-2014 invited talk 140710
Challenge Variability
1998 2000 2003 2006 2008 20111
10
MPU Release Date
Tran
sisto
r Cou
nt [M
]
Source [CPUDB]
DENSITY
IdealNon-ideality
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0
100
200
300
400
500
600
700
0
02
04
06
08
1
12Dynamic Power (W)
POWER
Source [JeongK08]
IdealNon-ideality
2006 2008 2010 2012 2014 20161000
10000
100000
Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling
DRIVE CURRENT
Ideal
Source [ITRS]
Non-ideality
1995 2000 2005 2011 20160
05
1
15
2
25
3
MPU Release Date
Volt
SUPPLY VOLTAGE
Source [CPUDB]
Ideal
Non-ideality
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
80ISVLSI-2014 invited talk 140710
Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for
circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly
threshold voltages)bull One mode is outside of the design cone of the other
failed design overdesignbull Mode A has positive timing slacks with respect to mode B
mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other
bull Modes are in each othersrsquo design cone
Voltage
Frequency
A
Negative Slacks = failed design
Positive Slacks = overdesign
B
C
Design Cone of mode A
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline search for signoff modes within design cone reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method Global Optimizationbull Iteratively sample and refine power models
bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable
Sample (SPampR)
Construct power models
Estimate optimal signoff modes
Sample (SPampR)
Refine power models
Adaptive search
Global optimization flow
09 10 11 1214
15
16
17
18
19
201st 2nd real
Signoff Voltage (v)
Po
wer
(m
W)
Power estimation of adaptive search
bull Ovals indicate sample pointsbull 1st 2nd power from power models at first
second iterationbull real power from real implemented circuits
Design AESf 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
bull Critical path may be difficult to identify (IP from 3rd party)
bull Calibrating monitors at multiple modesvoltages requires long test time
Closed-Loop AVS
Design-dependent replica
In-situmonitor
Generic monitor
bull Does not capture design-specific performance variation
82
This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
bull Identified two circuit knobs to tune Vmin
bull Series resistancebull Cell types (INV NAND NOR)
bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low
1 bit 1 bit 1 bit Control pins
High resistance
Low resistance
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reductionbull Reference flows
bull Pure-margin (PM) conventional methodology w only margin insertionbull Brute-force (BF) insert error-tolerant FFs at timing-critical endpoints
bull Proposed method (CO) achieves up to 20 energy reduction compared to reference methods
bull Resilience benefits increase with safety margin
PM BF CO PM BF CO PM BF CO25
30
35
40
45
50
55 Energy penalty of throughput degradationEnergy penalty of additional circuitsEnergy wo resilience
En
erg
y (
mJ
)
Large marginMedium marginSmall margin
MUL
PM BF CO PM BF CO PM BF CO25
27
29
31
33
35
En
erg
y (
mJ
)
EXU
Large marginMedium marginSmall margin
Smallmediumlarge margin safety margin = 51015 of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVSbull AVS (Adaptive Voltage Scaling) allows lower supply voltage for
resilient designs reduced powerbull We trade off between timing-error penalty vs reduced power at a
lower supply voltagebull Average 18 energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
bull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity-aware clock skew optimization
SkewOpt
48ISVLSI-2014 invited talk 140710
Delay Variation
α α
Δdelay at C-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Some paths have α gt 10 a CBC can underestimate delay variationsbull But these paths have larger delays at the other corner
C-worst corner underestimates delay variations but these paths are dominated by the RC-worst corner
α lt 10 delay variations are covered by the RC-worst corner
Dominated by C-worstΔdelay at C-worst gt Δdelay at RC-worst
Dominated by RC-worst Δdelay at RC-worst gt Δdelay at C-worst
Δdelay at RC-worst [d(Ycw) ndash d(Ytyp)] d(Ytyp)
bull Paths are more sensitive to R or to Cbull Using RC-worst or C-worst only will underestimate delay variationsbull Need both RC- and C-worst corners to cover process variationsbull In the following discussions α is defined at the dominant corner
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• CBC can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in the itf with α = 0.5
[Figure: 3σj/dj(Ytyp) × 100 (%) plotted against Δdj(Yrcw)/dj(Ytyp) × 100 (%)]
Challenge: how to avoid underestimating delay variation, so as to preserve parametric yield
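As a rough illustration of how a per-path scaling factor might be evaluated, the sketch below assumes that α is the ratio of the statistical 3σ delay variation to the Δdelay at the dominant conventional corner. That definition is inferred from the plot axes and is not stated on the slide; the function name, argument names and numbers are hypothetical.

```python
def scaling_factor(d_typ, d_cw, d_rcw, sigma):
    """Return (dominant_corner, alpha) for one timing path.

    ASSUMPTION (not stated on the slide): alpha = 3*sigma / delta_delay,
    where delta_delay is taken at the corner with the larger delay increase.
    All delays and sigma are in the same unit (e.g., ns).
    """
    delta_cw = d_cw - d_typ      # delay increase at the C-worst corner
    delta_rcw = d_rcw - d_typ    # delay increase at the RC-worst corner
    if delta_rcw >= delta_cw:
        corner, delta = "RC-worst", delta_rcw
    else:
        corner, delta = "C-worst", delta_cw
    if delta <= 0:
        raise ValueError("no positive delay variation at either corner")
    return corner, 3.0 * sigma / delta


# Hypothetical path: 1.00 ns typical, 1.04 ns at C-worst, 1.10 ns at RC-worst
corner, alpha = scaling_factor(d_typ=1.00, d_cw=1.04, d_rcw=1.10, sigma=0.015)
print(corner, round(alpha, 2))
```

Under this assumed definition, a path with α below the tightening factor (e.g., 0.5) would still be covered by the tightened corner, while a path with larger α would not.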
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%) of critical paths]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if its Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flowchart: routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; paths split into GTBC and GCBC; GTBC: timing analysis using TBC, ECO using TBC until violation = 0; GCBC: timing analysis using CBC, ECO using CBC until violation = 0; done]
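The path-grouping step of the flow can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes a path joins Gtbc when its relative Δdelay exceeds a threshold at either corner, and the threshold values, path names and delays are hypothetical.

```python
def split_paths(paths, a_cw, a_rcw):
    """Partition paths into (g_tbc, g_cbc).

    paths : dict mapping path name -> (d_typ, d_cw, d_rcw) delays
    a_cw  : threshold (%) on relative delta-delay at the C-worst corner
    a_rcw : threshold (%) on relative delta-delay at the RC-worst corner

    ASSUMPTION: exceeding the threshold at either corner is enough to place
    a path in g_tbc; the slide only says "larger than a threshold".
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        delta_cw = 100.0 * (d_cw - d_typ) / d_typ
        delta_rcw = 100.0 * (d_rcw - d_typ) / d_typ
        if delta_cw > a_cw or delta_rcw > a_rcw:
            g_tbc.append(name)   # signed off with tightened BEOL corners
        else:
            g_cbc.append(name)   # signed off with conventional BEOL corners
    return g_tbc, g_cbc


# Hypothetical delays in ns: (typical, C-worst, RC-worst)
paths = {"p1": (1.00, 1.01, 1.02), "p2": (1.00, 1.02, 1.09)}
g_tbc, g_cbc = split_paths(paths, a_cw=4.0, a_rcw=7.0)
print(g_tbc, g_cbc)
```

Each group then goes through its own timing-analysis and ECO loop until no violations remain.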
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)

                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm²)       0.45      1.04      1.91
Max transition (ps)   330       330       330

Tightened corners (correlation factor = 0.5)

           α     Acw (%)   Arcw (%)
TBC-0.5    0.5   43        73
TBC-0.6    0.6   33        50
TBC-0.7    0.7   30        34

Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw and Arcw
Statistical models: (1) no correlation, and (2) variation sources of the same kind in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path with small Δdelays is about equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • First coefficient: delay sensitivity to unit change in M1 resistance
  • Second coefficient: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations from ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: scaling factor α for LEON3MP, SUPERBLUE12 and NETCARD; paths with α > 0.5 are marked in each panel]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: three bar charts comparing CBC, TBC-1 and TBC-2 on LEON, SUPERBLUE and NETCARD: WNS (ns), TNS (ns), and number of timing violations]
57ISVLSI-2014 invited talk 140710
Heuristics 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging under an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time with NBTI and PBTI degradation; VBTI = Vlib ≈ Vfinal]
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep VBTI, VLIB and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step
Vfinal: converged voltage
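The reference flow can be sketched as a simple loop. This is only an illustration under loudly stated assumptions: `toy_aging` and `toy_delay` are made-up stand-ins for the BTI model and for STA with a derated library, and aging at each time step is evaluated as if the current Vdd had been applied since t = 0, which is a simplification.

```python
def reference_signoff(v_init, v_max, v_step, t_final, n_steps, aging, delay, target):
    """Return the converged Vfinal, or None if v_max would be exceeded."""
    steps_up = 0  # number of AVS voltage steps applied so far
    for k in range(1, n_steps + 1):
        t = t_final * k / n_steps
        while True:
            vdd = v_init + steps_up * v_step
            if vdd > v_max + 1e-12:
                return None              # modify the circuit and start over
            dvth = aging(t, vdd)         # estimate aging at this time step
            if delay(vdd, dvth) <= target:
                break                    # timing met at this time step
            steps_up += 1                # AVS raises Vdd by one Vstep
    return v_init + steps_up * v_step


# Toy models (ASSUMPTIONS, for illustration only).
def toy_aging(t_years, vdd):
    """Threshold-voltage shift in volts; grows with time and stress voltage."""
    return 0.05 * vdd * (t_years / 10.0) ** (1.0 / 6.0)


def toy_delay(vdd, dvth, vth0=0.3):
    """Alpha-power-law-style critical-path delay in arbitrary units."""
    return vdd / (vdd - vth0 - dvth) ** 1.3


target = toy_delay(0.90, 0.0)  # zero timing slack at t = 0, Vdd = Vinit
v_final = reference_signoff(0.90, 1.05, 0.01, 10.0, 10, toy_aging, toy_delay, target)
print(v_final)
```

In a real flow each `delay` call corresponds to an STA run, which is why the slide describes this approach as very slow.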
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   NA   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  NA   Vheur1  Vheur2
63ISVLSI-2014 invited talk 140710
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop among circuit, Vfinal, Vlib/VBTI and derated libraries]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON and OFF phases] [VattikondaWC06]
Device aging (|ΔVth|) accumulates over time [TCAS’14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a “front-loaded” phenomenon
• 50% of BTI aging happens within the first year of circuit lifetime (total lifetime = 10 years)
• Most of the Vdd increment happens early in the lifetime
• Gap between Vdd and Vfinal shrinks rapidly
[Figure: Vdd over lifetime approaching Vfinal; ≈70% of the Vdd increment occurs in the first year (remaining 30% over 9 years)]
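A quick arithmetic check of the "front-loaded" behaviour, assuming (this is not stated on the slide) that the threshold-voltage shift follows a power law in time, ΔVth ∝ t^n, as in commonly used BTI compact models. The exponent values below are illustrative only.

```python
def aging_fraction(t_years, lifetime_years, n):
    """Fraction of end-of-life aging reached at time t, assuming dVth ~ t**n."""
    return (t_years / lifetime_years) ** n


# Fraction of 10-year aging accumulated after the first year,
# for a few illustrative time exponents.
for n in (1.0 / 6.0, 0.25, 0.30):
    print(f"n = {n:.3f}: {100.0 * aging_fraction(1.0, 10.0, n):.1f}% after 1 year")
```

Under this assumed model, exponents between roughly 1/6 and 0.3 put about 50% to 70% of the ten-year aging in the first year, which is the order of magnitude quoted above.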
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if the desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
[Figure legend: signoff setups]
1. VBTI = Vlib = Vinit: ignores AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations and aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Vlib: Vdd for circuit performance estimation; VBTI: Vdd for aging estimation

                   Vlib = Min Vdd                                         Vlib = Max Vdd
VBTI = Min Vdd     Slowest circuit, less aging                            Not applicable (optimistic)
VBTI = Max Vdd     Slowest circuit, worst-case aging (too pessimistic)    Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm²) for AES, with points labeled 1 to 8; series: non-EM-aware, after fixing (Mishra), after fixing (Black's); annotations: optimistic about AVS, pessimistic about AVS]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
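The step-wise update MTTF(i) = MTTF(i−1) × (VDD(i−1) / VDD(i))² can be traced with a few lines of code. The initial MTTF and the voltage schedule below are hypothetical values chosen for illustration, not data from the talk.

```python
def update_mttf(mttf_initial, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule.

    Returns the list of MTTF values, one per entry of vdd_schedule.
    """
    mttf = mttf_initial
    history = [mttf]
    for v_prev, v_next in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (v_prev / v_next) ** 2   # higher Vdd shortens EM lifetime
        history.append(mttf)
    return history


# Hypothetical AVS schedule: Vdd raised in steps to compensate BTI aging
schedule = [0.90, 0.96, 1.00, 1.05, 1.10]
history = update_mttf(10.0, schedule)
penalty = 1.0 - history[-1] / history[0]
print([round(m, 2) for m in history], f"penalty = {100 * penalty:.0f}%")
```

Because the per-step ratios telescope, the final MTTF depends only on the first and last voltages of the schedule: MTTF_final = MTTF_initial × (V_first / V_last)².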
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year (0 to 12) for schedules S1 to S5, and MTTF (year) for S1 to S5; design: DMA; annotation: 12 years MTTF penalty]
71ISVLSI-2014 invited talk 140710
What is “Signoff”?
• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff on a voltage axis; labels: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd, operating voltage]
Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
$\Delta d_{jv} = \left[ d_j(Y_v) - d_j(Y_{typ}) \right] / 3$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist; RC extraction; STA; Δdjv]
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
$[\Delta d'_{j1} \; \dots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \dots \; \Delta d_{j27}] \, \lambda$
Standard deviation of path delay:
$\sigma_j = \left( (\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
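The two formulas above (Σ = λλ^T, then σj as the norm of the sensitivity row vector multiplied by λ) can be checked with a small pure-Python sketch. The two-source sensitivities and correlation matrices are hypothetical, and the hand-written `cholesky` helper is included only to keep the example self-contained.

```python
import math


def cholesky(a):
    """Lower-triangular L with L * L^T == a, for a symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def path_sigma(sens, corr):
    """Standard deviation of path delay from per-source sensitivities.

    sens : delay sensitivities of one path, one entry per variation source
    corr : correlation matrix of the variation sources
    """
    lam = cholesky(corr)
    n = len(sens)
    # Row vector times matrix: sens'[k] = sum_i sens[i] * lam[i][k]
    sens_corr = [sum(sens[i] * lam[i][k] for i in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in sens_corr))


# Hypothetical two-source example (sensitivities in ps)
print(path_sigma([3.0, 4.0], [[1.0, 0.0], [0.0, 1.0]]))   # independent sources
print(path_sigma([3.0, 4.0], [[1.0, 0.5], [0.5, 1.0]]))   # correlation factor 0.5
```

The result equals sqrt(s Σ s^T) for sensitivity vector s, so positive correlation between sources increases σj relative to the independent case.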
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation under dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional and resilient designs; annotation: 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement and error-tolerant registers
• Objective: implement the design to minimize energy
• Estimation of design energy:
$\mathrm{Energy} = \mathrm{Power} / \mathrm{Throughput}$
Throughput depends on the error rate (ER), the clock period (T) and the number of recovery cycles (r) [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints: allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
[Figure: AVS approaches grouped by AVS class, plotted against power]
Open-Loop AVS
• Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
• Post-silicon characterization: process-aware AVS [Tschanz03]
Closed-Loop AVS
• Process- and temperature-aware AVS
  • Generic monitor: generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
Error Tolerance
• Error detection system: error detection and correction system, Vdd scaling until an error occurs [Das06, Tschanz10]
78ISVLSI-2014 invited talk 140710
Challenge Variability
[Figure: four panels comparing ideal scaling with non-ideal trends]
• DENSITY: transistor count [M] vs. MPU release date, 1998 to 2011 (source: [CPUDB])
• POWER: dynamic power (W), 2009 to 2024 (source: [JeongK08])
• DRIVE CURRENT (μA/μm): extended planar bulk, UTB FD, DG and ideal scaling, 2006 to 2016 (source: [ITRS])
• SUPPLY VOLTAGE: volts vs. MPU release date, 1995 to 2016 (source: [CPUDB])
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt designs; panels MUL and EXU; annotation: minimum achievable energy]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cones
[Figure: voltage and frequency plane showing modes A, B and C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
• Avoid circuit implementation at each mode
• A small constant number of runs is enough: scalable
[Flowchart: global optimization flow: sample (SP&R); construct power models; estimate optimal signoff modes; adaptive search: sample (SP&R), refine power models]
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); series: 1st, 2nd, real]
• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iterations
• real: power from real implemented circuits
Design: AES, f = 700MHz
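The sample-and-refine idea can be illustrated with a deliberately simple stand-in. The slides do not specify the form of the power models, so the sketch below swaps in successive parabolic interpolation over a single signoff voltage; `toy_power` is a hypothetical placeholder for an actual SP&R run followed by power analysis.

```python
def parabola_vertex(p1, p2, p3):
    """x-coordinate of the vertex of the parabola through three (x, y) points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    num = (x2 - x1) ** 2 * (y2 - y3) - (x2 - x3) ** 2 * (y2 - y1)
    den = (x2 - x1) * (y2 - y3) - (x2 - x3) * (y2 - y1)
    if den == 0:
        return None  # points are collinear: no unique vertex
    return x2 - 0.5 * num / den


def adaptive_search(implement, initial_voltages, max_iters=3, tol=1e-3):
    """Refine a quadratic power model with a few extra implementation runs.

    implement        : callable voltage -> power (one SP&R run per call)
    initial_voltages : three or more signoff voltages for the first sampling
    Returns ((best_voltage, best_power), number_of_runs).
    """
    samples = [(v, implement(v)) for v in initial_voltages]
    for _ in range(max_iters):
        # Model: parabola through the three lowest-power samples so far
        best3 = sorted(sorted(samples, key=lambda s: s[1])[:3])
        v_new = parabola_vertex(*best3)
        if v_new is None or any(abs(v_new - v) < tol for v, _ in samples):
            break  # model has converged to an already-sampled voltage
        samples.append((v_new, implement(v_new)))
    return min(samples, key=lambda s: s[1]), len(samples)


def toy_power(v):
    """Hypothetical power (mW) vs. signoff voltage, minimum at 1.05 V."""
    return 15.0 + 20.0 * (v - 1.05) ** 2


(best_v, best_p), n_runs = adaptive_search(toy_power, [0.9, 1.1, 1.2])
print(round(best_v, 3), round(best_p, 2), n_runs)
```

The point of the sketch is the run count: the model-guided search needs only a handful of implementation runs instead of one per candidate mode.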
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
Closed-Loop AVS classes: generic monitor, design-dependent replica, in-situ monitor
• Generic monitor: does not capture design-specific performance variation
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring-oscillator stages with 1-bit control pins selecting high or low series resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: stacked-bar charts of energy (mJ) for PM, BF and CO at small, medium and large margins; panels MUL and EXU; bar components: energy without resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows a lower supply voltage for resilient designs, hence reduced power
• We trade off timing-error penalty against reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt designs; panels MUL and EXU; annotation: minimum achievable energy]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart boxes: initial placement (all FFs = error-tolerant FFs); SEOpt: margin insertion on K paths based on sensitivity function, replace error-tolerant FFs with normal FFs; SkewOpt: activity-aware clock skew optimization; check Energy < min energy; save current solution]
49ISVLSI-2014 invited talk 140710
Non-Homogeneous Corner
bull Each layer can have different skewed variationsInterconnect stack with M1 and M2
M1 C
M2 C
3σ
Non-homogeneous cornerM1 == Cw (3σ)M2 == Ctyp
bull Less pessimism with non-homogeneous cornersbull Challenge
bull Many feasible combinationsbull A corner can only cover certain pathsbull How to choose the best combinations
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
bull CBC can be pessimistic Most paths have α lt 05 bull Use tightened BEOL corners eg scale BEOL variation in
itf with α = 05
Δdj(Yrcw)dj(Ytyp) x 100
3σjd(Ytyp) x 100
Challenge how to avoid underestimating delay variation to preserve parametric yield
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
Gate count 232K 575K 1031KUtilization () 84 79 82
Core area (mm2) 045 104 191Max transition (ps) 330 330 330
Testcases for validation (45nm library with 8X wire resistivity)
αCorrelation factor = 05
Acw () Arcw ()
TBC-05 05 43 73
TBC-06 06 33 50
TBC-07 07 30 34
Implement another NETCARD (clock period = 23ns) to obtain α Acw and Arcw
Statistical models (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 05
54ISVLSI-2014 invited talk 140710
Further Analysis
bull Paths with small Δd(Yrcw) and Δd(Ycw) have large α
bull A path has small Δdelays the path is equally sensitive to R and C
bull Example dj = dj(Ytyp) + 05 ΔdR-M1 + 05 ΔdC-M1
bull For a given CBC = Ycw ΔdR-M1 is small but ΔdC-M1 is large delay variation of ΔdR-M1 and ΔdC-M1 are cancelled out Δd(Ycw) 0 lt σj
Nominal delay
Delay sensitivity to unit change in M1 resistance
Delay sensitivity to unit change in M1 capacitance
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
LEON3MP
SUPERBLUE12NETCARD
α gt 05α gt 05
α gt 05
bull Similar trends in different designs
bull Large α when Δd(Yrcw)d(Ytyp) and Δd(Ycw)d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
bull WNS and TNS are reduced by up to 120ps and 61ns
bull Timing violations reduces by 31 to 100
Correlation factor γ = 0 (variation sources are independent)
LEON SUPERBLUE NETCARD
-0180-0160-0140-0120-0100-0080-0060-0040-002000000020
CBC TBC-1 TBC-2
WN
S (n
s)
LEON SUPERBLUE NETCARD
-90-80-70-60-50-40-30-20-10
0CBC TBC-1 TBC-2
TNS
(ns)
LEON SUPERBLUE NETCARD0
200400600800
1000120014001600
CBC TBC-1 TBC-2
Tim
ing
viol
ation
s
57ISVLSI-2014 invited talk 140710
Heuristics 1
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
supply voltage gt Vmax
Pessimistic signoff corner bull Ovestimate aging andor
underestimate circuit performance
bull Large area overhead
Good corners
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner
for AVS4 Vbti = Vfinal Do not overestimate
aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
bull Timing signoff ensure circuit meets performance target under PVT variations amp aging
bull Conventional signoff approach bull Analyze circuit timing at worst-case cornersbull Fix timing violations re-run timing analysis
bull With BTI aging and AVS what is the Vdd of the worst-cast corner for timing analysis
Vlib for circuit performance estimation
Min Vdd Max Vdd
VBTI for aging
estimation
MinVdd
Not applicable (Optimistic)
Max Vdd
Slowest circuitLess aging
Faster circuitWorst-case aging
Slowest circuit Worst-case aging
Too pessimistic
With BTI aging and AVS the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
10000 12000 14000 16000 18000 20000 2200020
22
24
26
28
30
32
44
4
888
7776
66
555
3
33
2
22
11
1
Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)
Area (μm2)
Pow
er (m
W)
AES
Optimistic about AVS
Pessimistic about AVS
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
1 2 3 4 5 6 7 8 90
2
4
6
8
10
12
08
09
1
11
12Lifetime (year)
Implementation
Life
time
(yea
r)
Vfina
l (V)
Vfinal (V)
119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )
2
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation
[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ
Standard deviation of path delay
σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05
Note we use the delay variation from the statistical analysis as a reference
74ISVLSI-2014 invited talk 140710
Resilient Designs
bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)
bull Trade off design robustness vs design quality Eg enable margin reduction
bull Improve performance (ie timing speculation)
084 088 092 096 10030
34
38
42
46
50
54
58
62conventional design
reilient Design
Supply voltage (V)
En
erg
y (
mJ
)
Conventional design Worst-case signoff No Vdd downscaling
Resilient design Typical-case signoff Vdd downscaling reduced energy
15 reduction
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
bull Given RTL design throughput requirement and error-tolerant registers
bull Objective implement design to minimize energy bull Estimation of design energy
119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Question Which endpoint to be optimized
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
Open-Loop AVS
Closed-Loop AVSP
ow
er
Freq amp Vdd LUT
Post-silicon characterization
AVS Pre-characterize LUT [Martin02]
Process-aware AVSPost-silicon characterization [Tschanz03]
Generic monitor
Design dependent replica
In-situmonitor
Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]
In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]
Error Detection System
Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]
Error Tolerance
AVS
approachesAVS classes
77
78ISVLSI-2014 invited talk 140710
Challenge Variability
1998 2000 2003 2006 2008 20111
10
MPU Release Date
Tran
sisto
r Cou
nt [M
]
Source [CPUDB]
DENSITY
IdealNon-ideality
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0
100
200
300
400
500
600
700
0
02
04
06
08
1
12Dynamic Power (W)
POWER
Source [JeongK08]
IdealNon-ideality
2006 2008 2010 2012 2014 20161000
10000
100000
Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling
DRIVE CURRENT
Ideal
Source [ITRS]
Non-ideality
1995 2000 2005 2011 20160
05
1
15
2
25
3
MPU Release Date
Volt
SUPPLY VOLTAGE
Source [CPUDB]
Ideal
Non-ideality
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
80ISVLSI-2014 invited talk 140710
Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for
circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly
threshold voltages)bull One mode is outside of the design cone of the other
failed design overdesignbull Mode A has positive timing slacks with respect to mode B
mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other
bull Modes are in each othersrsquo design cone
Voltage
Frequency
A
Negative Slacks = failed design
Positive Slacks = overdesign
B
C
Design Cone of mode A
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline search for signoff modes within design cone reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small, constant number of runs is enough: scalable
Global optimization flow: sample (SP&R), construct power models, estimate optimal signoff modes
Adaptive search: sample (SP&R), refine power models
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V), 0.9 to 1.2 V, with curves "1st", "2nd", and "real".]
• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iterations
• real: power from real implemented circuits
Design: AES, f = 700 MHz
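The sample-and-refine loop can be illustrated with a toy surrogate model. The sketch below is an assumption-laden stand-in, not the method from the talk: it uses a parabola through the three lowest-power samples as the "power model", and `toy_power_mw` replaces a real SP&R run with a made-up analytic curve. All function names are hypothetical.

```python
def parabola_vertex(points):
    """Return the x of the minimum of the parabola through three (x, y) points,
    or None if that parabola has no minimum."""
    (x0, y0), (x1, y1), (x2, y2) = points
    d0 = (x0 - x1) * (x0 - x2)
    d1 = (x1 - x0) * (x1 - x2)
    d2 = (x2 - x0) * (x2 - x1)
    a = y0 / d0 + y1 / d1 + y2 / d2
    b = -(y0 * (x1 + x2) / d0 + y1 * (x0 + x2) / d1 + y2 * (x0 + x1) / d2)
    if a <= 0:
        return None
    return -b / (2 * a)


def adaptive_search(measure_power, v_lo, v_hi, iterations=2):
    """Sample, fit a model, estimate the best signoff voltage, sample again."""
    samples = {v: measure_power(v) for v in (v_lo, (v_lo + v_hi) / 2, v_hi)}
    for _ in range(iterations):
        best3 = sorted(samples.items(), key=lambda item: item[1])[:3]
        v_new = parabola_vertex(best3)
        if v_new is None:
            break
        v_new = min(max(v_new, v_lo), v_hi)       # stay inside the search range
        if any(abs(v_new - v) < 1e-6 for v in samples):
            break                                  # model estimate has converged
        samples[v_new] = measure_power(v_new)      # one more (expensive) sample
    return min(samples.items(), key=lambda item: item[1])


def toy_power_mw(v):
    """Made-up power curve standing in for an SP&R run at signoff voltage v."""
    return 15.0 + 40.0 * (v - 1.0) ** 2


best_v, best_p = adaptive_search(toy_power_mw, 0.9, 1.2)
```

With this toy curve, four calls to `toy_power_mw` are enough, which mirrors the slide's point that only a small number of implementation runs is needed.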
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Generic monitor does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: RO stages, each with a 1-bit control pin selecting high or low series resistance.]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO for small, medium, and large margins; panels MUL and EXU; bars split into energy without resilience, energy penalty of additional circuits, and energy penalty of throughput degradation.]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt designs; panels MUL (0.84 to 1.00 V) and EXU (0.84 to 1.02 V); the minimum achievable energy is marked.]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow elements:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs
• SkewOpt: activity-aware clock skew optimization
• Check: energy < min energy? If so, save current solution
50ISVLSI-2014 invited talk 140710
Opportunities for Tightened BEOL Corners
• Conventional BEOL corners (CBC) can be pessimistic: most paths have α < 0.5
• Use tightened BEOL corners, e.g., scale BEOL variation in the itf with α = 0.5
[Figure: plot of Δdj(Yrcw)/dj(Ytyp) × 100 (%) against 3σj/dj(Ytyp) × 100 (%) for critical paths.]
Challenge: how to avoid underestimating delay variation, to preserve parametric yield
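One way to read "scale BEOL variation with α" is to shrink each corner's deviation from the typical value by the factor α. The sketch below assumes exactly that linear interpretation; the function name and the numeric values are hypothetical, and a real itf edit would apply to whichever interconnect parameters the corner file varies.

```python
def tighten_corner(typical, corner, alpha):
    """Scale a corner value's deviation from typical by alpha (0 < alpha <= 1)."""
    return typical + alpha * (corner - typical)


# Hypothetical normalized wire parameter: typical = 1.0, conventional corner = 1.3.
tightened = tighten_corner(1.0, 1.3, 0.5)
unchanged = tighten_corner(1.0, 1.3, 1.0)   # alpha = 1 reproduces the conventional corner
```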
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
• Critical paths are structurally similar
• Wires on critical paths are routed on many layers
• Structure is an outcome of the design flow
Testcase
• 45nm foundry library (wire resistivity scaled by 8X)
• Netlist: NETCARD, 1 mm², 570K standard cell instances
• 9 metal layers
• Extract critical paths from different PVT and BEOL corners
[Figure: wirelength ratio (%) of critical paths.]
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at the RC-worst, C-worst, and typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
[Flow: routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; paths split into GTBC and GCBC; GTBC: timing analysis using TBC, with ECO using TBC until violation = 0; GCBC: timing analysis using CBC, with ECO using CBC until violation = 0; done.]
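The path-grouping step of the flow can be sketched in a few lines. This is only an illustration under stated assumptions: Δdelay is taken here as the larger of the C-worst and RC-worst delays minus the typical delay, and the dictionary layout, function name, and numbers are hypothetical.

```python
def split_paths(path_delays_ns, threshold_ns):
    """Split critical paths into (G_tbc, G_cbc) by their delta-delay.

    path_delays_ns maps a path name to its delays at the 'typ', 'cw', and
    'rcw' BEOL corners (assumed layout).
    """
    g_tbc, g_cbc = [], []
    for name, d in path_delays_ns.items():
        delta = max(d["cw"], d["rcw"]) - d["typ"]   # assumed definition of delta-delay
        if delta > threshold_ns:
            g_tbc.append(name)    # sign off with tightened BEOL corners
        else:
            g_cbc.append(name)    # keep conventional BEOL corners
    return g_tbc, g_cbc


example_paths = {
    "p1": {"typ": 1.00, "cw": 1.02, "rcw": 1.20},
    "p2": {"typ": 1.00, "cw": 1.03, "rcw": 1.04},
}
g_tbc, g_cbc = split_paths(example_paths, threshold_ns=0.1)
```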
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity)
Parameter | LEON3MP | NETCARD | SUPERBLUE12
Clock period (ns) | 1.8 | 2.0 | 3.1
Gate count | 232K | 575K | 1031K
Utilization (%) | 84 | 79 | 82
Core area (mm²) | 0.45 | 1.04 | 1.91
Max transition (ps) | 330 | 330 | 330
Tightened corners (correlation factor = 0.5)
Corner | α | Acw (%) | Arcw (%)
TBC-0.5 | 0.5 | 43 | 73
TBC-0.6 | 0.6 | 33 | 50
TBC-0.7 | 0.7 | 30 | 34
Implement another NETCARD (clock period = 2.3 ns) to obtain α, Acw, and Arcw
Statistical models: (1) no correlation, and (2) variation sources of the same kind in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: panels for LEON3MP, NETCARD, and SUPERBLUE12, each marking the region α > 0.5.]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120 ps and 61 ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2.]
57ISVLSI-2014 invited talk 140710
Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging with an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time under NBTI and PBTI aging.]
VBTI = Vlib ≈ Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at an early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate for BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because the BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10 mV]
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Circuit | Frequency (GHz)
C5315 | 1.38
c7552 | 1.25
AES | 0.89
MPEG2 | 1.05
Supply voltages
Vmax | 1.05 V
Vinit | 0.90 V
Vheur1 (DC) | 0.97 V
Vheur1 (AC) | 0.95 V
Vheur2 (DC) | 0.95 V
Vheur2 (AC) | 0.93 V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs
and library
Vstep: AVS voltage step; Vfinal: converged voltage
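The loop structure of the reference flow can be sketched as follows. This is a toy under loudly stated assumptions: `toy_aging` and `toy_meets_timing` are made-up stand-ins for BTI model evaluation and an STA run, their constants are invented, and the function names are hypothetical. Only the step order (estimate aging, update Vdd, repeat until the end of life, give up above the maximum voltage) follows the slide.

```python
def reference_signoff(t_final_years, n_steps, v_init, v_max, v_step,
                      estimate_aging, meets_timing):
    """Return the converged Vfinal, or None if the circuit must be modified."""
    k = 0                                     # AVS voltage steps applied so far
    for i in range(1, n_steps + 1):
        t = t_final_years * i / n_steps
        while True:
            vdd = v_init + k * v_step
            if vdd > v_max + 1e-9:
                return None                   # Vfinal would exceed the allowed maximum
            dvth = estimate_aging(t, vdd)     # aging evaluated at the current Vdd
            if meets_timing(vdd, dvth):       # stands for one STA run
                break
            k += 1                            # AVS raises Vdd by one Vstep
    return v_init + k * v_step


def toy_aging(t_years, vdd):
    """Made-up threshold shift (V): grows with Vdd and as t^(1/6)."""
    return 0.05 * vdd * t_years ** (1.0 / 6.0)


def toy_meets_timing(vdd, dvth):
    """Made-up timing check: overdrive must stay at the fresh 0.90 V level."""
    return vdd - dvth >= 0.90


v_final = reference_signoff(10.0, 10, 0.90, 1.05, 0.01, toy_aging, toy_meets_timing)
too_tight = reference_signoff(10.0, 10, 0.90, 0.95, 0.01, toy_aging, toy_meets_timing)
```

Every pass through the inner loop corresponds to a timing run, which is why the slide describes this flow as accurate but slow.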
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
  • Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
Case | 1 | 2 | 3 | 4 | 5 | 6 | 7
Vlib (V) | Vinit | Vinit | Vmax | Vinit | NA | Vheur1 | Vheur2
VBTI (V) | Vinit | Vmax | Vmax | Vfinal | NA | Vheur1 | Vheur2
63ISVLSI-2014 invited talk 140710
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop among circuit, Vfinal, Vlib/VBTI, and derated libraries.]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time alternating ON, OFF, ON, OFF [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS'14].]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
[Figure annotation: ≈70% of Vdd increment in 1 year (remaining 30% over 9 years); Vfinal marked.]
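The front-loaded behavior is what a sub-linear power law in time produces. The sketch below assumes |ΔVth| ∝ t^n, which is an assumption not stated on the slide; the function name and both exponent values are illustrative and are not taken from the talk.

```python
def fraction_of_aging(t_years, lifetime_years=10.0, exponent=1.0 / 6.0):
    """Fraction of end-of-life threshold shift reached at time t, assuming
    the shift grows as t**exponent (assumed power-law model)."""
    return (t_years / lifetime_years) ** exponent


first_year_n_sixth = fraction_of_aging(1.0)                 # exponent 1/6
first_year_n_03 = fraction_of_aging(1.0, exponent=0.3)      # exponent 0.3
```

Under this assumed model, an exponent of 1/6 puts roughly 68% of the ten-year shift in the first year, and an exponent of 0.3 puts roughly 50% there, so either way most of the degradation arrives early in the lifetime.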
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?
Corner choices (Vlib: voltage for circuit performance estimation; VBTI: voltage for aging estimation)
 | Vlib = Min Vdd | Vlib = Max Vdd
VBTI = Min Vdd | Slowest circuit, less aging | Not applicable (optimistic)
VBTI = Max Vdd | Slowest circuit, worst-case aging: too pessimistic | Faster circuit, worst-case aging
With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm²) for AES, with data points labeled 1 to 8; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks); regions annotated "Optimistic about AVS" and "Pessimistic about AVS".]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
  $\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200 mV voltage compensation.]
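The MTTF update rule telescopes over an AVS schedule, which a short sketch makes visible. The function name, the starting MTTF of 10 years, and the voltage schedule are hypothetical values chosen for illustration; only the squared voltage-ratio update comes from the slide.

```python
def mttf_after_schedule(mttf_initial_years, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule."""
    mttf = mttf_initial_years
    for v_prev, v_next in zip(vdd_schedule, vdd_schedule[1:]):
        mttf *= (v_prev / v_next) ** 2
    return mttf


# Hypothetical schedule with 200 mV of total compensation (0.90 V to 1.10 V).
mttf_end = mttf_after_schedule(10.0, [0.90, 1.00, 1.10])
penalty = 1.0 - mttf_end / 10.0
```

Because the ratios cancel step by step, the result depends only on the first and last voltages: (0.90/1.10)² ≈ 0.67, a penalty of about 33% for these assumed voltages, which is of the same order as the 30% annotation.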
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD (0.90 to 1.04 V) vs. year (0 to 12) for schedules S1 to S5; MTTF (year) of S1 to S5 for design DMA; annotation: 12 years MTTF penalty.]
71ISVLSI-2014 invited talk 140710
What is "Signoff"?
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff on a voltage axis, from nominal Vdd through static IR drop, power grid IR gradient, dynamic IR, and HCI/NBTI down to signoff Vdd; operating voltage marked.]
Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  $\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist, then RC extraction, then STA, giving Δdjv.]
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 × 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
  $[\Delta d'_{j1}, \ldots, \Delta d'_{j27}] = [\Delta d_{j1}, \ldots, \Delta d_{j27}]\,\lambda$
Standard deviation of path delay:
  $\sigma_j = \left( (\Delta d'_{j1})^2 + \cdots + (\Delta d'_{j27})^2 \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
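The two statistical-timing slides can be tied together with a small worked example. The sketch below is a pure-Python illustration with two variation sources instead of 27; the function names, corner delays, and the 0.5 correlation value in the example are assumptions for illustration, and λ is taken as the lower-triangular Cholesky factor.

```python
import math


def cholesky_lower(matrix):
    """Return lower-triangular L with matrix = L * L^T (matrix must be SPD)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            partial = sum(lower[i][m] * lower[k][m] for m in range(k))
            if i == k:
                lower[i][k] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][k] = (matrix[i][k] - partial) / lower[k][k]
    return lower


def sensitivities(corner_delays, typical_delay):
    """Per-sigma sensitivities: (d_j(Y_v) - d_j(Y_typ)) / 3 for each source v."""
    return [(d - typical_delay) / 3.0 for d in corner_delays]


def path_sigma(sens, correlation):
    """sigma_j = norm of the row vector sens * lambda."""
    lam = cholesky_lower(correlation)
    n = len(sens)
    correlated = [sum(sens[r] * lam[r][c] for r in range(n)) for c in range(n)]
    return math.sqrt(sum(x * x for x in correlated))


# Hypothetical path: typical delay 100 ps, delays of 109 ps and 112 ps at two 3-sigma corners.
sens = sensitivities([109.0, 112.0], 100.0)                  # [3.0, 4.0] ps per sigma
sigma_independent = path_sigma(sens, [[1.0, 0.0], [0.0, 1.0]])
sigma_correlated = path_sigma(sens, [[1.0, 0.5], [0.5, 1.0]])
```

With independent sources the result is the root-sum-square of the sensitivities (5 ps here); a positive correlation of 0.5 raises it to √37 ≈ 6.08 ps, which equals the square root of sens · Σ · sensᵀ.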
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V), 0.84 to 1.00 V, for conventional design and resilient design; annotation: 15% reduction.]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:
  $\mathrm{Energy} = \frac{\mathrm{Power}}{\mathrm{Throughput}}$
  Throughput is estimated from the error rate ER [Kahng10], the clock period T, and the number of recovery cycles r
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints: allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterize LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS: process- and temperature-aware AVS
  • Generic monitor: generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
[Figure axis: power]
78ISVLSI-2014 invited talk 140710
Challenge: Variability
[Figure panel: DENSITY, transistor count [M] vs. MPU release date, 1998 to 2011, ideal vs. non-ideality; source: [CPUDB].]
[Figure panel: POWER, dynamic power (W), 2009 to 2024, ideal vs. non-ideality; source: [JeongK08].]
[Figure panel: DRIVE CURRENT (μA/μm), 2006 to 2016, for extended planar bulk, UTB FD, and DG devices against ideal scaling; source: [ITRS].]
[Figure panel: SUPPLY VOLTAGE (V) vs. MPU release date, 1995 to 2016, ideal vs. non-ideality; source: [CPUDB].]
51ISVLSI-2014 invited talk 140710
Wiring Structure in Timing-Critical Paths
bull Critical paths are structurally similar
bull Wires on critical paths are routed on many layers
bull Structure is an outcome of the design flow
Testcasebull 45nm foundry library (wire
resistivity scaled by 8X)bull Netlist NETCARD 1mm2 570K
standard cell instancesbull 9 metal layersbull Extract critical paths from
different PVT and BEOL corners
Wirelength ratio ()
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
bull Extract RC at RC-worst C-worst and the typical corners
bull Calculate Δdelay of critical paths
bull Put path j in the group Gtbc if Δdelay is larger than a threshold
bull Fix only the paths in Gtbc using tightened BEOL corners
bull Since tightened corners have smaller delay variations timing closure is easier
Routed design
Timing analysis at BEOL corners Ytyp Ycw Yrcw
GTBC GCBC
ECOusing CBC
Timing analysis
using TBC
violation = 0
Timing analysis
using CBC
violation = 0
ECOusing TBC
done
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
Gate count 232K 575K 1031KUtilization () 84 79 82
Core area (mm2) 045 104 191Max transition (ps) 330 330 330
Testcases for validation (45nm library with 8X wire resistivity)
αCorrelation factor = 05
Acw () Arcw ()
TBC-05 05 43 73
TBC-06 06 33 50
TBC-07 07 30 34
Implement another NETCARD (clock period = 23ns) to obtain α Acw and Arcw
Statistical models (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 05
54ISVLSI-2014 invited talk 140710
Further Analysis
bull Paths with small Δd(Yrcw) and Δd(Ycw) have large α
bull A path has small Δdelays the path is equally sensitive to R and C
bull Example dj = dj(Ytyp) + 05 ΔdR-M1 + 05 ΔdC-M1
bull For a given CBC = Ycw ΔdR-M1 is small but ΔdC-M1 is large delay variation of ΔdR-M1 and ΔdC-M1 are cancelled out Δd(Ycw) 0 lt σj
Nominal delay
Delay sensitivity to unit change in M1 resistance
Delay sensitivity to unit change in M1 capacitance
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
LEON3MP
SUPERBLUE12NETCARD
α gt 05α gt 05
α gt 05
bull Similar trends in different designs
bull Large α when Δd(Yrcw)d(Ytyp) and Δd(Ycw)d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
bull WNS and TNS are reduced by up to 120ps and 61ns
bull Timing violations reduces by 31 to 100
Correlation factor γ = 0 (variation sources are independent)
LEON SUPERBLUE NETCARD
-0180-0160-0140-0120-0100-0080-0060-0040-002000000020
CBC TBC-1 TBC-2
WN
S (n
s)
LEON SUPERBLUE NETCARD
-90-80-70-60-50-40-30-20-10
0CBC TBC-1 TBC-2
TNS
(ns)
LEON SUPERBLUE NETCARD0
200400600800
1000120014001600
CBC TBC-1 TBC-2
Tim
ing
viol
ation
s
57ISVLSI-2014 invited talk 140710
Heuristics 1
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
supply voltage gt Vmax
Pessimistic signoff corner bull Ovestimate aging andor
underestimate circuit performance
bull Large area overhead
Good corners
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner
for AVS4 Vbti = Vfinal Do not overestimate
aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
bull Timing signoff ensure circuit meets performance target under PVT variations amp aging
bull Conventional signoff approach bull Analyze circuit timing at worst-case cornersbull Fix timing violations re-run timing analysis
bull With BTI aging and AVS what is the Vdd of the worst-cast corner for timing analysis
Vlib for circuit performance estimation
Min Vdd Max Vdd
VBTI for aging
estimation
MinVdd
Not applicable (Optimistic)
Max Vdd
Slowest circuitLess aging
Faster circuitWorst-case aging
Slowest circuit Worst-case aging
Too pessimistic
With BTI aging and AVS the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
10000 12000 14000 16000 18000 20000 2200020
22
24
26
28
30
32
44
4
888
7776
66
555
3
33
2
22
11
1
Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)
Area (μm2)
Pow
er (m
W)
AES
Optimistic about AVS
Pessimistic about AVS
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
1 2 3 4 5 6 7 8 90
2
4
6
8
10
12
08
09
1
11
12Lifetime (year)
Implementation
Life
time
(yea
r)
Vfina
l (V)
Vfinal (V)
119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )
2
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation
[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ
Standard deviation of path delay
σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05
Note we use the delay variation from the statistical analysis as a reference
74ISVLSI-2014 invited talk 140710
Resilient Designs
bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)
bull Trade off design robustness vs design quality Eg enable margin reduction
bull Improve performance (ie timing speculation)
084 088 092 096 10030
34
38
42
46
50
54
58
62conventional design
reilient Design
Supply voltage (V)
En
erg
y (
mJ
)
Conventional design Worst-case signoff No Vdd downscaling
Resilient design Typical-case signoff Vdd downscaling reduced energy
15 reduction
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
bull Given RTL design throughput requirement and error-tolerant registers
bull Objective implement design to minimize energy bull Estimation of design energy
119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Question Which endpoint to be optimized
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
Open-Loop AVS
Closed-Loop AVSP
ow
er
Freq amp Vdd LUT
Post-silicon characterization
AVS Pre-characterize LUT [Martin02]
Process-aware AVSPost-silicon characterization [Tschanz03]
Generic monitor
Design dependent replica
In-situmonitor
Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]
In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]
Error Detection System
Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]
Error Tolerance
AVS
approachesAVS classes
77
78ISVLSI-2014 invited talk 140710
Challenge: Variability
[Figure: four panels, each contrasting ideal scaling with non-ideality]
• DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
• POWER: dynamic power (W) by year, 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD, and DG drive current (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: supply voltage (V) vs. MPU release date. Source: [CPUDB]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
Global optimization flow: sample (SP&R), construct power models, estimate optimal signoff modes; then adaptive search: sample (SP&R), refine power models
[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V), curves 1st, 2nd, real; design: AES, f = 700MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
Closed-Loop AVS classes: generic monitor, design-dependent replica, in-situ monitor
• Does not capture design-specific performance variation
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: stacked energy (mJ) bars for PM, BF, and CO at small, medium, and large margin; panels MUL and EXU; bar components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow:
• Initial placement (all FFs = error-tolerant FFs)
• If energy < min energy, save current solution
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
52ISVLSI-2014 invited talk 140710
Proposed Timing Signoff Flow
• Extract RC at RC-worst, C-worst, and the typical corners
• Calculate Δdelay of critical paths
• Put path j in the group Gtbc if Δdelay is larger than a threshold
• Fix only the paths in Gtbc using tightened BEOL corners
• Since tightened corners have smaller delay variations, timing closure is easier
Flow: routed design; timing analysis at BEOL corners Ytyp, Ycw, Yrcw; split paths into GTBC and GCBC. For GTBC: timing analysis using TBC, ECO using TBC until violation = 0. For GCBC: timing analysis using CBC, ECO using CBC until violation = 0. Then done.
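The grouping step in this flow can be sketched as follows. This is a minimal illustration, not the authors' implementation: the name `group_paths`, the definition of Δdelay as the corner delay minus the typical-corner delay (normalized by the typical delay), the rule that both the C-worst and RC-worst deltas must exceed their thresholds, and all numeric values are assumptions made for the example.

```python
def group_paths(paths, thr_cw, thr_rcw):
    """Split paths into G_tbc (tightened corners) and G_cbc (conventional corners).

    paths: dict mapping path name -> (d_typ, d_cw, d_rcw), delays in ns
    thr_cw, thr_rcw: hypothetical thresholds on the relative delta-delay
    """
    g_tbc, g_cbc = [], []
    for name, (d_typ, d_cw, d_rcw) in paths.items():
        rel_cw = (d_cw - d_typ) / d_typ    # relative delta-delay at C-worst
        rel_rcw = (d_rcw - d_typ) / d_typ  # relative delta-delay at RC-worst
        # Assumed rule: a path qualifies for tightened corners only if its
        # delta-delay is large at both conventional corners.
        if rel_cw > thr_cw and rel_rcw > thr_rcw:
            g_tbc.append(name)
        else:
            g_cbc.append(name)
    return g_tbc, g_cbc


# Made-up path delays (typical, C-worst, RC-worst)
paths = {
    "p1": (1.00, 1.08, 1.10),
    "p2": (1.00, 1.01, 1.02),
    "p3": (2.00, 2.20, 2.05),
}
g_tbc, g_cbc = group_paths(paths, thr_cw=0.04, thr_rcw=0.07)
```

Only the paths returned in `g_tbc` would then be fixed against the tightened corners; everything else stays on the conventional signoff corners.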
53ISVLSI-2014 invited talk 140710
Experiment Setup
Testcases for validation (45nm library with 8X wire resistivity):
                      LEON3MP   NETCARD   SUPERBLUE12
Clock period (ns)     1.8       2.0       3.1
Gate count            232K      575K      1031K
Utilization (%)       84        79        82
Core area (mm2)       0.45      1.04      1.91
Max transition (ps)   330       330       330
Tightened BEOL corners (correlation factor = 0.5):
            α     Acw (%)   Arcw (%)
TBC-0.5     0.5   4.3       7.3
TBC-0.6     0.6   3.3       5.0
TBC-0.7     0.7   3.0       3.4
• Implement another NETCARD (clock period = 2.3ns) to obtain α, Acw, and Arcw
• Statistical models: (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 0.5
54ISVLSI-2014 invited talk 140710
Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays when it is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • dj(Ytyp): nominal delay
  • ΔdR-M1: delay sensitivity to unit change in M1 resistance
  • ΔdC-M1: delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
[Figure: scaling factor α for LEON3MP, NETCARD, and SUPERBLUE12, with the α > 0.5 region marked in each panel]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns), and number of timing violations for LEON, SUPERBLUE, and NETCARD under CBC, TBC-1, and TBC-2]
57ISVLSI-2014 invited talk 140710
Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
• But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI curves]
VBTI = Vlib ≈ Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05
Supply voltages:
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, VLIB, and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs
and library
Vstep: AVS voltage step; Vfinal: converged voltage
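The time-stepped loop of this reference flow can be sketched with toy models. Everything numeric here is an assumption for illustration: a closed-form alpha-power-law delay stands in for STA with derated libraries, the power-law aging model and its coefficients `a` and `n` are hypothetical, and `reference_signoff` is not a name from the talk.

```python
def delay(vdd, vth, k=1.0, alpha=1.3):
    """Toy alpha-power-law gate delay (assumed model, stands in for STA)."""
    if vdd <= vth:
        return float("inf")  # device does not switch in this toy model
    return k * vdd / (vdd - vth) ** alpha


def reference_signoff(v_init, v_max, v_step, vth0, t_final, a=0.03, n=0.3):
    """Return the converged Vfinal, or None if Vdd would exceed v_max."""
    target = delay(v_init, vth0)  # timing target, met at t = 0 with Vinit
    vdd, dvth = v_init, 0.0
    for t in range(1, t_final + 1):
        # Estimate aging for this time step at the current Vdd
        # (hypothetical model: incremental threshold shift a * Vdd * t^n).
        dvth += a * vdd * (t ** n - (t - 1) ** n)
        # Update circuit timing; raise Vdd in steps until the target is met.
        while delay(vdd, vth0 + dvth) > target:
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None  # caller must modify the circuit and start over
    return vdd


v_final = reference_signoff(v_init=0.90, v_max=1.05, v_step=0.01,
                            vth0=0.30, t_final=10)
```

A `None` result corresponds to the "modify circuit and start over" branch. In the real flow each iteration would require a timing run against the matching library, which is why the slide calls the flow very slow.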
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
Case       1       2       3      4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax   Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax   Vfinal   NA   Vheur1   Vheur2
63ISVLSI-2014 invited talk 140710
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop among circuit, Vfinal, and derated libraries (Vlib, VBTI)]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time over alternating ON and OFF phases [VattikondaWC06]; device aging (|ΔVth|) accumulates over time [TCAS'14]]
65ISVLSI-2014 invited talk 140710
Observation 1
• BTI is a "front-loaded" phenomenon [Chan11]
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd approaching Vfinal over lifetime; ≈70% of Vdd increment in 1 year (remaining 30% over 9 years)]
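To see what "front-loaded" implies numerically, suppose (as an assumption not stated on the slide) that BTI-induced degradation follows a power law in stress time, ΔVth(t) ∝ t^n. The fraction of end-of-life aging reached by time t is then (t / t_life)^n. The sketch below solves for the exponent n that reproduces the slide's figure of 50% aging in the first year of a 10-year lifetime; the function names are illustrative.

```python
import math


def aging_fraction(t, t_life, n):
    """Fraction of end-of-life aging reached at time t, assuming dVth ~ t^n."""
    return (t / t_life) ** n


def exponent_for_fraction(frac, t, t_life):
    """Exponent n such that aging_fraction(t, t_life, n) == frac."""
    return math.log(frac) / math.log(t / t_life)


# 50% of the aging within year 1 of a 10-year lifetime (numbers from the slide)
n = exponent_for_fraction(0.5, 1.0, 10.0)
half_life_check = aging_fraction(1.0, 10.0, n)
```

Under this assumed model the exponent comes out near 0.30. A smaller exponent would concentrate even more of the aging in early life.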
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
Setups:
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?
Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.
                   Vlib = Min Vdd                                        Vlib = Max Vdd
VBTI = Min Vdd     Slowest circuit, less aging                           Not applicable (optimistic)
VBTI = Max Vdd     Slowest circuit, worst-case aging (too pessimistic)   Faster circuit, worst-case aging
With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm2) for AES, with data points labeled 1 to 8; regions marked "optimistic about AVS" and "pessimistic about AVS"; legend entries: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks)]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
  MTTF(i) = MTTF(i−1) × (VDD(i−1) / VDD(i))^2
[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200mV voltage compensation]
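The stepwise MTTF update on this slide, MTTF(i) = MTTF(i−1) × (VDD(i−1) / VDD(i))^2, can be applied along a voltage schedule as in the sketch below. The function names, the starting MTTF, and the Vdd schedule are hypothetical values chosen only to exercise the formula.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One update step: MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_after_schedule(mttf0, vdd_schedule):
    """Apply the update across consecutive entries of a Vdd schedule."""
    mttf = mttf0
    for v_prev, v_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, v_prev, v_new)
    return mttf


# Hypothetical schedule: AVS raises Vdd from 1.00 V to 1.20 V in steps
schedule = [1.00, 1.05, 1.10, 1.15, 1.20]
mttf_end = mttf_after_schedule(10.0, schedule)
penalty = 1.0 - mttf_end / 10.0
```

Because the per-step ratios telescope, the end result depends only on the first and last voltages: here MTTF scales by (1.00 / 1.20)^2, a reduction of roughly 31% for this made-up 200 mV increase.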
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year for AVS schedules S1 to S5, and MTTF (year) for S1 to S5; design: DMA]
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is "Signoff"?
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff, on a voltage axis from nominal Vdd (operating voltage) to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI]
Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
  Δdjv = [dj(Yv) − dj(Ytyp)] / 3
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist; RC extraction; STA; Δdjv
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• Σ is the correlation matrix for variation sources (e.g., 27 x 27)
• Σ = λλ^T (note: λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
  [Δd'j1 … Δd'j27] = [Δdj1 … Δdj27] λ
Standard deviation of path delay:
  σj = ((Δd'j1)^2 + … + (Δd'j27)^2)^0.5
Note: we use the delay variation from the statistical analysis as a reference
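The two statistical timing steps (per-source sensitivities, then correlation via a Cholesky factor) can be sketched in a few lines. This is an illustrative sketch using three variation sources instead of 27; the delays and the correlation matrix are made-up numbers, and the pure-Python `cholesky` helper is only there to keep the example self-contained.

```python
import math


def cholesky(sigma):
    """Lower-triangular factor lam with sigma = lam * lam^T."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][j] = (sigma[i][j] - s) / lam[j][j]
    return lam


def sensitivity(d_corner, d_typ):
    """Delta-d_jv = [d_j(Y_v) - d_j(Y_typ)] / 3."""
    return (d_corner - d_typ) / 3.0


def path_sigma(delta_d, sigma):
    """sigma_j = norm of the row vector delta_d times the Cholesky factor."""
    lam = cholesky(sigma)
    n = len(delta_d)
    d_corr = [sum(delta_d[r] * lam[r][c] for r in range(n)) for c in range(n)]
    return math.sqrt(sum(x * x for x in d_corr))


# Made-up example: typical delay and delays at three single-source corners (ns)
d_typ = 1.000
delta_d = [sensitivity(d, d_typ) for d in (1.030, 1.060, 0.985)]
corr = [[1.0, 0.5, 0.0],
        [0.5, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
sigma_j = path_sigma(delta_d, corr)
```

Since Σ = λλ^T, the result equals the square root of the quadratic form Δd Σ Δd^T, so with an identity correlation matrix it reduces to a plain root-sum-square of the sensitivities.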
74ISVLSI-2014 invited talk 140710
Resilient Designs
bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)
bull Trade off design robustness vs design quality Eg enable margin reduction
bull Improve performance (ie timing speculation)
084 088 092 096 10030
34
38
42
46
50
54
58
62conventional design
reilient Design
Supply voltage (V)
En
erg
y (
mJ
)
Conventional design Worst-case signoff No Vdd downscaling
Resilient design Typical-case signoff Vdd downscaling reduced energy
15 reduction
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
bull Given RTL design throughput requirement and error-tolerant registers
bull Objective implement design to minimize energy bull Estimation of design energy
119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Question Which endpoint to be optimized
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
Open-Loop AVS
Closed-Loop AVSP
ow
er
Freq amp Vdd LUT
Post-silicon characterization
AVS Pre-characterize LUT [Martin02]
Process-aware AVSPost-silicon characterization [Tschanz03]
Generic monitor
Design dependent replica
In-situmonitor
Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]
In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]
Error Detection System
Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]
Error Tolerance
AVS
approachesAVS classes
77
78ISVLSI-2014 invited talk 140710
Challenge Variability
1998 2000 2003 2006 2008 20111
10
MPU Release Date
Tran
sisto
r Cou
nt [M
]
Source [CPUDB]
DENSITY
IdealNon-ideality
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0
100
200
300
400
500
600
700
0
02
04
06
08
1
12Dynamic Power (W)
POWER
Source [JeongK08]
IdealNon-ideality
2006 2008 2010 2012 2014 20161000
10000
100000
Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling
DRIVE CURRENT
Ideal
Source [ITRS]
Non-ideality
1995 2000 2005 2011 20160
05
1
15
2
25
3
MPU Release Date
Volt
SUPPLY VOLTAGE
Source [CPUDB]
Ideal
Non-ideality
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
80ISVLSI-2014 invited talk 140710
Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for
circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly
threshold voltages)bull One mode is outside of the design cone of the other
failed design overdesignbull Mode A has positive timing slacks with respect to mode B
mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other
bull Modes are in each othersrsquo design cone
Voltage
Frequency
A
Negative Slacks = failed design
Positive Slacks = overdesign
B
C
Design Cone of mode A
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline search for signoff modes within design cone reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method Global Optimizationbull Iteratively sample and refine power models
bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable
Sample (SPampR)
Construct power models
Estimate optimal signoff modes
Sample (SPampR)
Refine power models
Adaptive search
Global optimization flow
09 10 11 1214
15
16
17
18
19
201st 2nd real
Signoff Voltage (v)
Po
wer
(m
W)
Power estimation of adaptive search
bull Ovals indicate sample pointsbull 1st 2nd power from power models at first
second iterationbull real power from real implemented circuits
Design AESf 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
bull Critical path may be difficult to identify (IP from 3rd party)
bull Calibrating monitors at multiple modesvoltages requires long test time
Closed-Loop AVS
Design-dependent replica
In-situmonitor
Generic monitor
bull Does not capture design-specific performance variation
82
This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
bull Identified two circuit knobs to tune Vmin
bull Series resistancebull Cell types (INV NAND NOR)
bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low
1 bit 1 bit 1 bit Control pins
High resistance
Low resistance
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reductionbull Reference flows
bull Pure-margin (PM) conventional methodology w only margin insertionbull Brute-force (BF) insert error-tolerant FFs at timing-critical endpoints
bull Proposed method (CO) achieves up to 20 energy reduction compared to reference methods
bull Resilience benefits increase with safety margin
PM BF CO PM BF CO PM BF CO25
30
35
40
45
50
55 Energy penalty of throughput degradationEnergy penalty of additional circuitsEnergy wo resilience
En
erg
y (
mJ
)
Large marginMedium marginSmall margin
MUL
PM BF CO PM BF CO PM BF CO25
27
29
31
33
35
En
erg
y (
mJ
)
EXU
Large marginMedium marginSmall margin
Smallmediumlarge margin safety margin = 51015 of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVSbull AVS (Adaptive Voltage Scaling) allows lower supply voltage for
resilient designs reduced powerbull We trade off between timing-error penalty vs reduced power at a
lower supply voltagebull Average 18 energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
bull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity-aware clock skew optimization
SkewOpt
53ISVLSI-2014 invited talk 140710
Experiment Setup
LEON3MP NETCARD SUPERBLUE12Clock period (ns) 18 20 31
Gate count 232K 575K 1031KUtilization () 84 79 82
Core area (mm2) 045 104 191Max transition (ps) 330 330 330
Testcases for validation (45nm library with 8X wire resistivity)
αCorrelation factor = 05
Acw () Arcw ()
TBC-05 05 43 73
TBC-06 06 33 50
TBC-07 07 30 34
Implement another NETCARD (clock period = 23ns) to obtain α Acw and Arcw
Statistical models (1) no correlation and (2) same kind of variation sources in the same process module have correlation factor = 05
54ISVLSI-2014 invited talk 140710
Further Analysis
bull Paths with small Δd(Yrcw) and Δd(Ycw) have large α
bull A path has small Δdelays the path is equally sensitive to R and C
bull Example dj = dj(Ytyp) + 05 ΔdR-M1 + 05 ΔdC-M1
bull For a given CBC = Ycw ΔdR-M1 is small but ΔdC-M1 is large delay variation of ΔdR-M1 and ΔdC-M1 are cancelled out Δd(Ycw) 0 lt σj
Nominal delay
Delay sensitivity to unit change in M1 resistance
Delay sensitivity to unit change in M1 capacitance
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
LEON3MP
SUPERBLUE12NETCARD
α gt 05α gt 05
α gt 05
bull Similar trends in different designs
bull Large α when Δd(Yrcw)d(Ytyp) and Δd(Ycw)d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
bull WNS and TNS are reduced by up to 120ps and 61ns
bull Timing violations reduces by 31 to 100
Correlation factor γ = 0 (variation sources are independent)
LEON SUPERBLUE NETCARD
-0180-0160-0140-0120-0100-0080-0060-0040-002000000020
CBC TBC-1 TBC-2
WN
S (n
s)
LEON SUPERBLUE NETCARD
-90-80-70-60-50-40-30-20-10
0CBC TBC-1 TBC-2
TNS
(ns)
LEON SUPERBLUE NETCARD0
200400600800
1000120014001600
CBC TBC-1 TBC-2
Tim
ing
viol
ation
s
57ISVLSI-2014 invited talk 140710
Heuristics 1
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
supply voltage gt Vmax
Pessimistic signoff corner bull Ovestimate aging andor
underestimate circuit performance
bull Large area overhead
Good corners
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner
for AVS4 Vbti = Vfinal Do not overestimate
aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
bull Timing signoff ensure circuit meets performance target under PVT variations amp aging
bull Conventional signoff approach bull Analyze circuit timing at worst-case cornersbull Fix timing violations re-run timing analysis
bull With BTI aging and AVS what is the Vdd of the worst-cast corner for timing analysis
Vlib for circuit performance estimation
Min Vdd Max Vdd
VBTI for aging
estimation
MinVdd
Not applicable (Optimistic)
Max Vdd
Slowest circuitLess aging
Faster circuitWorst-case aging
Slowest circuit Worst-case aging
Too pessimistic
With BTI aging and AVS the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
10000 12000 14000 16000 18000 20000 2200020
22
24
26
28
30
32
44
4
888
7776
66
555
3
33
2
22
11
1
Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)
Area (μm2)
Pow
er (m
W)
AES
Optimistic about AVS
Pessimistic about AVS
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
1 2 3 4 5 6 7 8 90
2
4
6
8
10
12
08
09
1
11
12Lifetime (year)
Implementation
Life
time
(yea
r)
Vfina
l (V)
Vfinal (V)
119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )
2
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation
[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ
Standard deviation of path delay
σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05
Note we use the delay variation from the statistical analysis as a reference
74ISVLSI-2014 invited talk 140710
Resilient Designs
bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)
bull Trade off design robustness vs design quality Eg enable margin reduction
bull Improve performance (ie timing speculation)
084 088 092 096 10030
34
38
42
46
50
54
58
62conventional design
reilient Design
Supply voltage (V)
En
erg
y (
mJ
)
Conventional design Worst-case signoff No Vdd downscaling
Resilient design Typical-case signoff Vdd downscaling reduced energy
15 reduction
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
bull Given RTL design throughput requirement and error-tolerant registers
bull Objective implement design to minimize energy bull Estimation of design energy
119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Question Which endpoint to be optimized

Process-Aware Vdd Scaling (PVS)
AVS approaches and AVS classes (diagram axis: Power):
• Open-loop AVS
  • Freq & Vdd LUT: pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor: process- and temperature-aware AVS with a generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]

Challenge: Variability
[Figure: four panels, each contrasting ideal scaling with non-ideality]
• DENSITY: transistor count [M] vs. MPU release date (1998 to 2011). Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: supply voltage (V) vs. MPU release date (1995 to 2016). Source: [CPUDB]

Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy marked]

Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage vs. frequency plane with modes A, B and C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign

Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
Global optimization flow (boxes in order, with "Adaptive search" labeling the later steps):
• Sample (SP&R)
• Construct power models
• Estimate optimal signoff modes
• Sample (SP&R)
• Refine power models
[Figure: power estimation of adaptive search, power (mW) vs. signoff voltage (V), curves "1st", "2nd" and "real"]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
Design: AES, f = 700MHz
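The sample-and-refine idea can be illustrated with a generic surrogate-model loop. This is a sketch under loud assumptions, not the method from the talk: the quadratic power model, the toy power curve inside `implement_and_measure` (a stand-in for an expensive SP&R run) and all constants are hypothetical.

```python
import numpy as np

def implement_and_measure(v):
    """Stand-in for one expensive SP&R run; hypothetical toy power curve in mW."""
    u = v - 1.0
    return 15.0 + 40.0 * u ** 2 + 30.0 * u ** 3

def adaptive_search(v_lo=0.9, v_hi=1.2, n_init=3, n_iter=2):
    """Fit a power model to a few samples, sample at the model's optimum, refit."""
    volts = list(np.linspace(v_lo, v_hi, n_init))
    power = [implement_and_measure(v) for v in volts]
    grid = np.linspace(v_lo, v_hi, 301)              # candidate signoff voltages
    for _ in range(n_iter):
        coeffs = np.polyfit(volts, power, deg=2)     # assumed quadratic power model
        v_best = float(grid[np.argmin(np.polyval(coeffs, grid))])
        volts.append(v_best)                         # one more "implementation" run
        power.append(implement_and_measure(v_best))
    i = int(np.argmin(power))                        # best implemented sample so far
    return volts[i], power[i]

best_v, best_p = adaptive_search()
```

Only five calls to the expensive function are made here (three initial samples plus one per iteration), which is the point of refining a model instead of implementing the circuit at every candidate mode.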

Classes of Closed-Loop AVS
Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high or low resistance]

Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF and CO at small, medium and large margin; panels MUL and EXU; bars split into energy w/o resilience, energy penalty of additional circuits and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period

Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy marked]

Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow boxes:
• Initial placement (all FFs = error-tolerant FFs)
• Energy < min energy? Save current solution
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization

Further Analysis
• Paths with small Δd(Yrcw) and Δd(Ycw) have large α
• A path has small Δdelays: the path is equally sensitive to R and C
• Example: dj = dj(Ytyp) + 0.5 ΔdR-M1 + 0.5 ΔdC-M1
  • Terms, in order: nominal delay, delay sensitivity to unit change in M1 resistance, delay sensitivity to unit change in M1 capacitance
• For a given CBC = Ycw, ΔdR-M1 is small but ΔdC-M1 is large; the delay variations of ΔdR-M1 and ΔdC-M1 cancel out, so Δd(Ycw) ≈ 0 < σj

Scaling Factor Results
[Figure: scaling factor α for LEON3MP, SUPERBLUE12 and NETCARD; the region with α > 0.5 is marked in each panel]
• Similar trends in different designs
• Large α when Δd(Yrcw)/d(Ytyp) and Δd(Ycw)/d(Ytyp) are small

Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: WNS (ns), TNS (ns) and number of timing violations for LEON, SUPERBLUE and NETCARD under CBC, TBC-1 and TBC-2]

Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging at a flat Vfinal ≈ aging at an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time, with NBTI and PBTI]
VBTI = Vlib ≈ Vfinal

Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient

Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
• Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]

Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
c5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V

A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, Vlib and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library characterizations
Vstep: AVS voltage step; Vfinal: converged voltage
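The loop structure of this reference flow can be sketched as follows. Everything numeric here is an assumption made for illustration: the alpha-power-law delay model, the power-law BTI shift evaluated at the current Vdd (a simplification that ignores stress history), and all constants and function names are hypothetical stand-ins for real STA runs and derated-library characterization.

```python
def delay(vdd, vth, k=0.55, alpha=1.3):
    """Toy alpha-power-law critical-path delay in ns (hypothetical constants)."""
    return k * vdd / (vdd - vth) ** alpha

def delta_vth(vdd, t_years, a=0.03, n=1.0 / 6.0):
    """Toy BTI threshold shift in V: power law in time, scaled by stress voltage."""
    return a * vdd * t_years ** n

def reference_signoff(clock_ns, v_init=0.90, v_max=1.05, v_step=0.01,
                      vth0=0.30, t_final=10.0, dt=0.5):
    """Step through lifetime; raise Vdd by v_step whenever aged timing fails.

    Returns (v_final, schedule); v_final is None when the required voltage
    exceeds v_max, i.e. the circuit would have to be modified and the flow rerun.
    """
    vdd, t, schedule = v_init, 0.0, []
    while t < t_final:
        t += dt
        vth = vth0 + delta_vth(vdd, t)           # estimate aging at this time step
        while delay(vdd, vth) > clock_ns:        # update timing and Vdd
            vdd = round(vdd + v_step, 4)
            if vdd > v_max:
                return None, schedule
            vth = vth0 + delta_vth(vdd, t)       # aging re-evaluated at the new Vdd
        schedule.append((t, vdd))
    return vdd, schedule

v_final, schedule = reference_signoff(clock_ns=1.0)
v_fail, _ = reference_signoff(clock_ns=0.5)      # too tight for this toy circuit
```

Each inner-loop evaluation stands for a timing analysis at a consistent aging and supply voltage, which is why the real flow needs many STA runs.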

Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1       2       3       4        5    6        7
Vlib (V)   Vinit   Vinit   Vmax    Vinit    NA   Vheur1   Vheur2
VBTI (V)   Vinit   Vmax    Vmax    Vfinal   NA   Vheur1   Vheur2

"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop among circuit, Vfinal and derated libraries (Vlib, VBTI)]

Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time over ON, OFF, ON, OFF phases] [VattikondaWC06]
Device aging (|ΔVth|) accumulates over time [TCAS'14]

Observation 1
[Chan11]
• BTI is a "front-loaded" phenomenon
• 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
• Gap between Vdd and Vfinal reduces rapidly
[Figure annotation: ≈70% of Vdd increment in 1 year (remaining 30% over 9 years); Vfinal marked]

Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
Setups:
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03

Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.
                  Vlib = Min Vdd                                         Vlib = Max Vdd
VBTI = Min Vdd    Slowest circuit, less aging                            Not applicable (optimistic)
VBTI = Max Vdd    Slowest circuit, worst-case aging (too pessimistic)    Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious

AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm²) for AES; points labeled by implementation number (1 to 8); series: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); annotations: "Optimistic about AVS" and "Pessimistic about AVS"]

AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
$MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
[Figure: lifetime (year) and Vfinal (V) per implementation (1 to 9); annotations: 30% MTTF penalty, 200mV voltage compensation]
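The MTTF update rule on this slide (each step scales the previous MTTF by the squared ratio of the previous to the current supply voltage) can be traced over a voltage schedule. The snippet is a small sketch; the initial MTTF of 10 years and the Vdd schedule are hypothetical values, and the function names are made up for illustration.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_now):
    """One step of the update: MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    return mttf_prev * (vdd_prev / vdd_now) ** 2

def mttf_over_schedule(mttf_init, vdd_schedule):
    """Apply the update across consecutive AVS voltage steps."""
    history = [mttf_init]
    for v_prev, v_now in zip(vdd_schedule, vdd_schedule[1:]):
        history.append(update_mttf(history[-1], v_prev, v_now))
    return history

# Hypothetical AVS schedule: Vdd raised in 50mV steps to compensate BTI aging.
vdd_schedule = [0.90, 0.95, 1.00, 1.05]
history = mttf_over_schedule(10.0, vdd_schedule)
penalty = 1.0 - history[-1] / history[0]   # fractional MTTF loss over the schedule
```

Because the per-step ratios telescope, the final MTTF under this rule depends only on the first and last voltages of the schedule.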

EM Impact on AVS Scheduling
[Figure: VDD (0.90 to 1.04) vs. year (0 to 12) for schedules S1 to S5, design DMA; MTTF (year) for S1 to S5]
12 years MTTF penalty

What is "Signoff"?
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: "margin stack" for voltage signoff between nominal Vdd and signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI; operating voltage marked]
Problem: margins = pessimism: overdesign, schedule delay

Statistical Timing Analysis (1)
• Delay sensitivity of path pj to variation source zv:
$\Delta d_{jv} = [d_j(Y_v) - d_j(Y_{typ})] / 3$
• Assumptions
  • Δdjv is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv
Note: Path delay includes gate and wire delays
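The sensitivity extraction reads as "corner delay minus typical delay, divided by 3". A minimal sketch follows, assuming each corner file Yv shifts one variation source to its 3σ value (which is how the division by 3 is interpreted here); the delays are made-up numbers and `delay_sensitivities` is a hypothetical helper, not part of any tool.

```python
def delay_sensitivities(d_typ, d_corners):
    """Per-sigma delay sensitivity of one path to each variation source.

    d_typ:     path delay from the STA run with the typical file (Ytyp)
    d_corners: path delays from the STA runs with one source at its assumed 3-sigma corner
    """
    return [(d_v - d_typ) / 3.0 for d_v in d_corners]

# Hypothetical path delays in ps: typical run plus two of the corner runs.
sens = delay_sensitivities(100.0, [106.0, 97.0])
```

In the setup described above, `d_corners` would hold 27 entries per path, one for each variation source.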
55ISVLSI-2014 invited talk 140710
Scaling Factor Results
LEON3MP
SUPERBLUE12NETCARD
α gt 05α gt 05
α gt 05
bull Similar trends in different designs
bull Large α when Δd(Yrcw)d(Ytyp) and Δd(Ycw)d(Ytyp) are small
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
bull WNS and TNS are reduced by up to 120ps and 61ns
bull Timing violations reduces by 31 to 100
Correlation factor γ = 0 (variation sources are independent)
LEON SUPERBLUE NETCARD
-0180-0160-0140-0120-0100-0080-0060-0040-002000000020
CBC TBC-1 TBC-2
WN
S (n
s)
LEON SUPERBLUE NETCARD
-90-80-70-60-50-40-30-20-10
0CBC TBC-1 TBC-2
TNS
(ns)
LEON SUPERBLUE NETCARD0
200400600800
1000120014001600
CBC TBC-1 TBC-2
Tim
ing
viol
ation
s
57ISVLSI-2014 invited talk 140710
Heuristics 1
bull Model BTI degradation with Vfinal throughout lifetime
bull Aging of a flat Vfinal asymp aging of an adaptive Vdd
bull But slightly pessimistic
Vdd
time
NBTI
PBTI
VBTI = Vlib asymp Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
bull Problem Vfinal is not available at early design stage (design has not been implemented)
bull Vfinal = Vdd end of life (to compensate BTI aging)
bull Gates along critical pathbull Timing slack at t = 0
bull Circuit activity is not an issue bull Because BTI effect is not sensitive to circuit activitybull DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
bull Observation 2 Vfinal is not sensitive to gate types
bull Heuristic 2 use average Vfinal of different gate typesbull Vfinal is a function of timing slack
bull Assume timing slack = 0
10mV
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
supply voltage gt Vmax
Pessimistic signoff corner bull Ovestimate aging andor
underestimate circuit performance
bull Large area overhead
Good corners
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner
for AVS4 Vbti = Vfinal Do not overestimate
aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
bull Timing signoff ensure circuit meets performance target under PVT variations amp aging
bull Conventional signoff approach bull Analyze circuit timing at worst-case cornersbull Fix timing violations re-run timing analysis
bull With BTI aging and AVS what is the Vdd of the worst-cast corner for timing analysis
Vlib for circuit performance estimation
Min Vdd Max Vdd
VBTI for aging
estimation
MinVdd
Not applicable (Optimistic)
Max Vdd
Slowest circuitLess aging
Faster circuitWorst-case aging
Slowest circuit Worst-case aging
Too pessimistic
With BTI aging and AVS the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
10000 12000 14000 16000 18000 20000 2200020
22
24
26
28
30
32
44
4
888
7776
66
555
3
33
2
22
11
1
Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)
Area (μm2)
Pow
er (m
W)
AES
Optimistic about AVS
Pessimistic about AVS
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
1 2 3 4 5 6 7 8 90
2
4
6
8
10
12
08
09
1
11
12Lifetime (year)
Implementation
Life
time
(yea
r)
Vfina
l (V)
Vfinal (V)
119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )
2
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation
[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ
Standard deviation of path delay
σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05
Note we use the delay variation from the statistical analysis as a reference
74ISVLSI-2014 invited talk 140710
Resilient Designs
bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)
bull Trade off design robustness vs design quality Eg enable margin reduction
bull Improve performance (ie timing speculation)
084 088 092 096 10030
34
38
42
46
50
54
58
62conventional design
reilient Design
Supply voltage (V)
En
erg
y (
mJ
)
Conventional design Worst-case signoff No Vdd downscaling
Resilient design Typical-case signoff Vdd downscaling reduced energy
15 reduction
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
bull Given RTL design throughput requirement and error-tolerant registers
bull Objective implement design to minimize energy bull Estimation of design energy
119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Question Which endpoint to be optimized
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
Open-Loop AVS
Closed-Loop AVSP
ow
er
Freq amp Vdd LUT
Post-silicon characterization
AVS Pre-characterize LUT [Martin02]
Process-aware AVSPost-silicon characterization [Tschanz03]
Generic monitor
Design dependent replica
In-situmonitor
Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]
In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]
Error Detection System
Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]
Error Tolerance
AVS
approachesAVS classes
77
78ISVLSI-2014 invited talk 140710
Challenge Variability
1998 2000 2003 2006 2008 20111
10
MPU Release Date
Tran
sisto
r Cou
nt [M
]
Source [CPUDB]
DENSITY
IdealNon-ideality
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0
100
200
300
400
500
600
700
0
02
04
06
08
1
12Dynamic Power (W)
POWER
Source [JeongK08]
IdealNon-ideality
2006 2008 2010 2012 2014 20161000
10000
100000
Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling
DRIVE CURRENT
Ideal
Source [ITRS]
Non-ideality
1995 2000 2005 2011 20160
05
1
15
2
25
3
MPU Release Date
Volt
SUPPLY VOLTAGE
Source [CPUDB]
Ideal
Non-ideality
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design / overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage vs. frequency plane with modes A, B, and C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
Global optimization flow (adaptive search)
• Sample (SP&R)
• Construct power models
• Estimate optimal signoff modes
• Sample (SP&R)
• Refine power models
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); curves: 1st, 2nd, real]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
Design: AES, f = 700MHz
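The sample-and-refine loop on this slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the power model is assumed to be a parabola through the three lowest-power samples (the slide does not state the model form), and `implement` is a hypothetical stand-in for one SP&R run that returns power from a made-up analytic curve.

```python
def implement(voltage):
    """Hypothetical stand-in for one SP&R run: returns power (mW) at a signoff voltage."""
    u = voltage - 1.05  # made-up optimum at 1.05 V
    return 15.0 + 40.0 * u ** 2 + 60.0 * u ** 3


def parabola_vertex(p1, p2, p3):
    """Voltage at the vertex of the parabola through three (voltage, power) points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    num = (x2 - x1) ** 2 * (y2 - y3) - (x2 - x3) ** 2 * (y2 - y1)
    den = (x2 - x1) * (y2 - y3) - (x2 - x3) * (y2 - y1)
    return x2 - 0.5 * num / den


def adaptive_search(initial_voltages, iterations):
    """Alternate between fitting a power model and sampling at its estimated optimum."""
    samples = {v: implement(v) for v in initial_voltages}  # initial sampling
    estimates = []
    for _ in range(iterations):
        # Construct (or refine) the model from the three lowest-power samples.
        best = sorted(samples.items(), key=lambda item: item[1])[:3]
        best.sort()  # order the chosen points by voltage
        estimate = parabola_vertex(*best)
        estimates.append(estimate)
        samples[estimate] = implement(estimate)  # sample at the estimated optimum
    return estimates


estimates = adaptive_search([0.9, 1.1, 1.2], iterations=2)
```

With three initial samples and two refinement steps, the sketch calls `implement` five times, and the second estimate lands closer to the synthetic optimum at 1.05V than the first.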
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
• Generic monitor
  • Does not capture design-specific performance variation
• Design-dependent replica and in-situ monitor
  • Critical path may be difficult to identify (IP from 3rd party)
  • Calibrating monitors at multiple modes/voltages requires long test time
This work: Tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell type covers different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring-oscillator stages, each with a 1-bit control pin selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) for PM, BF, and CO at small, medium, and large margin; panels MUL and EXU; bars split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL and EXU; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flowchart boxes
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• Decision: energy < min energy? If so, save current solution
56ISVLSI-2014 invited talk 140710
Benefits of Tightened BEOL Corners (1)
• WNS and TNS are reduced by up to 120ps and 61ns
• Timing violations are reduced by 31% to 100%
Correlation factor γ = 0 (variation sources are independent)
[Figure: three bar charts comparing CBC, TBC-1, and TBC-2 on LEON, SUPERBLUE, and NETCARD: WNS (ns), TNS (ns), and number of timing violations]
57ISVLSI-2014 invited talk 140710
Heuristic 1
• Model BTI degradation with Vfinal throughout lifetime
• Aging of a flat Vfinal ≈ aging of an adaptive Vdd
  • But slightly pessimistic
[Figure: Vdd vs. time under NBTI and PBTI]
VBTI = Vlib ≈ Vfinal
58ISVLSI-2014 invited talk 140710
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
59ISVLSI-2014 invited talk 140710
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC

Circuit   Frequency (GHz)
C5315     1.38
c7552     1.25
AES       0.89
MPEG2     1.05

Supply voltages
Vmax          1.05V
Vinit         0.90V
Vheur1 (DC)   0.97V
Vheur1 (AC)   0.95V
Vheur2 (DC)   0.95V
Vheur2 (AC)   0.93V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library
Vstep: AVS voltage step
Vfinal: converged voltage
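The loop structure of this reference flow can be illustrated with a toy model. The sketch below is an assumption-laden illustration, not the flow used in the talk: `bti_shift` and `path_delay` are hypothetical closed-form stand-ins for derated-library characterization and STA, the aging model ignores stress history, and all constants are made up. The voltages 0.90V, 1.05V, and the 10-year lifetime are taken from the neighboring slides.

```python
def bti_shift(vdd, years, a=0.02, n=0.2):
    """Assumed toy aging model: |dVth| grows with stress voltage and as a power law in time."""
    return a * vdd * years ** n


def path_delay(vdd, dvth, vth0=0.3, alpha=1.3):
    """Assumed alpha-power-law delay model (arbitrary units)."""
    return vdd / (vdd - vth0 - dvth) ** alpha


def reference_signoff(v_init, v_max, v_step, t_final, n_steps):
    """Step through lifetime; raise Vdd by v_step whenever aging breaks timing.

    Returns (v_final, schedule), or (None, schedule) when Vdd would exceed v_max,
    which corresponds to "modify circuit and start over".
    """
    target = path_delay(v_init, 0.0)  # zero timing slack at t = 0
    vdd = v_init
    schedule = []
    for i in range(1, n_steps + 1):
        t = t_final * i / n_steps
        # Aging and timing are both evaluated at the current Vdd (VBTI = Vlib = Vdd).
        while path_delay(vdd, bti_shift(vdd, t)) > target:
            vdd += v_step
            if vdd > v_max:
                return None, schedule
        schedule.append((t, vdd))
    return vdd, schedule


v_final, schedule = reference_signoff(
    v_init=0.90, v_max=1.05, v_step=0.01, t_final=10.0, n_steps=20
)
```

Each pass of the outer loop stands for one aging estimate plus one timing update, which is why the real flow needs many STA runs: the cost scales with the number of time steps and voltage steps.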
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
  • Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignore AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case       1      2      3      4       5    6       7
Vlib (V)   Vinit  Vinit  Vmax   Vinit   NA   Vheur1  Vheur2
VBTI (V)   Vinit  Vmax   Vmax   Vfinal  NA   Vheur1  Vheur2
63ISVLSI-2014 invited talk 140710
"Chicken and Egg" Loop
• "Chicken and egg" loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop among Circuit, Derated Libraries (Vlib, VBTI), and Vfinal]
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON/OFF intervals [VattikondaWC06]]
Device aging (|ΔVth|) accumulates over time
[TCAS'14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
• BTI is a "front-loaded" phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
≈70% of Vdd increment in 1 year (remaining 30% over 9 years)
[Figure: Vdd over lifetime approaching Vfinal]
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
Setups
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
67ISVLSI-2014 invited talk 140710
Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?

Vlib: used for circuit performance estimation. VBTI: used for aging estimation.

                  Vlib = Min Vdd                                        Vlib = Max Vdd
VBTI = Min Vdd    Slowest circuit, less aging                           Not applicable (optimistic)
VBTI = Max Vdd    Slowest circuit, worst-case aging (too pessimistic)   Faster circuit, worst-case aging

With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm²) for AES; points labeled by implementation number; series: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); regions annotated "Optimistic about AVS" and "Pessimistic about AVS"]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
  $MTTF(i) = MTTF(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9]
30% MTTF penalty
200mV voltage compensation
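The MTTF update rule on this slide is a simple recurrence, sketched below. The function name, the starting MTTF of 10 years, and the Vdd schedule are hypothetical values chosen only to show the arithmetic.

```python
import math


def mttf_history(initial_mttf, vdd_schedule):
    """Apply MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2 along a Vdd schedule."""
    history = [initial_mttf]
    for v_prev, v_next in zip(vdd_schedule, vdd_schedule[1:]):
        history.append(history[-1] * (v_prev / v_next) ** 2)
    return history


# Hypothetical schedule: AVS raises Vdd from 0.90 V to 1.00 V in two steps.
history = mttf_history(10.0, [0.90, 0.95, 1.00])
```

Because the per-step ratios telescope, the final MTTF depends only on the first and last voltages: here 10 × (0.90/1.00)² = 8.1 years.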
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD (V) vs. year for schedules S1 to S5, and MTTF (year) for S1 to S5; design: DMA]
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is "Signoff"?
• Foundation of contract between design house and foundry
• "Chip should work": stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
"Margin stack" for voltage signoff
[Figure: voltage axis stepping down from nominal Vdd through static IR drop, power grid IR gradient, dynamic IR, and HCI/NBTI to the signoff Vdd; operating voltage marked]
Problem: Margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path $p_j$ to variation source $z_v$:
  $\Delta d_{jv} = [d_j(Y_v) - d_j(Y_{typ})] / 3$
• Assumptions
  • $\Delta d_{jv}$ is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain $\Delta d_{jv}$ using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + Ytyp) and routed netlist → RC extraction → STA → Δdjv]
Note: Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• $\Sigma$ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda\lambda^{T}$ (Note: $\lambda$ is obtained by Cholesky decomposition)
Delay sensitivities with correlation
  $[\Delta d'_{j1} \dots \Delta d'_{j27}] = [\Delta d_{j1} \dots \Delta d_{j27}]\,\lambda$
Standard deviation of path delay
  $\sigma_j = ((\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
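The computation on the two statistical timing slides can be sketched as follows, using two variation sources instead of 27 to keep the numbers readable. The function names and all delay and correlation values are hypothetical; the Cholesky routine is a plain textbook implementation, not tied to any particular tool.

```python
import math


def delay_sensitivities(corner_delays, typical_delay):
    """Delta d_jv = (d_j(Y_v) - d_j(Y_typ)) / 3 for each variation source v."""
    return [(d - typical_delay) / 3.0 for d in corner_delays]


def cholesky(matrix):
    """Return lower-triangular L with L * L^T == matrix (symmetric positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            partial = sum(lower[i][m] * lower[k][m] for m in range(k))
            if i == k:
                lower[i][k] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][k] = (matrix[i][k] - partial) / lower[k][k]
    return lower


def path_sigma(sensitivities, correlation):
    """Standard deviation of path delay from sensitivities and a correlation matrix."""
    lam = cholesky(correlation)
    n = len(sensitivities)
    # Row vector times matrix: d'_k = sum over v of d_v * lam[v][k]
    correlated = [sum(sensitivities[v] * lam[v][k] for v in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in correlated))


# Hypothetical path delays (ps) at two 3-sigma corners and at the typical corner.
sens = delay_sensitivities([106.0, 97.0], 100.0)
# Hypothetical sensitivities (ps) with a correlation of 0.5 between the two sources.
sigma = path_sigma([3.0, 4.0], [[1.0, 0.5], [0.5, 1.0]])
```

Since $\lambda\lambda^{T} = \Sigma$, the result equals the square root of $\Delta d \, \Sigma \, \Delta d^{T}$; for the values above that is the square root of 9 + 16 + 2 × 0.5 × 12 = 37.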
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design; 15% reduction annotated]
• Conventional design: worst-case signoff; no Vdd downscaling
• Resilient design: typical-case signoff; Vdd downscaling, reduced energy
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy
  $\mathrm{Energy} = \mathrm{Power} / \mathrm{Throughput}$
  Throughput is a function of the error rate $ER$ [Kahng10], the number of recovery cycles $r$, and the clock period $T$
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
Open-Loop AVS
Closed-Loop AVSP
ow
er
Freq amp Vdd LUT
Post-silicon characterization
AVS Pre-characterize LUT [Martin02]
Process-aware AVSPost-silicon characterization [Tschanz03]
Generic monitor
Design dependent replica
In-situmonitor
Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]
In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]
Error Detection System
Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]
Error Tolerance
AVS
approachesAVS classes
77
78ISVLSI-2014 invited talk 140710
Challenge Variability
1998 2000 2003 2006 2008 20111
10
MPU Release Date
Tran
sisto
r Cou
nt [M
]
Source [CPUDB]
DENSITY
IdealNon-ideality
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0
100
200
300
400
500
600
700
0
02
04
06
08
1
12Dynamic Power (W)
POWER
Source [JeongK08]
IdealNon-ideality
2006 2008 2010 2012 2014 20161000
10000
100000
Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling
DRIVE CURRENT
Ideal
Source [ITRS]
Non-ideality
1995 2000 2005 2011 20160
05
1
15
2
25
3
MPU Release Date
Volt
SUPPLY VOLTAGE
Source [CPUDB]
Ideal
Non-ideality
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
80ISVLSI-2014 invited talk 140710
Our Concept Mode Dominancebull Design cone (of mode A) is the union of all the feasible operating modes for
circuits signed of at mode Abull Design cone is determined by tradeoff between voltage and frequency (mainly
threshold voltages)bull One mode is outside of the design cone of the other
failed design overdesignbull Mode A has positive timing slacks with respect to mode B
mode A dominates mode Bbull Equivalent dominance no mode is dominated by the other
bull Modes are in each othersrsquo design cone
Voltage
Frequency
A
Negative Slacks = failed design
Positive Slacks = overdesign
B
C
Design Cone of mode A
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline search for signoff modes within design cone reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method Global Optimizationbull Iteratively sample and refine power models
bull Avoid circuit implementation at each modebull Small constant of runs is enough Scalable
Sample (SPampR)
Construct power models
Estimate optimal signoff modes
Sample (SPampR)
Refine power models
Adaptive search
Global optimization flow
09 10 11 1214
15
16
17
18
19
201st 2nd real
Signoff Voltage (v)
Po
wer
(m
W)
Power estimation of adaptive search
bull Ovals indicate sample pointsbull 1st 2nd power from power models at first
second iterationbull real power from real implemented circuits
Design AESf 700MHz
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
bull Critical path may be difficult to identify (IP from 3rd party)
bull Calibrating monitors at multiple modesvoltages requires long test time
Closed-Loop AVS
Design-dependent replica
In-situmonitor
Generic monitor
bull Does not capture design-specific performance variation
82
This work Tunable monitor for closed-loop AVSbull Can be applied as a generic monitorbull Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
bull Identified two circuit knobs to tune Vmin
bull Series resistancebull Cell types (INV NAND NOR)
bull Proposed circuitbull Different cell type covers different process cornersbull Tune series resistance of each stage to high or low
1 bit 1 bit 1 bit Control pins
High resistance
Low resistance
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: Energy (mJ) of PM, BF, and CO at small, medium, and large margin, for designs MUL and EXU; stacked bars: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs → reduced power
• We trade off between timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: Energy (mJ) vs. supply voltage (V) for designs MUL and EXU; curves: brute-force, pure-margin, CombOpt; minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow chart boxes]
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• Energy < min energy? Save current solution
Vfinal Estimation
• Problem: Vfinal is not available at early design stage (design has not been implemented)
• Vfinal = Vdd at end of life (to compensate BTI aging)
  • Gates along critical path
  • Timing slack at t = 0
• Circuit activity is not an issue
  • Because BTI effect is not sensitive to circuit activity
  • DC or AC stress model is sufficient
Observation and Heuristic 2
• Observation 2: Vfinal is not sensitive to gate types
• Heuristic 2: use average Vfinal of different gate types
  • Vfinal is a function of timing slack
  • Assume timing slack = 0
[Figure annotation: 10mV]
Technology and Benchmark Circuits
• NANGATE library with 32nm PTM technology
• Signoff for setup time violation
• Temperature = 125°C
• Process corner = slow NMOS and PMOS
• BTI degradation = DC, AC
Circuit    Frequency (GHz)
C5315      1.38
c7552      1.25
AES        0.89
MPEG2      1.05
Supply voltages
Vmax           1.05V
Vinit          0.90V
Vheur1 (DC)    0.97V
Vheur1 (AC)    0.95V
Vheur2 (DC)    0.95V
Vheur2 (AC)    0.93V
A Reference Signoff Flow
• Basic idea: keep VBTI, Vlib, and Vdd consistent throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs
and library
Vstep: AVS voltage step; Vfinal: converged voltage
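The loop structure of the reference flow can be sketched as below. Everything numeric here is an assumption for illustration: `delay` uses the textbook alpha-power law in place of an STA run, `delta_vth` is a toy power-law BTI model with made-up constants, and aging at time t is evaluated as if the current Vdd had been applied for the whole lifetime so far. None of these models or function names come from the talk.

```python
def delay(vdd, vth, alpha=1.3, k=1.0):
    """Stand-in for an STA run: alpha-power-law delay in arbitrary units."""
    return k * vdd / (vdd - vth) ** alpha


def delta_vth(vdd, t_years, a=0.02, n=1.0 / 6.0, gamma=2.0):
    """Toy BTI model (hypothetical constants): threshold shift in volts."""
    return a * (vdd ** gamma) * (t_years ** n)


def reference_avs_flow(v_init, v_max, v_step, t_final, dt,
                       target_delay, vth0=0.3):
    """Step through the lifetime, raising Vdd by v_step whenever aged
    timing misses the target. Returns (Vfinal, schedule); Vfinal is None
    if the required voltage exceeds v_max (circuit must be modified)."""
    vdd, t = v_init, 0.0
    schedule = [(t, vdd)]
    while t < t_final:
        t = min(t + dt, t_final)
        while True:
            # Estimate aging at this time step for the current Vdd.
            vth = vth0 + delta_vth(vdd, t)
            # Update circuit timing; stop raising Vdd once timing is met.
            if delay(vdd, vth) <= target_delay:
                break
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None, schedule
        schedule.append((t, vdd))
    return vdd, schedule
```

Each pass through the inner loop corresponds to one timing update, which is why a flow of this shape needs many STA runs over the lifetime.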
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
Case        1      2      3      4       5    6       7
Vlib (V)    Vinit  Vinit  Vmax   Vinit   NA   Vheur1  Vheur2
VBTI (V)    Vinit  Vmax   Vmax   Vfinal  NA   Vheur1  Vheur2
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop among Circuit, Derated Libraries (Vlib, VBTI), and Vfinal]
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON/OFF phases] [VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCAS’14]
Observation 1
[Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
[Figure annotation: ≈70% of Vdd increment in 1 year (remaining 30% over 9 years); Vfinal marked]
Results for DC Scenario
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner:
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
Setups:
1. VBTI = Vlib = Vinit: ignore AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: do not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?
Rows: VBTI for aging estimation. Columns: Vlib for circuit performance estimation.
                  Vlib = Min Vdd                                         Vlib = Max Vdd
VBTI = Min Vdd    Slowest circuit, less aging                            Not applicable (optimistic)
VBTI = Max Vdd    Slowest circuit, worst-case aging (too pessimistic)    Faster circuit, worst-case aging
With BTI aging and AVS, the worst-case voltage corner is not obvious
AVS Signoff Corner Selection
[Figure: Power (mW) vs. area (μm²) for AES; points labeled 1 to 8; series: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); annotations: "Optimistic about AVS", "Pessimistic about AVS"]
AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
  $\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \dfrac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
[Figure: lifetime (year) and Vfinal (V) per implementation (1 to 9); annotations: 30% MTTF penalty, 200mV voltage compensation]
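The MTTF update rule on this slide is easy to apply step by step. The sketch below is a minimal illustration; the function names and the example voltage schedule are hypothetical, and only the update rule itself comes from the slide.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of the slide's rule: MTTF(i) = MTTF(i-1) * (Vdd(i-1)/Vdd(i))^2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_over_schedule(mttf_initial, vdd_schedule):
    """Apply the update at every AVS step; returns the MTTF after each step."""
    mttf = mttf_initial
    trace = [mttf]
    for v_prev, v_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, v_prev, v_new)
        trace.append(mttf)
    return trace


# Hypothetical schedule: Vdd raised from 0.90 V to 1.00 V in two steps.
trace = mttf_over_schedule(10.0, [0.90, 0.95, 1.00])
```

Because the per-step ratios telescope, the final MTTF depends only on the first and last voltages: MTTF_final = MTTF_initial × (Vdd_first / Vdd_last)².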
EM Impact on AVS Scheduling
[Figure: VDD vs. year (0 to 12) for schedules S1 to S5, and MTTF (year) for S1 to S5; design: DMA; annotation: 12 years MTTF penalty]
What is “Signoff”?
• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff on a voltage axis: nominal Vdd, static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI, signoff Vdd; operating voltage marked]
Problem: Margins = pessimism → overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path $p_j$ to variation source $z_v$:
  $\Delta d_{jv} = \dfrac{d_j(Y_v) - d_j(Y_{typ})}{3}$
• Assumptions
  • $\Delta d_{jv}$ is linear with respect to variation sources
  • Variation sources are normal distributions
• Obtain $\Delta d_{jv}$ using 28 runs of RC extraction and static timing analysis (STA)
Flow: 28 itf files (27 variation sources + $Y_{typ}$) and routed netlist → RC extraction → STA → $\Delta d_{jv}$
Note: path delay includes gate and wire delays
Statistical Timing Analysis (2)
• $\Sigma$ is the correlation matrix for variation sources (e.g., 27 × 27)
• $\Sigma = \lambda \lambda^{T}$ (Note: $\lambda$ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
  $[\Delta d'_{j1} \; \dots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \dots \; \Delta d_{j27}] \, \lambda$
Standard deviation of path delay:
  $\sigma_j = \left( (\Delta d'_{j1})^{2} + \dots + (\Delta d'_{j27})^{2} \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
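The two statistical-timing slides combine into a short computation: per-source sensitivities from corner and typical path delays, a Cholesky factor of the correlation matrix, and the root-sum-square of the correlated sensitivities. The sketch below uses plain Python and a 2-source example in place of the 27 sources; the function names and the example numbers are hypothetical.

```python
import math


def cholesky(sigma):
    """Lower-triangular lam with sigma = lam * lam^T (sigma must be
    symmetric positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            acc = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - acc)
            else:
                lam[i][k] = (sigma[i][k] - acc) / lam[k][k]
    return lam


def sensitivities(d_corner, d_typ):
    """Delta d_jv = (d_j(Y_v) - d_j(Y_typ)) / 3 for each variation source v."""
    return [(d - d_typ) / 3.0 for d in d_corner]


def path_sigma(delta_d, sigma):
    """sigma_j = || delta_d * lam ||_2, with lam the Cholesky factor."""
    lam = cholesky(sigma)
    n = len(delta_d)
    # Row vector times matrix: d'_k = sum over v of delta_d[v] * lam[v][k].
    d_prime = [sum(delta_d[v] * lam[v][k] for v in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


# Hypothetical 2-source example: path delays in ps at each corner and at typical.
corr = [[1.0, 0.5], [0.5, 1.0]]
dd = sensitivities([103.0, 106.0], 100.0)
sigma_j = path_sigma(dd, corr)
```

Since $\lambda \lambda^{T} = \Sigma$, the result equals $\sqrt{\Delta d \, \Sigma \, \Delta d^{T}}$, which gives a convenient cross-check on any implementation.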
Resilient Designs
• Detect and recover from timing errors
  • Ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality
  • E.g., enable margin reduction
  • Improve performance (i.e., timing speculation)
[Figure: Energy (mJ) vs. supply voltage (V); curves: conventional design, resilient design; annotation: 15% reduction]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling → reduced energy
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:
  $\text{Energy} = \dfrac{\text{Power}}{\text{Throughput}}$
  where throughput is a function of the error rate ER [Kahng10], the number of recovery cycles r, and the clock period T
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints
  • Allows replacement of Razor FF w/ normal FF
  • Trade off cost of resilience vs. data path optimization
• Question: which endpoint to be optimized?
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches (figure axis: power)
Open-loop AVS
• Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
• Post-silicon characterization: process-aware AVS, post-silicon characterization [Tschanz03]
Closed-loop AVS
• Generic monitor, design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00]; design-dependent monitor [Elgebaly07, Drake08, Chan12]
• In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
Error tolerance
• Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
Challenge: Variability
[Figure: four panels, each comparing ideal scaling with non-ideality]
• DENSITY: transistor count [M] vs. MPU release date (1998 to 2011). Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: Extended Planar Bulk, UTB FD, DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: supply voltage (Volt) vs. MPU release date (1995 to 2016). Source: [CPUDB]
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off between timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: Energy (mJ) vs. supply voltage (V) for designs MUL and EXU; curves: brute-force, pure-margin, CombOpt; minimum achievable energy marked]
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other → failed design / overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone
[Figure: design cone of mode A in the voltage-frequency plane, with modes A, B, C; negative slacks = failed design; positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within design cone → reduce overdesign
60ISVLSI-2014 invited talk 140710
Technology and Benchmark Circuits
bull NANGATE library with 32nm PTM technology bull Signoff for setup time violationbull Temperature = 125Cbull Process corner = slow NMOS and PMOSbull BTI degradation = DC AC
Supply voltages
Circuit Frequency (GHz)C5315 138c7552 125AES 089MPEG2 105
Vmax105V
Vinit090V
Vheur1 (DC) 097V
Vheur1 (AC) 095V
Vheur2 (DC) 095V
Vheur2 (AC) 093V
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
bull Basic idea keep a consistent VBTI VLIB and Vdd throughout circuit lifetime
bull Signoff flowbull Estimate aging at each time step
bull Update circuit timing and Vdd
bull Repeat until t = tfinal
bull Modify circuit and start over if Vfinal gt maximum allowed voltage
bull No overhead in timing analysis but very slow Many STA runs
and library
Vstep AVS voltage stepVfinal converged voltage
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
supply voltage gt Vmax
Pessimistic signoff corner bull Ovestimate aging andor
underestimate circuit performance
bull Large area overhead
Good corners
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner
for AVS4 Vbti = Vfinal Do not overestimate
aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
67ISVLSI-2014 invited talk 140710
Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?
Corner choices (Vlib for circuit performance estimation, VBTI for aging estimation):
• Vlib = Min Vdd, VBTI = Min Vdd: slowest circuit, less aging
• Vlib = Max Vdd, VBTI = Min Vdd: not applicable (optimistic)
• Vlib = Max Vdd, VBTI = Max Vdd: faster circuit, worst-case aging
• Vlib = Min Vdd, VBTI = Max Vdd: slowest circuit, worst-case aging (too pessimistic)
With BTI aging and AVS, the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm2) for AES, points labeled by signoff case 1 to 8; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Blacks); regions annotated “Optimistic about AVS” and “Pessimistic about AVS”]
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
  $\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
30% MTTF penalty
200 mV voltage compensation
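The stepwise MTTF update can be sketched as follows. The function names and the example voltage schedule are hypothetical; only the update rule MTTF(i) = MTTF(i−1) × (VDD(i−1) / VDD(i))² comes from the slide.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of the update rule: scale MTTF by (vdd_prev / vdd_new)**2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_after_schedule(mttf_initial, vdd_schedule):
    """Apply the update across a sequence of AVS supply voltages.

    vdd_schedule lists Vdd at successive checks (illustrative values only).
    """
    mttf = mttf_initial
    for vdd_prev, vdd_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, vdd_prev, vdd_new)
    return mttf


if __name__ == "__main__":
    # Assumed schedule: AVS raises Vdd from 1.0 V to 1.2 V in two steps.
    final = mttf_after_schedule(10.0, [1.0, 1.1, 1.2])
    print(f"MTTF after compensation: {final:.2f} years")
```

Because the per-step ratios telescope, the result depends only on the first and last voltage in the schedule: under this rule the final MTTF equals the initial MTTF times (V_first / V_last)².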
70ISVLSI-2014 invited talk 140710
EM Impact on AVS Scheduling
[Figure: VDD vs. year (0 to 12) for AVS schedules S1 to S5, and MTTF (year) for S1 to S5; design: DMA]
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is “Signoff”?
• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff on a voltage axis, from nominal Vdd to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI; operating voltage marked]
Problem: margins = pessimism, overdesign, schedule delay
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
• Delay sensitivity of path $p_j$ to variation source $z_v$:
  $\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$
• Assumptions
  • $\Delta d_{jv}$ is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain $\Delta d_{jv}$ using 28 runs of RC extraction and static timing analysis (STA)
[Figure: flow from 28 itf files (27 variation sources + Ytyp) and routed netlist, through RC extraction and STA, to Δdjv]
Note: path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)
• $\Sigma$ is the correlation matrix for variation sources (e.g., 27 × 27)
• $\Sigma = \lambda \lambda^{T}$ (note: $\lambda$ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
  $[\Delta d'_{j1} \; \dots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \dots \; \Delta d_{j27}] \, \lambda$
Standard deviation of path delay:
  $\sigma_j = \left( (\Delta d'_{j1})^{2} + \dots + (\Delta d'_{j27})^{2} \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
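The path-sigma computation can be sketched in plain Python. This is a minimal illustration, not the authors' tool flow: the function names are hypothetical, the Cholesky routine is a textbook implementation, and the 2 × 2 correlation matrix in the example stands in for the 27 × 27 matrix mentioned on the slide.

```python
import math


def delay_sensitivity(d_corner, d_typ):
    """Per-sigma sensitivity: (d_j(Y_v) - d_j(Y_typ)) / 3."""
    return (d_corner - d_typ) / 3.0


def cholesky(matrix):
    """Lower-triangular L with L * L^T == matrix (symmetric positive definite)."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def path_sigma(sensitivities, correlation):
    """sigma_j = norm of (row vector of sensitivities) times lambda."""
    lam = cholesky(correlation)
    n = len(sensitivities)
    correlated = [
        sum(sensitivities[i] * lam[i][k] for i in range(n)) for k in range(n)
    ]
    return math.sqrt(sum(x * x for x in correlated))


if __name__ == "__main__":
    # Illustrative numbers only: two variation sources with correlation 0.5.
    print(path_sigma([3.0, 4.0], [[1.0, 0.5], [0.5, 1.0]]))
```

Since λλᵀ = Σ, the squared result equals Δd Σ Δdᵀ, which is a quick way to sanity-check the output against a direct quadratic-form evaluation.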
74ISVLSI-2014 invited talk 140710
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for conventional design and resilient design]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
15% reduction
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:
  $\mathit{Energy} = \frac{\mathit{Power}}{\mathit{Throughput}}$
  where Throughput is computed from the error rate ER, the number of recovery cycles r, and the clock period T [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints: allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoint should be optimized?
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
[Figure: taxonomy of AVS classes and approaches, arranged along a power axis]
• Open-loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS: process- and temperature-aware AVS
  • Generic monitor: generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor, measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system, Vdd scaling until error occurs [Das06, Tschanz10]
78ISVLSI-2014 invited talk 140710
Challenge: Variability
[Figure: four panels, each comparing ideal scaling with observed non-ideality]
• DENSITY: transistor count [M] vs. MPU release date (1998 to 2011). Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: volt vs. MPU release date (1995 to 2016). Source: [CPUDB]
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels: MUL, EXU; minimum achievable energy marked]
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone
[Figure: voltage-frequency plane showing the design cone of mode A with modes A, B, C; negative slacks = failed design; positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
[Figure: global optimization flow: Sample (SP&R), Construct power models, Estimate optimal signoff modes, then adaptive search with Sample (SP&R) and Refine power models]
[Figure: power estimation of adaptive search: power (mW) vs. signoff voltage (V); series: 1st, 2nd, real; design: AES, f = 700 MHz]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
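The sample-and-refine loop can be sketched as a one-dimensional surrogate search. This is only an illustration under strong assumptions: the talk does not specify the form of the power models, so a quadratic fit through three samples is swapped in here, and `measure` is a hypothetical stand-in for an expensive implementation (SP&R) run at a candidate signoff voltage.

```python
def fit_parabola(points):
    """Coefficients (a, b, c) of p(v) = a*v**2 + b*v + c through three points."""
    (x0, y0), (x1, y1), (x2, y2) = points
    d0 = y0 / ((x0 - x1) * (x0 - x2))
    d1 = y1 / ((x1 - x0) * (x1 - x2))
    d2 = y2 / ((x2 - x0) * (x2 - x1))
    a = d0 + d1 + d2
    b = -(d0 * (x1 + x2) + d1 * (x0 + x2) + d2 * (x0 + x1))
    c = d0 * x1 * x2 + d1 * x0 * x2 + d2 * x0 * x1
    return a, b, c


def adaptive_search(measure, v_lo, v_hi, iterations=2, tol=1e-3):
    """Sample, fit a quadratic power model, sample at its minimum, refit."""
    samples = {v: measure(v) for v in (v_lo, (v_lo + v_hi) / 2.0, v_hi)}
    for _ in range(iterations):
        best = min(samples, key=samples.get)
        nearest = sorted(samples, key=lambda v: abs(v - best))[:3]
        a, b, _c = fit_parabola([(v, samples[v]) for v in nearest])
        if a <= 0.0:
            break  # model has no interior minimum; keep the best sample
        v_next = min(max(-b / (2.0 * a), v_lo), v_hi)
        if any(abs(v_next - v) < tol for v in samples):
            break  # model minimum already sampled to within tol
        samples[v_next] = measure(v_next)
    best = min(samples, key=samples.get)
    return best, samples[best], len(samples)


if __name__ == "__main__":
    # Toy power curve (mW) used only for demonstration.
    toy_power = lambda v: 15.0 + 20.0 * (v - 1.0) ** 2
    print(adaptive_search(toy_power, 0.9, 1.2))
```

The third return value counts how many expensive runs were used, which is the quantity the "small constant number of runs" bullet is concerned with.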
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
Closed-loop AVS classes: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: stacked energy (mJ) bars for PM, BF, CO at small, medium, and large margin; panels: MUL, EXU; components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels: MUL, EXU; minimum achievable energy marked]
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Figure: optimization flow with the following elements]
• Initial placement (all FFs = error-tolerant FFs)
• Check: energy < min energy? If so, save current solution
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
61ISVLSI-2014 invited talk 140710
A Reference Signoff Flow
• Basic idea: keep a consistent VBTI, Vlib, and Vdd throughout circuit lifetime
• Signoff flow
  • Estimate aging at each time step
  • Update circuit timing and Vdd
  • Repeat until t = tfinal
  • Modify circuit and start over if Vfinal > maximum allowed voltage
• No overhead in timing analysis, but very slow: many STA runs and library
Vstep: AVS voltage step
Vfinal: converged voltage
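The reference flow above can be sketched as a time-stepped loop. Everything numeric here is an assumption made for illustration: `delay` is a toy alpha-power-law model, `bti_shift` is a toy power-law aging model that ignores stress history, and all constants are hypothetical. Only the loop structure (estimate aging, update timing and Vdd in Vstep increments, repeat until tfinal, fail if Vdd exceeds the maximum) follows the slide.

```python
import math


def delay(vdd, dvth, vth0=0.3, k=1.0, alpha=1.3):
    """Toy alpha-power-law path delay in arbitrary units (assumed model)."""
    overdrive = vdd - vth0 - dvth
    if overdrive <= 0.0:
        return math.inf
    return k * vdd / overdrive ** alpha


def bti_shift(v_stress, t_years, a=0.05, n=0.2):
    """Toy BTI model: |dVth| scales with stress voltage and t**n.

    Simplification: treats the device as stressed at v_stress for all of t.
    """
    return a * v_stress * t_years ** n


def reference_signoff(target_delay, v_init, v_max, v_step=0.01, t_final=10, dt=1):
    """Return the converged Vdd (Vfinal), or None if Vdd would exceed v_max."""
    vdd = v_init
    t = 0
    while t < t_final:
        t += dt
        dvth = bti_shift(vdd, t)  # estimate aging at this time step
        while delay(vdd, dvth) > target_delay:  # update timing and Vdd
            vdd = round(vdd + v_step, 6)
            if vdd > v_max:
                return None  # circuit must be modified and the flow restarted
            dvth = bti_shift(vdd, t)  # keep VBTI consistent with the new Vdd
    return vdd


if __name__ == "__main__":
    target = delay(0.9, 0.0)  # fresh-circuit delay at 0.9 V as the target
    print(reference_signoff(target, v_init=0.9, v_max=1.1))
```

Each pass through the inner loop corresponds to one timing re-evaluation, which gives a feel for why the slide calls this flow very slow when every evaluation is a full STA run.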
62ISVLSI-2014 invited talk 140710
Experiment Setup
• Characterize different derated libraries
• Evaluate impact of library characterization
• Seven setups
  1. VBTI = Vlib = Vinit: ignores AVS
  2. Most pessimistic derated library
  3. VBTI = Vlib = Vmax: extreme corner for AVS
  4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
  5. No derated library (reference)
  6. Proposed method with α = 0
  7. Proposed method with α = 0.03

Case      1      2      3      4       5    6       7
Vlib (V)  Vinit  Vinit  Vmax   Vinit   N/A  Vheur1  Vheur2
VBTI (V)  Vinit  Vmax   Vmax   Vfinal  N/A  Vheur1  Vheur2
62ISVLSI-2014 invited talk 140710
Experiment Setupbull Characterize different derated libraries
bull Evaluate impact of library characterizationbull Seven setups
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner for AVS4 VBTI = Vfinal Do not overestimate aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
Case 1 2 3 4 5 6 7Vlib(V) Vinit Vinit Vmax Vinit NA Vheur1 Vheur2
VBTI (V) Vinit Vmax Vmax Vfinal NA Vheur1 Vheur2
63ISVLSI-2014 invited talk 140710
ldquoChicken and Eggrdquo Loop
bull ldquoChicken and eggrdquo loop in signoffbull Derated library characterization is related to BTI + AVSbull AVS affected by circuit implementation
bull Timing constraints critical paths etc
bull Circuit is affected by library characterization
Circuit
Derated Libraries
Vfinal
Vlib VBTI
64ISVLSI-2014 invited talk 140710
Bias Temperature Instability (BTI)
|ΔVth| increases when device is on (stressed)|ΔVth| is partially recovered when device is off (relaxed)NBTI PMOS PBTINMOS
|Vgs|
time
ON OFF ON OFF
[VattikondaWC06]
Device aging (|ΔVth|) accumulates over time
[TCASrsquo14]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
supply voltage gt Vmax
Pessimistic signoff corner bull Ovestimate aging andor
underestimate circuit performance
bull Large area overhead
Good corners
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner
for AVS4 Vbti = Vfinal Do not overestimate
aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
bull Timing signoff ensure circuit meets performance target under PVT variations amp aging
bull Conventional signoff approach bull Analyze circuit timing at worst-case cornersbull Fix timing violations re-run timing analysis
bull With BTI aging and AVS what is the Vdd of the worst-cast corner for timing analysis
Vlib for circuit performance estimation
Min Vdd Max Vdd
VBTI for aging
estimation
MinVdd
Not applicable (Optimistic)
Max Vdd
Slowest circuitLess aging
Faster circuitWorst-case aging
Slowest circuit Worst-case aging
Too pessimistic
With BTI aging and AVS the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
10000 12000 14000 16000 18000 20000 2200020
22
24
26
28
30
32
44
4
888
7776
66
555
3
33
2
22
11
1
Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)
Area (μm2)
Pow
er (m
W)
AES
Optimistic about AVS
Pessimistic about AVS
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
1 2 3 4 5 6 7 8 90
2
4
6
8
10
12
08
09
1
11
12Lifetime (year)
Implementation
Life
time
(yea
r)
Vfina
l (V)
Vfinal (V)
119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )
2
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation
[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ
Standard deviation of path delay
σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05
Note we use the delay variation from the statistical analysis as a reference
74ISVLSI-2014 invited talk 140710
Resilient Designs
bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)
bull Trade off design robustness vs design quality Eg enable margin reduction
bull Improve performance (ie timing speculation)
084 088 092 096 10030
34
38
42
46
50
54
58
62conventional design
reilient Design
Supply voltage (V)
En
erg
y (
mJ
)
Conventional design Worst-case signoff No Vdd downscaling
Resilient design Typical-case signoff Vdd downscaling reduced energy
15 reduction
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
bull Given RTL design throughput requirement and error-tolerant registers
bull Objective implement design to minimize energy bull Estimation of design energy
119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Question Which endpoint to be optimized
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
Open-Loop AVS
Closed-Loop AVSP
ow
er
Freq amp Vdd LUT
Post-silicon characterization
AVS Pre-characterize LUT [Martin02]
Process-aware AVSPost-silicon characterization [Tschanz03]
Generic monitor
Design dependent replica
In-situmonitor
Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]
In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]
Error Detection System
Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]
Error Tolerance
AVS
approachesAVS classes
77
78ISVLSI-2014 invited talk 140710
Challenge Variability
1998 2000 2003 2006 2008 20111
10
MPU Release Date
Tran
sisto
r Cou
nt [M
]
Source [CPUDB]
DENSITY
IdealNon-ideality
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0
100
200
300
400
500
600
700
0
02
04
06
08
1
12Dynamic Power (W)
POWER
Source [JeongK08]
IdealNon-ideality
2006 2008 2010 2012 2014 20161000
10000
100000
Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling
DRIVE CURRENT
Ideal
Source [ITRS]
Non-ideality
1995 2000 2005 2011 20160
05
1
15
2
25
3
MPU Release Date
Volt
SUPPLY VOLTAGE
Source [CPUDB]
Ideal
Non-ideality
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
Our Concept: Mode Dominance
• The design cone of mode A is the union of all feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly by threshold voltages)
• One mode is outside the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone
[Figure: design cone of mode A on voltage and frequency axes, with modes A, B, and C. Negative slacks = failed design; positive slacks = overdesign.]
Multi-mode signoff at modes that do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small, constant number of runs is enough, so the method is scalable
[Flow chart: global optimization flow. Sample (SP&R), construct power models, estimate optimal signoff modes; then adaptive search: sample (SP&R) and refine power models.]
[Figure: power estimation of adaptive search. Power (mW) vs. signoff voltage (V) for the curves labeled 1st, 2nd, and real.]
• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iterations
• real: power from real implemented circuits
Design: AES, f = 700 MHz
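The sample-and-refine loop on this slide can be illustrated with a small sketch. It is not the talk's power model: it swaps in successive parabolic interpolation over a single signoff voltage, and it assumes the power curve is roughly convex in that range. The `implement` callback (standing in for one SP&R run that returns power in mW), `toy_power`, and all numeric values are hypothetical.

```python
def parabola_vertex(p0, p1, p2):
    """x-coordinate of the vertex of the parabola through three (x, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    num = (x1 - x0) ** 2 * (y1 - y2) - (x1 - x2) ** 2 * (y1 - y0)
    den = (x1 - x0) * (y1 - y2) - (x1 - x2) * (y1 - y0)
    if den == 0:
        return x1  # collinear samples: no curvature information, stay put
    return x1 - 0.5 * num / den


def adaptive_search(implement, v_lo, v_hi, iterations=2):
    """Sample, fit a local power model, move to its estimated optimum, repeat.

    implement: hypothetical callback, signoff voltage (V) -> power (mW).
    Returns (best voltage, best power, number of implementation runs).
    """
    # Initial samples: the two ends of the range and its midpoint.
    samples = {v: implement(v) for v in (v_lo, 0.5 * (v_lo + v_hi), v_hi)}
    for _ in range(iterations):
        # Refine the model using the three lowest-power samples, ordered by voltage.
        best3 = sorted(sorted(samples.items(), key=lambda s: s[1])[:3])
        v_new = min(max(parabola_vertex(*best3), v_lo), v_hi)
        if v_new in samples:
            break  # the model no longer suggests a new point
        samples[v_new] = implement(v_new)
    v_best = min(samples, key=samples.get)
    return v_best, samples[v_best], len(samples)


def toy_power(v):
    """Hypothetical stand-in for a real SP&R run (illustration only)."""
    return 15.0 + 20.0 * (v - 1.0) ** 2


v_best, p_best, runs = adaptive_search(toy_power, 0.9, 1.2)
print(v_best, p_best, runs)
```

The point of the sketch is the run count: only a handful of implementation runs are spent, and each new run is placed where the current model predicts the optimum.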
Classes of Closed-Loop AVS
Closed-loop AVS classes: generic monitor, design-dependent replica, in-situ monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: proposed ring oscillator; each stage has a 1-bit control pin that selects high or low series resistance.]
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of the PM, BF, and CO flows at small, medium, and large margins for the MUL and EXU designs. Bars are split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation.]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows a lower supply voltage for resilient designs: reduced power
• We trade off the timing-error penalty against reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for the brute-force, pure-margin, and CombOpt flows on the MUL and EXU designs, with the minimum achievable energy marked.]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flow chart: initial placement (all FFs = error-tolerant FFs). SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs with normal FFs. SkewOpt: activity-aware clock skew optimization. If energy < min energy, save current solution.]
“Chicken and Egg” Loop
• “Chicken and egg” loop in signoff
  • Derated library characterization is related to BTI + AVS
  • AVS is affected by circuit implementation
    • Timing constraints, critical paths, etc.
  • Circuit is affected by library characterization
[Figure: loop linking the circuit, the derated libraries (Vlib, VBTI), and Vfinal.]
Bias Temperature Instability (BTI)
• |ΔVth| increases when device is on (stressed)
• |ΔVth| is partially recovered when device is off (relaxed)
• NBTI: PMOS; PBTI: NMOS
[Figure: |Vgs| vs. time with alternating ON/OFF phases.] [VattikondaWC06]
Device aging (|ΔVth|) accumulates over time [TCAS’14]
Observation 1 [Chan11]
• BTI is a “front-loaded” phenomenon
  • 50% of BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
• Most Vdd increment happens in early lifetime
  • Gap between Vdd and Vfinal reduces rapidly
[Figure: Vdd approaching Vfinal over the lifetime; ≈70% of the Vdd increment occurs in 1 year, the remaining 30% over 9 years.]
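To make "front-loaded" concrete, here is a small worked calculation. It assumes a power-law aging model, |ΔVth|(t) ∝ t^n, which is an assumption made for illustration and is not stated on the slide; the function names are hypothetical. Under that assumption, the figure of 50% of aging in the first year of a 10-year lifetime pins down the exponent n.

```python
import math


def power_law_exponent(fraction, t_early, t_total):
    """Exponent n such that (t_early / t_total) ** n == fraction.

    Assumes |dVth|(t) is proportional to t ** n (illustrative model only).
    """
    return math.log(fraction) / math.log(t_early / t_total)


def aging_fraction(t, t_total, n):
    """Fraction of end-of-life aging reached at time t under the assumed power law."""
    return (t / t_total) ** n


# 50% of aging within year 1 of a 10-year lifetime, as quoted on the slide.
n = power_law_exponent(0.5, 1.0, 10.0)
print("exponent n =", round(n, 3))
for year in (1, 2, 5, 10):
    print(year, round(aging_fraction(year, 10.0, n), 3))
```

Under this assumed model the exponent comes out near 0.3, so each additional year contributes less aging than the one before it.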
Results for DC Scenario
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if the desired supply voltage > Vmax
Pessimistic signoff corner:
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
1. VBTI = Vlib = Vinit: ignores AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: does not overestimate aging but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach:
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?
Corner choices (Vlib for circuit performance estimation, VBTI for aging estimation):
  Vlib = Min Vdd, VBTI = Min Vdd: slowest circuit, less aging
  Vlib = Max Vdd, VBTI = Min Vdd: not applicable (optimistic)
  Vlib = Max Vdd, VBTI = Max Vdd: faster circuit, worst-case aging
  Vlib = Min Vdd, VBTI = Max Vdd: slowest circuit, worst-case aging (too pessimistic)
With BTI aging and AVS, the worst-case voltage corner is not obvious
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm2) for AES implementations, with data points labeled 1 to 8 and regions marked "optimistic about AVS" and "pessimistic about AVS". Legend entries: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's).]
AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
$$\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$$
[Figure: lifetime (year) and Vfinal (V) per implementation, 1 to 9. Annotations: 30% MTTF penalty; 200 mV voltage compensation.]
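The MTTF update rule on this slide is a simple recurrence, sketched below. The function names, the initial MTTF of 10 years, and the voltage schedule are hypothetical values chosen for illustration; only the squared voltage-ratio update comes from the slide.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """One step of MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i)) ** 2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_after_schedule(mttf_initial, vdd_schedule):
    """Apply the update along a sequence of supply voltages set by AVS."""
    mttf = mttf_initial
    for v_prev, v_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, v_prev, v_new)
    return mttf


# Hypothetical schedule: AVS raises Vdd by 200 mV in total, from 1.0 V to 1.2 V.
schedule = [1.0, 1.05, 1.1, 1.15, 1.2]
mttf_0 = 10.0  # years, illustrative starting value
mttf_end = mttf_after_schedule(mttf_0, schedule)
penalty = 1.0 - mttf_end / mttf_0
print(round(mttf_end, 2), round(100 * penalty, 1))
```

Because the per-step ratios telescope, the result depends only on the first and last voltages: the final MTTF is the initial MTTF times (V_first / V_last)^2. For the assumed 1.0 V to 1.2 V compensation this is a reduction of roughly 30%.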
EM Impact on AVS Scheduling
[Figure: VDD vs. year for AVS schedules S1 to S5, and MTTF (year) for schedules S1 to S5, on the DMA design. Annotation: 12 years MTTF penalty.]
What is “Signoff”?
• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff on a voltage axis. Components between nominal Vdd and signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI. The operating voltage is also marked.]
Problem: margins = pessimism, leading to overdesign and schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path $p_j$ to variation source $z_v$:
$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$
• Assumptions:
  • $\Delta d_{jv}$ is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain $\Delta d_{jv}$ using 28 runs of RC extraction and static timing analysis (STA)
[Flow chart: 28 itf files (27 variation sources + Ytyp) and the routed netlist feed RC extraction, then STA, which yields $\Delta d_{jv}$.]
Note: path delay includes gate and wire delays
Statistical Timing Analysis (2)
• $\Sigma$ is the correlation matrix for variation sources (e.g., 27 x 27)
• $\Sigma = \lambda \lambda^{T}$ (note: $\lambda$ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
$$[\Delta d'_{j1} \;\dots\; \Delta d'_{j27}] = [\Delta d_{j1} \;\dots\; \Delta d_{j27}]\,\lambda$$
Standard deviation of path delay:
$$\sigma_j = \left( (\Delta d'_{j1})^{2} + \dots + (\Delta d'_{j27})^{2} \right)^{0.5}$$
Note: we use the delay variation from the statistical analysis as a reference
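The two statistical timing slides can be tied together with a short sketch: per-source sensitivities from corner delays, a Cholesky factor of the correlation matrix, and the resulting path-delay standard deviation. It uses two variation sources instead of 27, and the delay and correlation values are made up for illustration; the helper names are hypothetical.

```python
import math


def cholesky(sigma):
    """Lower-triangular factor lam with sigma = lam * lam^T.

    sigma must be symmetric positive definite (a valid correlation matrix).
    """
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            s = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][k] = (sigma[i][k] - s) / lam[k][k]
    return lam


def sensitivities(d_corner, d_typ):
    """Delta d_jv = (d_j(Y_v) - d_j(Y_typ)) / 3, one entry per variation source."""
    return [(d - d_typ) / 3.0 for d in d_corner]


def path_sigma(delta_d, sigma):
    """Standard deviation of path delay: norm of the row vector delta_d * lam."""
    lam = cholesky(sigma)
    n = len(delta_d)
    d_prime = [sum(delta_d[i] * lam[i][v] for i in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


# Illustrative numbers: path delay (ps) at the typical corner and at two source corners.
d_typ = 100.0
d_corner = [109.0, 112.0]
corr = [[1.0, 0.5], [0.5, 1.0]]
delta_d = sensitivities(d_corner, d_typ)
print(delta_d, path_sigma(delta_d, corr))
```

Since the squared norm of Δd·λ equals Δd Σ Δdᵀ, the result matches the usual variance formula for a linear combination of correlated normal variables. With an identity correlation matrix it reduces to a root-sum-square of the sensitivities.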
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V) for a conventional design and a resilient design; annotation: 15% reduction.]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement design to minimize energy
• Estimation of design energy:
$$\mathrm{Energy} = \frac{\mathrm{Power}}{\mathrm{Throughput}}$$
where throughput is expressed in terms of the error rate ER [Kahng10], the number of recovery cycles r, and the clock period T
Selective-Endpoint Optimization
• Optimize fanin cone with tighter constraints: allows replacement of Razor FF with normal FF
• Trade off cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?
Process-Aware Vdd Scaling (PVS)
AVS classes and approaches:
• Open-loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS (process- and temperature-aware AVS)
  • Generic monitor: generic on-chip monitor [Burd00]
  • Design-dependent replica: design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor that measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction, with Vdd scaling until an error occurs [Das06, Tschanz10]
[Figure: the AVS classes and approaches above arranged along a power axis.]
65ISVLSI-2014 invited talk 140710
Observation 1
[Chan11]
bull BTI is a ldquofront-loadedrdquo phenomenon
bull 50 BTI aging happens within the 1st year of circuit lifetime (total lifetime = 10 years)
bull Most Vdd increment happens in early lifetime
bull Gap between Vdd and Vfinal reduces rapidly
asymp70 Vdd increment in 1 year(remaining 30 over 9 years)
Vfinal
66ISVLSI-2014 invited talk 140710
Results for DC Scenario
Optimistic signoff corner bull AVS increases supply voltage
aggressively to compensate agingbull Consume more powerbull May fail to meet timing if desired
supply voltage gt Vmax
Pessimistic signoff corner bull Ovestimate aging andor
underestimate circuit performance
bull Large area overhead
Good corners
1 VBTI = Vlib = Vinit Ignore AVS2 Most pessimistic derated library3 VBTI = Vlib = Vmax Extreme corner
for AVS4 Vbti = Vfinal Do not overestimate
aging but ignores AVS5 No derated library (reference)6 Proposed method with α=07 Proposed method with α=003
67ISVLSI-2014 invited talk 140710
Problem Signoff Corner Definition
bull Timing signoff ensure circuit meets performance target under PVT variations amp aging
bull Conventional signoff approach bull Analyze circuit timing at worst-case cornersbull Fix timing violations re-run timing analysis
bull With BTI aging and AVS what is the Vdd of the worst-cast corner for timing analysis
Vlib for circuit performance estimation
Min Vdd Max Vdd
VBTI for aging
estimation
MinVdd
Not applicable (Optimistic)
Max Vdd
Slowest circuitLess aging
Faster circuitWorst-case aging
Slowest circuit Worst-case aging
Too pessimistic
With BTI aging and AVS the worst-case voltage corner is not obvious
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
10000 12000 14000 16000 18000 20000 2200020
22
24
26
28
30
32
44
4
888
7776
66
555
3
33
2
22
11
1
Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)
Area (μm2)
Pow
er (m
W)
AES
Optimistic about AVS
Pessimistic about AVS
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
1 2 3 4 5 6 7 8 90
2
4
6
8
10
12
08
09
1
11
12Lifetime (year)
Implementation
Life
time
(yea
r)
Vfina
l (V)
Vfinal (V)
119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )
2
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation
[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ
Standard deviation of path delay
σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05
Note we use the delay variation from the statistical analysis as a reference
74ISVLSI-2014 invited talk 140710
Resilient Designs
bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)
bull Trade off design robustness vs design quality Eg enable margin reduction
bull Improve performance (ie timing speculation)
084 088 092 096 10030
34
38
42
46
50
54
58
62conventional design
reilient Design
Supply voltage (V)
En
erg
y (
mJ
)
Conventional design Worst-case signoff No Vdd downscaling
Resilient design Typical-case signoff Vdd downscaling reduced energy
15 reduction
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
bull Given RTL design throughput requirement and error-tolerant registers
bull Objective implement design to minimize energy bull Estimation of design energy
119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
Selective-Endpoint Optimization
• Optimize fanin cone w/ tighter constraints; allows replacement of Razor FF w/ normal FF
• Trade off cost of resilience vs. data path optimization
• Question: Which endpoints should be optimized?
Process-Aware Vdd Scaling (PVS)
[Figure: AVS classes and AVS approaches, arranged along a power axis]
• Open-Loop AVS
  • Freq & Vdd LUT: AVS with pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-Loop AVS
  • Generic monitor, design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor; measures actual critical paths [Hartman06, Fick10]
• Error Tolerance
  • Error detection system: error detection and correction system; Vdd scaling until error occurs [Das06, Tschanz10]
Challenge: Variability
[Figure: four panels, each contrasting ideal scaling with non-ideality]
• DENSITY: transistor count [M] vs. MPU release date. Source: [CPUDB]
• POWER: dynamic power (W), 2009 to 2024. Source: [JeongK08]
• DRIVE CURRENT: Extended Planar Bulk, UTB FD and DG (μA/μm) vs. ideal scaling, 2006 to 2016. Source: [ITRS]
• SUPPLY VOLTAGE: volts vs. MPU release date. Source: [CPUDB]
Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• Proposed method trades off timing-error penalty against reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy is marked]
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other’s design cone
[Figure: design cone of mode A in the voltage-frequency plane, with modes A, B and C; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small, constant number of runs is enough: scalable
Global optimization flow: Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Adaptive search: Sample (SP&R), refine power models
[Figure: power estimation of adaptive search; power (mW) vs. signoff voltage (V); curves 1st, 2nd, real]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
Design: AES, f = 700 MHz
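The sample-and-refine loop can be illustrated with a small Python sketch. The talk does not state the form of the power model, so this sketch swaps in a simple quadratic in signoff voltage fitted to the three best samples. `implement_and_measure` is a hypothetical stand-in for one SP&R run, and `toy_power_mw` is made-up data.

```python
def fit_quadratic(points):
    """Coefficients (a, b, c) of a*v^2 + b*v + c through three (v, power) points."""
    (x0, y0), (x1, y1), (x2, y2) = points
    d0 = y0 / ((x0 - x1) * (x0 - x2))
    d1 = y1 / ((x1 - x0) * (x1 - x2))
    d2 = y2 / ((x2 - x0) * (x2 - x1))
    a = d0 + d1 + d2
    b = -(d0 * (x1 + x2) + d1 * (x0 + x2) + d2 * (x0 + x1))
    c = d0 * x1 * x2 + d1 * x0 * x2 + d2 * x0 * x1
    return a, b, c


def adaptive_search(implement_and_measure, initial_voltages, max_iterations=2):
    """Sample, fit a power model, sample at the model's minimum, then refit."""
    samples = [(v, implement_and_measure(v)) for v in initial_voltages]
    for _iteration in range(max_iterations):
        best_three = sorted(samples, key=lambda s: s[1])[:3]
        a, b, _c = fit_quadratic(best_three)
        if a <= 0:  # the model has no interior minimum, so stop refining
            break
        v_next = -b / (2 * a)
        if any(abs(v_next - v) < 1e-6 for v, _p in samples):
            break  # the model's optimum has already been sampled
        samples.append((v_next, implement_and_measure(v_next)))
    return min(samples, key=lambda s: s[1])


def toy_power_mw(v):
    """Made-up power curve standing in for a real implementation run."""
    return 15.0 + 20.0 * (v - 1.05) ** 2


best_v, best_p = adaptive_search(toy_power_mw, [0.9, 1.0, 1.2])
print(best_v, best_p)
```

The point of the structure is that each call to `implement_and_measure` is expensive, so the loop spends a small, fixed number of calls and lets the model choose where to sample next.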
Classes of Closed-Loop AVS
• Design-dependent replica, in-situ monitor
  • Critical path may be difficult to identify (IP from 3rd party)
  • Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor
  • Does not capture design-specific performance variation
This work: Tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low series resistance]
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF and CO for MUL and EXU at small, medium and large margin; each bar is split into energy w/o resilience, energy penalty of additional circuits, and energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows a lower supply voltage for resilient designs, and thus reduced power
• We trade off timing-error penalty against reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin and CombOpt; panels MUL and EXU; minimum achievable energy is marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
Flow:
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• Energy < min energy? Save current solution
Results for DC Scenario
Optimistic signoff corner:
• AVS increases supply voltage aggressively to compensate for aging
• Consumes more power
• May fail to meet timing if desired supply voltage > Vmax
Pessimistic signoff corner:
• Overestimates aging and/or underestimates circuit performance
• Large area overhead
Good corners
Signoff corners:
1. VBTI = Vlib = Vinit: ignores AVS
2. Most pessimistic derated library
3. VBTI = Vlib = Vmax: extreme corner for AVS
4. VBTI = Vfinal: does not overestimate aging, but ignores AVS
5. No derated library (reference)
6. Proposed method with α = 0
7. Proposed method with α = 0.03
Problem: Signoff Corner Definition
• Timing signoff: ensure circuit meets performance target under PVT variations & aging
• Conventional signoff approach
  • Analyze circuit timing at worst-case corners
  • Fix timing violations, re-run timing analysis
• With BTI aging and AVS, what is the Vdd of the worst-case corner for timing analysis?
Vlib (for circuit performance estimation) vs. VBTI (for aging estimation):
• VBTI = min Vdd, Vlib = min Vdd: slowest circuit, less aging
• VBTI = min Vdd, Vlib = max Vdd: not applicable (optimistic)
• VBTI = max Vdd, Vlib = min Vdd: slowest circuit, worst-case aging (too pessimistic)
• VBTI = max Vdd, Vlib = max Vdd: faster circuit, worst-case aging
With BTI aging and AVS, the worst-case voltage corner is not obvious
AVS Signoff Corner Selection
[Figure: power (mW) vs. area (μm²) for AES implementations at signoff corners 1 to 8; legend: Non-EM Aware, After Fixing (Mishra), After Fixing (Black's); regions annotated "Optimistic about AVS" and "Pessimistic about AVS"]
AVS Impact on EM Lifetime
[Figure: lifetime (year) and Vfinal (V) for implementations 1 to 9; annotations: 30% MTTF penalty, 200 mV voltage compensation]
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
$MTTF(i) = MTTF(i-1) \times \left(\dfrac{V_{DD}(i-1)}{V_{DD}(i)}\right)^{2}$
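The stepwise MTTF update can be sketched in Python as below. The update rule follows the slide's formula, with each step scaling MTTF by the squared ratio of the previous to the new supply voltage. The function names, starting MTTF and voltage schedule are hypothetical.

```python
def update_mttf(mttf_prev, vdd_prev, vdd_new):
    """MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i)) ** 2."""
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_schedule(mttf_initial, vdd_steps):
    """Apply the update along a sequence of supply voltages chosen by AVS."""
    mttf = [mttf_initial]
    for prev, new in zip(vdd_steps, vdd_steps[1:]):
        mttf.append(update_mttf(mttf[-1], prev, new))
    return mttf


# Hypothetical schedule: AVS raises VDD from 1.0 V to 1.2 V in two steps.
history = mttf_schedule(10.0, [1.0, 1.1, 1.2])
print(history)
```

Because the per-step ratios telescope, the final MTTF depends only on the first and last voltages. Under this hypothetical 1.0 V starting point, a 200 mV increase scales MTTF by (1.0/1.2)², roughly a 31% reduction.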
EM Impact on AVS Scheduling
[Figure: VDD vs. year, and MTTF (year), for scenarios S1 to S5; design: DMA]
12 years MTTF penalty
What is “Signoff”?
• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff, from nominal Vdd (operating voltage) to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI]
Problem: Margins = pessimism, overdesign, schedule delay
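A toy calculation shows how the margin stack accumulates. This is only a sketch: it assumes the margins subtract linearly from the nominal supply, and every number in it is hypothetical, since the talk gives no values.

```python
def signoff_vdd(nominal_vdd, margins):
    """Signoff voltage after subtracting every stacked margin from nominal Vdd.

    Assumes margins add linearly, which is the pessimistic stacking the
    slide describes; real flows may combine them differently.
    """
    return nominal_vdd - sum(margins.values())


# Hypothetical margin values in volts.
margins = {
    "static IR drop": 0.03,
    "power grid IR gradient": 0.01,
    "dynamic IR": 0.04,
    "HCI/NBTI": 0.02,
}
vdd = signoff_vdd(1.00, margins)
print(f"signoff Vdd = {vdd:.2f} V")
```

Each entry looks small in isolation, but in this example the stack moves the signoff point 100 mV below nominal, which is where the pessimism and overdesign named on the slide come from.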
68ISVLSI-2014 invited talk 140710
AVS Signoff Corner Selection
10000 12000 14000 16000 18000 20000 2200020
22
24
26
28
30
32
44
4
888
7776
66
555
3
33
2
22
11
1
Non-EM Aware After Fixing (Mishra) After Fixing (Blacks)
Area (μm2)
Pow
er (m
W)
AES
Optimistic about AVS
Pessimistic about AVS
69ISVLSI-2014 invited talk 140710
AVS Impact on EM Lifetime
1 2 3 4 5 6 7 8 90
2
4
6
8
10
12
08
09
1
11
12Lifetime (year)
Implementation
Life
time
(yea
r)
Vfina
l (V)
Vfinal (V)
119872119879119879119865 (119894 )=119872119879119879119865 (119894minus1)times(119881 119863119863 (119894minus1 )119881 119863119863 (119894 ) )
2
bull Assume no EM fix at signoffbull BTI degradation is checked at each step and MTTF is updated as
30 MTTF penalty
200mV voltage compensation
70ISVLSI-2014 invited talk 140710
0 2 4 6 8 10 12090092094096098100102104
S1 S2 S3 S4 S5
Year
VDD
DMA 3S1 S2 S3 S4 S5
78
79
80
81
MTT
F (Y
ear)
EM Impact on AVS Scheduling
12 years MTTF penalty
71ISVLSI-2014 invited talk 140710
What is ldquoSignoffrdquo
bull Foundation of contract between design house and foundrybull ldquochip should workrdquo stack of models margins analysesbull Function timing signal integrity power integrity hellip
Nominal VddStatic IR drop
Power grid IR gradientDynamic IR
HCINBTI
Signoff Vdd
Voltage
Problem Margins = pessimism
overdesign schedule delay
ldquomargin stackrdquo for voltage signoff
Operating voltage
72ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (1)
bull Delay sensitivity of path pj to variation source zv
bull Assumptions bull Δdjv is linear with respect to variation sources
bull Variation sources are normal distributions
bull Obtain Δdjv using 28 runs of RC extraction and static timing analysis (STA)
28 itf files (27 variation
sources + Ytyp)
Routed Netlist
RC extraction
STA
Δdjv
Δdjv = [ - ] 3dj(Yv) dj(Ytyp)
Note Path delay includes gate and wire delays
73ISVLSI-2014 invited talk 140710
Statistical Timing Analysis (2)bull Σ is the correlation matrix for variation sources (eg 27 x 27)
bull Σ = λλT (Note λ is obtained by Cholesky decomposition)
Delay sensitivities with correlation
[Δdrsquoj1 hellip Δdrsquoj27] = [Δdj1 hellip Δdj27]λ
Standard deviation of path delay
σj = ((Δdrsquoj1)2 + hellip + (Δdrsquoj27)2)05
Note we use the delay variation from the statistical analysis as a reference
74ISVLSI-2014 invited talk 140710
Resilient Designs
bull Detect and recover from timing errors Ensure correct operation with dynamic variations (eg IR drop temperature fluctuation cross-coupling etc)
bull Trade off design robustness vs design quality Eg enable margin reduction
bull Improve performance (ie timing speculation)
084 088 092 096 10030
34
38
42
46
50
54
58
62conventional design
reilient Design
Supply voltage (V)
En
erg
y (
mJ
)
Conventional design Worst-case signoff No Vdd downscaling
Resilient design Typical-case signoff Vdd downscaling reduced energy
15 reduction
75ISVLSI-2014 invited talk 140710
Resilience Cost Reduction Problem
bull Given RTL design throughput requirement and error-tolerant registers
bull Objective implement design to minimize energy bull Estimation of design energy
119864119899119890119903119892119910=119875119900119908119890119903h h119879 119903119900119906119892 119901119906119905
h h119879 119903119900119906119892 119901119906119905=1minus119864119877119879
+1minus119864119877119903times119879
recovery cycles
Clock period
Error rate [Kahng10]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Question Which endpoint to be optimized
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
Open-Loop AVS
Closed-Loop AVSP
ow
er
Freq amp Vdd LUT
Post-silicon characterization
AVS Pre-characterize LUT [Martin02]
Process-aware AVSPost-silicon characterization [Tschanz03]
Generic monitor
Design dependent replica
In-situmonitor
Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]
In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]
Error Detection System
Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]
Error Tolerance
AVS
approachesAVS classes
77
Challenge: Variability
[Figure: four panels, each contrasting ideal scaling with observed non-ideality]
• DENSITY: transistor count [M] vs. MPU release date (1998-2011). Source: [CPUDB]
• POWER: dynamic power (W), 2009-2024. Source: [JeongK08]
• DRIVE CURRENT: extended planar bulk, UTB FD, and DG (μA/μm) vs. ideal scaling, 2006-2016. Source: [ITRS]
• SUPPLY VOLTAGE: supply voltage (V) vs. MPU release date (1995-2016). Source: [CPUDB]
Energy Reduction in AVS Context
• Adaptive voltage scaling allows lower supply voltage for resilient designs, thus reduced power
• Proposed method trades off timing-error penalty vs. reduced power at a lower supply voltage
• Proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of AVS strategy
[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; curves: brute-force, pure-margin, CombOpt; minimum achievable energy marked]
Our Concept: Mode Dominance
• Design cone (of mode A) is the union of all the feasible operating modes for circuits signed off at mode A
• Design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside of the design cone of the other: failed design or overdesign
• Mode A has positive timing slacks with respect to mode B: mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage-frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small constant number of runs is enough: scalable
Global optimization flow:
• Sample (SP&R)
• Construct power models
• Estimate optimal signoff modes
• Adaptive search: sample (SP&R), refine power models
[Figure: Power estimation of adaptive search; power (mW) vs. signoff voltage (V); series: 1st, 2nd, real]
• Ovals indicate sample points
• 1st, 2nd: power from power models at first, second iteration
• real: power from real implemented circuits
Design: AES, f = 700 MHz
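The sample/model/refine loop can be sketched in one dimension as follows. This is a simplified illustration, not the method from the talk: it assumes a single signoff voltage, a quadratic power model fitted through three samples, and a hypothetical `implement(v)` callback that stands in for one SP&R run and returns the resulting power.

```python
from typing import Callable, Dict, Sequence, Tuple


def fit_parabola(points: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Coefficients (a, b, c) of p(v) = a*v**2 + b*v + c through three (v, p) points."""
    (x1, y1), (x2, y2), (x3, y3) = points
    denom = (x1 - x2) * (x1 - x3) * (x2 - x3)
    a = (x3 * (y2 - y1) + x2 * (y1 - y3) + x1 * (y3 - y2)) / denom
    b = (x3**2 * (y1 - y2) + x2**2 * (y3 - y1) + x1**2 * (y2 - y3)) / denom
    c = (x2 * x3 * (x2 - x3) * y1 + x3 * x1 * (x3 - x1) * y2
         + x1 * x2 * (x1 - x2) * y3) / denom
    return a, b, c


def adaptive_search(implement: Callable[[float], float], v_lo: float, v_hi: float,
                    max_iterations: int = 2, tol: float = 1e-6
                    ) -> Tuple[float, Dict[float, float]]:
    """Return (best sampled voltage, all samples) using few calls to `implement`."""
    # Initial samples: both ends of the voltage range and the midpoint.
    samples = {v: implement(v) for v in (v_lo, 0.5 * (v_lo + v_hi), v_hi)}
    for _ in range(max_iterations):
        # Refine the power model with the three lowest-power samples so far.
        best3 = sorted(samples.items(), key=lambda kv: kv[1])[:3]
        a, b, _c = fit_parabola(best3)
        if a <= 0:
            break  # model is not convex, so it predicts no interior minimum
        v_est = min(max(-b / (2 * a), v_lo), v_hi)  # model minimum, clamped
        if min(abs(v_est - v) for v in samples) < tol:
            break  # estimate coincides with an existing sample
        samples[v_est] = implement(v_est)  # one more (expensive) implementation run
    best_v = min(samples, key=samples.get)
    return best_v, samples


if __name__ == "__main__":
    # Stand-in for SP&R: a made-up power curve with its minimum at 1.0 V.
    best, seen = adaptive_search(lambda v: 2.0 * (v - 1.0) ** 2 + 15.0, 0.9, 1.2)
    print(best, len(seen))
```

The point of the design is that the expensive step, implementing the circuit, runs only at the sampled voltages, while the cheap fitted model proposes where to sample next.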
Classes of Closed-Loop AVS
Closed-loop AVS: design-dependent replica, in-situ monitor, generic monitor
• Critical path may be difficult to identify (IP from 3rd party)
• Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor: does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune series resistance of each stage to high or low
[Figure: ring oscillator stages, each with a 1-bit control pin selecting high or low series resistance]
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology with only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) of PM, BF, and CO for MUL and EXU at small, medium, and large margin; stacked components: energy without resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%, 10%, 15% of clock period
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows lower supply voltage for resilient designs: reduced power
• We trade off timing-error penalty vs. reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for MUL and EXU; curves: brute-force, pure-margin, CombOpt; minimum achievable energy marked]
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart: initial placement (all FFs = error-tolerant FFs); SEOpt (margin insertion on K paths based on sensitivity function, replace error-tolerant FFs with normal FFs); SkewOpt (activity-aware clock skew optimization); if energy < min energy, save current solution]
AVS Impact on EM Lifetime
• Assume no EM fix at signoff
• BTI degradation is checked at each step and MTTF is updated as
  $\mathrm{MTTF}(i) = \mathrm{MTTF}(i-1) \times \left( \frac{V_{DD}(i-1)}{V_{DD}(i)} \right)^{2}$
[Figure: lifetime (year) and Vfinal (V) for implementations 1-9]
30% MTTF penalty
200 mV voltage compensation
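The MTTF update rule can be written as a short Python sketch. The function names and the example voltages are hypothetical; only the update formula itself comes from the slide.

```python
from typing import Sequence


def update_mttf(mttf_prev: float, vdd_prev: float, vdd_new: float) -> float:
    """MTTF(i) = MTTF(i-1) * (VDD(i-1) / VDD(i))**2."""
    if vdd_prev <= 0 or vdd_new <= 0:
        raise ValueError("supply voltages must be positive")
    return mttf_prev * (vdd_prev / vdd_new) ** 2


def mttf_after_schedule(mttf_initial: float, vdd_schedule: Sequence[float]) -> float:
    """Apply the update once per AVS step; vdd_schedule[0] is the starting voltage."""
    mttf = mttf_initial
    for vdd_prev, vdd_new in zip(vdd_schedule, vdd_schedule[1:]):
        mttf = update_mttf(mttf, vdd_prev, vdd_new)
    return mttf


if __name__ == "__main__":
    # Hypothetical schedule: AVS raises Vdd in two steps to compensate for BTI aging.
    print(mttf_after_schedule(10.0, [1.0, 1.1, 1.2]))
```

Because the per-step ratios multiply, the result depends only on the first and last voltages. For example, raising Vdd from 1.0 V to 1.2 V scales MTTF by (1.0/1.2)^2, roughly 0.69.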
EM Impact on AVS Scheduling
[Figure: VDD vs. year (0-12) for AVS schedules S1-S5, and MTTF (year) of S1-S5; design: DMA]
12 years MTTF penalty
What is “Signoff”?
• Foundation of contract between design house and foundry
• “Chip should work”: stack of models, margins, analyses
• Function, timing, signal integrity, power integrity, …
[Figure: “margin stack” for voltage signoff, from nominal Vdd down to signoff Vdd: static IR drop, power grid IR gradient, dynamic IR, HCI/NBTI; operating voltage indicated]
Problem: margins = pessimism, overdesign, schedule delay
Statistical Timing Analysis (1)
• Delay sensitivity of path $p_j$ to variation source $z_v$:
  $\Delta d_{jv} = \left[ d_j(Y_v) - d_j(Y_{typ}) \right] / 3$
• Assumptions
  • $\Delta d_{jv}$ is linear with respect to variation sources
  • Variation sources are normally distributed
• Obtain $\Delta d_{jv}$ using 28 runs of RC extraction and static timing analysis (STA)
[Flow: 28 itf files (27 variation sources + $Y_{typ}$) and routed netlist, then RC extraction, then STA, yielding $\Delta d_{jv}$]
Note: path delay includes gate and wire delays
Statistical Timing Analysis (2)
• $\Sigma$ is the correlation matrix for variation sources (e.g., 27 × 27)
• $\Sigma = \lambda \lambda^{T}$ (Note: $\lambda$ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
  $[\Delta d'_{j1} \ldots \Delta d'_{j27}] = [\Delta d_{j1} \ldots \Delta d_{j27}] \, \lambda$
Standard deviation of path delay:
  $\sigma_j = \left( (\Delta d'_{j1})^{2} + \ldots + (\Delta d'_{j27})^{2} \right)^{0.5}$
Note: we use the delay variation from the statistical analysis as a reference
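The two statistical timing analysis slides describe a short computation: per-source sensitivities from corner and typical delays, decorrelation through the Cholesky factor, then a root-sum-square. The sketch below is a minimal pure-Python illustration. The function names are hypothetical, the example uses two variation sources rather than 27, and the Cholesky routine assumes a positive-definite correlation matrix.

```python
import math
from typing import List, Sequence


def delay_sensitivities(d_corner: Sequence[float], d_typ: float) -> List[float]:
    """Delta d_jv = [d_j(Y_v) - d_j(Y_typ)] / 3 for each variation source v."""
    return [(d_v - d_typ) / 3.0 for d_v in d_corner]


def cholesky(sigma: Sequence[Sequence[float]]) -> List[List[float]]:
    """Lower-triangular lam with sigma = lam * lam^T (sigma positive definite)."""
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lam[i][k] * lam[j][k] for k in range(j))
            if i == j:
                lam[i][j] = math.sqrt(sigma[i][i] - s)
            else:
                lam[i][j] = (sigma[i][j] - s) / lam[j][j]
    return lam


def path_sigma(delta_d: Sequence[float], sigma: Sequence[Sequence[float]]) -> float:
    """sigma_j = sqrt(sum_v (Delta d'_jv)^2), with Delta d' = Delta d * lam."""
    lam = cholesky(sigma)
    n = len(delta_d)
    # Row vector times matrix: correlated sensitivities Delta d'_jv.
    d_prime = [sum(delta_d[k] * lam[k][v] for k in range(n)) for v in range(n)]
    return math.sqrt(sum(x * x for x in d_prime))


if __name__ == "__main__":
    # Hypothetical path: typical delay 100 ps, two corner delays of 109 ps and 112 ps.
    dd = delay_sensitivities([109.0, 112.0], 100.0)
    print(path_sigma(dd, [[1.0, 0.5], [0.5, 1.0]]))
```

Since $\Sigma = \lambda\lambda^{T}$, the squared result equals $\Delta d \, \Sigma \, \Delta d^{T}$, so positive correlation between sources widens the path delay distribution relative to the uncorrelated case.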
Resilient Designs
• Detect and recover from timing errors: ensure correct operation with dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling, etc.)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) vs. supply voltage (V); curves: conventional design, resilient design]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy
15% reduction

Statistical Timing Analysis (1)
• Delay sensitivity of path $p_j$ to variation source $z_v$: $\Delta d_{jv}$
• Assumptions
  • $\Delta d_{jv}$ is linear with respect to the variation sources
  • Variation sources are normally distributed
• Obtain $\Delta d_{jv}$ using 28 runs of RC extraction and static timing analysis (STA)
Flow: routed netlist + 28 ITF files (27 variation sources + $Y_{typ}$) → RC extraction → STA → $\Delta d_{jv}$
$$\Delta d_{jv} = \frac{d_j(Y_v) - d_j(Y_{typ})}{3}$$
Note: path delay includes gate and wire delays

Statistical Timing Analysis (2)
• $\Sigma$ is the correlation matrix for the variation sources (e.g., 27 × 27)
• $\Sigma = \lambda\lambda^T$ (note: $\lambda$ is obtained by Cholesky decomposition)
Delay sensitivities with correlation:
$$[\Delta d'_{j1} \; \dots \; \Delta d'_{j27}] = [\Delta d_{j1} \; \dots \; \Delta d_{j27}]\,\lambda$$
Standard deviation of path delay:
$$\sigma_j = \left((\Delta d'_{j1})^2 + \dots + (\Delta d'_{j27})^2\right)^{0.5}$$
Note: we use the delay variation from the statistical analysis as a reference
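As a minimal sketch of the two statistical timing steps, the Python below computes the per-source sensitivities from corner and typical path delays, applies the Cholesky factor of the correlation matrix, and returns the path-delay standard deviation. The function names, the hand-written Cholesky routine and the two-source example values are illustrative assumptions; the flow in the slides uses 27 variation sources with delays taken from RC extraction and STA.

```python
import math


def cholesky(sigma):
    """Lower-triangular lam with lam * lam^T == sigma.

    sigma must be symmetric positive definite (a valid correlation matrix).
    """
    n = len(sigma)
    lam = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for k in range(i + 1):
            partial = sum(lam[i][m] * lam[k][m] for m in range(k))
            if i == k:
                lam[i][k] = math.sqrt(sigma[i][i] - partial)
            else:
                lam[i][k] = (sigma[i][k] - partial) / lam[k][k]
    return lam


def sensitivities(corner_delays, typical_delay):
    """Delta d_jv = (d_j(Y_v) - d_j(Y_typ)) / 3 for each variation source v."""
    return [(d - typical_delay) / 3.0 for d in corner_delays]


def path_sigma(delta_d, sigma):
    """Standard deviation of path delay from sensitivities and correlations."""
    lam = cholesky(sigma)
    n = len(delta_d)
    # Row vector times lam: Delta d'_k = sum_v Delta d_v * lam[v][k]
    correlated = [sum(delta_d[v] * lam[v][k] for v in range(n)) for k in range(n)]
    return math.sqrt(sum(x * x for x in correlated))


if __name__ == "__main__":
    # Hypothetical path: typical delay 100 ps, two corner delays of 109 and 112 ps.
    dd = sensitivities([109.0, 112.0], 100.0)
    print(path_sigma(dd, [[1.0, 0.0], [0.0, 1.0]]))   # uncorrelated sources
    print(path_sigma(dd, [[1.0, 0.5], [0.5, 1.0]]))   # correlation 0.5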

Resilient Designs
• Detect and recover from timing errors: ensure correct operation under dynamic variations (e.g., IR drop, temperature fluctuation, cross-coupling)
• Trade off design robustness vs. design quality, e.g., enable margin reduction
• Improve performance (i.e., timing speculation)
[Figure: energy (mJ) versus supply voltage (V) for a conventional design and a resilient design.]
Conventional design: worst-case signoff, no Vdd downscaling
Resilient design: typical-case signoff, Vdd downscaling, reduced energy (15% reduction)

Resilience Cost Reduction Problem
• Given: RTL design, throughput requirement, and error-tolerant registers
• Objective: implement the design to minimize energy
• Estimation of design energy:
$$\text{Energy} = \frac{\text{Power}}{\text{Throughput}}$$
Throughput is expressed in terms of the error rate $ER$ [Kahng10], the clock period $T$, and the number of recovery cycles $r$; the surviving terms of the expression are $1 - ER$, $T$, and $r \times T$.

Selective-Endpoint Optimization
• Optimize the fanin cone with tighter constraints; this allows replacement of a Razor FF with a normal FF
• Trade off the cost of resilience vs. data path optimization
• Question: which endpoints should be optimized?

Process-Aware Vdd Scaling (PVS)
AVS classes and approaches:
• Open-loop AVS
  • Freq & Vdd LUT: AVS with a pre-characterized LUT [Martin02]
  • Post-silicon characterization: process-aware AVS [Tschanz03]
• Closed-loop AVS
  • Generic monitor and design-dependent replica: process- and temperature-aware AVS; generic on-chip monitor [Burd00], design-dependent monitor [Elgebaly07, Drake08, Chan12]
  • In-situ monitor: in-situ performance monitor that measures actual critical paths [Hartman06, Fick10]
• Error tolerance
  • Error detection system: error detection and correction system; Vdd scaling until an error occurs [Das06, Tschanz10]
[Figure: the AVS classes and approaches arranged against a power axis.]
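To illustrate the simplest class, open-loop AVS with a pre-characterized frequency/Vdd LUT, here is a small sketch. The table entries and the function name `lookup_vdd` are hypothetical and chosen only for illustration; a real LUT would come from characterization of the design.

```python
import bisect

# Hypothetical pre-characterized table, sorted by frequency:
# (maximum supported frequency in MHz, minimum Vdd in V).
FREQ_VDD_LUT = [(200, 0.80), (400, 0.90), (600, 1.00), (800, 1.10)]


def lookup_vdd(target_mhz, lut=FREQ_VDD_LUT):
    """Return the lowest tabulated Vdd whose frequency entry covers target_mhz."""
    freqs = [f for f, _ in lut]
    i = bisect.bisect_left(freqs, target_mhz)
    if i == len(lut):
        raise ValueError("target frequency exceeds the characterized range")
    return lut[i][1]


if __name__ == "__main__":
    for f in (150, 400, 450):
        print(f, "MHz ->", lookup_vdd(f), "V")
```

Because an open-loop table has no feedback from the silicon, every entry must already include enough margin for the worst die and operating condition, which is the overhead the closed-loop and error-tolerant classes aim to reduce.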

Challenge: Variability
[Figure: four panels, each contrasting ideal scaling with non-ideality.
DENSITY: transistor count [M] versus MPU release date, 1998 to 2011. Source: [CPUDB]
POWER: dynamic power (W) by year, 2009 to 2024. Source: [JeongK08]
DRIVE CURRENT: drive current (μA/μm) for extended planar bulk, UTB FD and DG devices versus ideal scaling, 2006 to 2016. Source: [ITRS]
SUPPLY VOLTAGE: supply voltage (V) versus MPU release date, 1995 to 2016. Source: [CPUDB]]

Energy Reduction in AVS Context
• Adaptive voltage scaling allows a lower supply voltage for resilient designs, and thus reduced power
• The proposed method trades off the timing-error penalty against the reduced power at a lower supply voltage
• The proposed method achieves an average of 18% energy reduction compared to pure-margin designs
Resilience benefits increase in the context of an AVS strategy
[Figure: energy (mJ) versus supply voltage (V) for the brute-force, pure-margin and CombOpt flows, shown for the MUL and EXU designs, with the minimum achievable energy marked.]

Our Concept: Mode Dominance
• The design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage-frequency plane showing modes A, B and C and the design cone of mode A. Negative slacks = failed design; positive slacks = overdesign.]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign

Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoid circuit implementation at each mode
  • A small, constant number of runs is enough → scalable
Global optimization flow: sample (SP&R) → construct power models → estimate optimal signoff modes → adaptive search: sample (SP&R) and refine power models
[Figure: power estimation of adaptive search; power (mW) versus signoff voltage (V), with curves labeled 1st, 2nd and real. Design: AES, f = 700 MHz.]
• Ovals indicate sample points
• 1st, 2nd: power from the power models at the first and second iterations
• real: power from real implemented circuits
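The sample-and-refine idea can be sketched as follows. This is only an illustration under stated assumptions: the power model here is a parabola fitted through the three lowest-power samples, which is not necessarily the model used in the work, and `measure_power` is a hypothetical stand-in for an expensive SP&R run at a given signoff voltage.

```python
def parabola_vertex(p0, p1, p2):
    """Voltage at the vertex of the parabola through three (v, power) points.

    Returns None if the points are collinear (no unique vertex).
    """
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    num = (x1 - x0) ** 2 * (y1 - y2) - (x1 - x2) ** 2 * (y1 - y0)
    den = (x1 - x0) * (y1 - y2) - (x1 - x2) * (y1 - y0)
    if den == 0:
        return None
    return x1 - 0.5 * num / den


def adaptive_search(measure_power, initial_voltages, v_min, v_max, iterations=2):
    """Sample, fit a model, sample at the estimated optimum, and refit.

    measure_power    -- hypothetical: implements the design and returns power
    initial_voltages -- at least three distinct signoff voltages to sample first
    iterations       -- number of refinement rounds (assumed small and constant)
    """
    samples = [(v, measure_power(v)) for v in initial_voltages]
    for _ in range(iterations):
        # Model: parabola through the three lowest-power samples (assumption).
        best3 = sorted(sorted(samples, key=lambda s: s[1])[:3])
        v_next = parabola_vertex(*best3)
        if v_next is None:
            break
        v_next = min(max(v_next, v_min), v_max)  # stay in the legal range
        if any(abs(v_next - v) < 1e-9 for v, _ in samples):
            break  # the model's optimum has already been sampled
        samples.append((v_next, measure_power(v_next)))
    return min(samples, key=lambda s: s[1])


if __name__ == "__main__":
    # Toy power curve with a minimum at 1.0 V (illustrative only).
    toy = lambda v: 15.0 + 40.0 * (v - 1.0) ** 2
    print(adaptive_search(toy, [0.9, 1.1, 1.2], 0.9, 1.2))
```

Each call to `measure_power` corresponds to one full implementation run, so the point of the model is to keep the number of such calls small and independent of how finely the voltage range is searched.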

Classes of Closed-Loop AVS
Closed-loop AVS monitors:
• Generic monitor
  • Does not capture design-specific performance variation
• Design-dependent replica and in-situ monitor
  • Critical path may be difficult to identify (IP from 3rd party)
  • Calibrating monitors at multiple modes/voltages requires long test time
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance

Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune the series resistance of each stage to high or low
[Figure: ring-oscillator stages, each with a 1-bit control pin that selects high or low series resistance.]
76ISVLSI-2014 invited talk 140710
Selective-Endpoint Optimizationbull Optimize fanin cone w tighter constraints Allows replacement of Razor FF w normal FFbull Trade off cost of resilience vs data path optimization
bull Question Which endpoint to be optimized
77ISVLSI-2014 invited talk 140710
Process-Aware Vdd Scaling (PVS)
Open-Loop AVS
Closed-Loop AVSP
ow
er
Freq amp Vdd LUT
Post-silicon characterization
AVS Pre-characterize LUT [Martin02]
Process-aware AVSPost-silicon characterization [Tschanz03]
Generic monitor
Design dependent replica
In-situmonitor
Process and temperature-aware AVS Generic on-chip monitor [Burd00]Design-dependent monitor [Elgebaly07 Drake08 Chan12]
In-situ performance monitor Measure actual critical paths [Hartman06 Fick10]
Error Detection System
Error detection and correction system Vdd scaling until error occurs [Das06Tschanz10]
Error Tolerance
AVS
approachesAVS classes
77
78ISVLSI-2014 invited talk 140710
Challenge Variability
1998 2000 2003 2006 2008 20111
10
MPU Release Date
Tran
sisto
r Cou
nt [M
]
Source [CPUDB]
DENSITY
IdealNon-ideality
2009
2010
2011
2012
2013
2014
2015
2016
2017
2018
2019
2020
2021
2022
2023
2024
0
100
200
300
400
500
600
700
0
02
04
06
08
1
12Dynamic Power (W)
POWER
Source [JeongK08]
IdealNon-ideality
2006 2008 2010 2012 2014 20161000
10000
100000
Extended Planar Bulk (μAμm)UTB FD (μAμm)DG (μAμm)Ideal Scaling
DRIVE CURRENT
Ideal
Source [ITRS]
Non-ideality
1995 2000 2005 2011 20160
05
1
15
2
25
3
MPU Release Date
Volt
SUPPLY VOLTAGE
Source [CPUDB]
Ideal
Non-ideality
79ISVLSI-2014 invited talk 140710
Energy Reduction in AVS Contextbull Adaptive voltage scaling allows lower supply voltage for resilient
designs thus reduced powerbull Proposed method trades off between timing-error penalty vs
reduced power at a lower supply voltagebull Proposed method achieves an average of 18 energy reduction
compared to pure-margin designs Resilience benefits increase in the context of AVS strategy
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
80ISVLSI-2014 invited talk 140710
Our Concept: Mode Dominance
• The design cone (of mode A) is the union of all feasible operating modes for circuits signed off at mode A
• The design cone is determined by the tradeoff between voltage and frequency (mainly threshold voltages)
• One mode is outside the design cone of the other → failed design or overdesign
• Mode A has positive timing slacks with respect to mode B → mode A dominates mode B
• Equivalent dominance: no mode is dominated by the other
  • Modes are in each other's design cone
[Figure: voltage vs. frequency plane with modes A, B, C and the design cone of mode A; negative slacks = failed design, positive slacks = overdesign]
Multi-mode signoff at modes which do not exhibit equivalent dominance leads to overdesign
Guideline: search for signoff modes within the design cone to reduce overdesign
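The design-cone idea above can be illustrated with a minimal sketch. Everything numeric here is an assumption made for illustration and is not taken from the talk: the alpha-power-law delay model, the exponent `alpha`, the two bounding threshold voltages `vth_low` and `vth_high`, and the function names. The cone of mode A is approximated by the frequencies reachable at another voltage by circuits that exactly meet timing at A, bounded by an all-low-Vth and an all-high-Vth implementation.

```python
def delay_factor(vdd, vth, alpha=1.3):
    """Assumed alpha-power-law factor: delay is proportional to Vdd / (Vdd - Vth)^alpha."""
    if vdd <= vth:
        raise ValueError("supply voltage must exceed the threshold voltage")
    return vdd / (vdd - vth) ** alpha


def max_freq_at(v_target, signoff_mode, vth, alpha=1.3):
    """Frequency reachable at v_target by a circuit that exactly meets timing at signoff_mode."""
    v_signoff, f_signoff = signoff_mode
    return f_signoff * delay_factor(v_signoff, vth, alpha) / delay_factor(v_target, vth, alpha)


def classify_mode(mode_a, mode_b, vth_low=0.25, vth_high=0.45):
    """Classify mode B relative to the (approximated) design cone of mode A.

    Modes are (voltage in V, frequency in MHz) pairs. The two Vth values are
    hypothetical bounds standing in for the extreme threshold-voltage mixes.
    """
    v_b, f_b = mode_b
    f_bounds = [max_freq_at(v_b, mode_a, vth) for vth in (vth_low, vth_high)]
    if f_b > max(f_bounds):
        return "failed design"       # negative slack at B for every circuit signed off at A
    if f_b < min(f_bounds):
        return "overdesign"          # positive slack at B: A dominates B
    return "inside design cone"


if __name__ == "__main__":
    mode_a = (0.9, 700.0)
    for f_b in (750.0, 850.0, 1000.0):
        print(f_b, classify_mode(mode_a, (1.1, f_b)))
```

Under these assumed parameters, a second signoff mode is only worth adding when it falls inside the cone; modes below the cone are already covered by mode A, and modes above it cannot be met by a circuit signed off at A alone.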
81ISVLSI-2014 invited talk 140710
Our Method: Global Optimization
• Iteratively sample and refine power models
  • Avoids circuit implementation at each mode
  • A small constant number of runs is enough: scalable
Global optimization flow: Sample (SP&R) → Construct power models → Estimate optimal signoff modes → Adaptive search (Sample (SP&R) → Refine power models)
[Figure: Power estimation of adaptive search; power (mW) vs. signoff voltage (V); series: 1st, 2nd, real]
• Ovals indicate sample points
• 1st, 2nd: power from power models at the first and second iteration
• real: power from real implemented circuits
Design: AES, f = 700 MHz
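The sample-and-refine loop above can be sketched as follows. This is only an illustration under stated assumptions: the talk does not give the form of the power model, so a parabola through the three lowest-power samples is used here as a stand-in, and `implement` is a hypothetical callable representing one expensive SP&R run that returns power at a given signoff voltage.

```python
def fit_parabola(p0, p1, p2):
    """Return (a, b, c) of the parabola a*x^2 + b*x + c through three (x, y) points."""
    (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
    d = (x0 - x1) * (x0 - x2) * (x1 - x2)
    a = (x2 * (y1 - y0) + x1 * (y0 - y2) + x0 * (y2 - y1)) / d
    b = (x2 * x2 * (y0 - y1) + x1 * x1 * (y2 - y0) + x0 * x0 * (y1 - y2)) / d
    c = (x1 * x2 * (x1 - x2) * y0 + x2 * x0 * (x2 - x0) * y1 + x0 * x1 * (x0 - x1) * y2) / d
    return a, b, c


def adaptive_search(implement, v_lo, v_hi, max_iters=2, tol=1e-6):
    """Sample, build a power model, estimate the best signoff voltage, then refine.

    Returns (best voltage, its power, number of expensive runs used).
    """
    # Initial samples: one expensive run at each end and at the midpoint.
    samples = {v: implement(v) for v in (v_lo, 0.5 * (v_lo + v_hi), v_hi)}
    for _ in range(max_iters):
        # Power model (assumed form): parabola through the three lowest-power samples.
        best3 = sorted(samples.items(), key=lambda kv: kv[1])[:3]
        a, b, _c = fit_parabola(*best3)
        if a <= 0:
            break  # model has no interior minimum; keep the best sample so far
        v_est = min(max(-b / (2 * a), v_lo), v_hi)
        if any(abs(v_est - v) < tol for v in samples):
            break  # estimate already sampled; the model has stopped changing
        samples[v_est] = implement(v_est)  # refine the model with one more run
    best_v = min(samples, key=samples.get)
    return best_v, samples[best_v], len(samples)


if __name__ == "__main__":
    # Toy stand-in for SP&R results; the real curve comes from implemented circuits.
    toy_power = lambda v: 15.0 + 40.0 * (v - 1.0) ** 2
    print(adaptive_search(toy_power, 0.9, 1.2))
```

The point of the design is that the number of expensive runs stays small and fixed, rather than growing with the number of candidate signoff modes.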
82ISVLSI-2014 invited talk 140710
Classes of Closed-Loop AVS
• Design-dependent replica, in-situ monitor
  • Critical path may be difficult to identify (IP from 3rd party)
  • Calibrating monitors at multiple modes/voltages requires long test time
• Generic monitor
  • Does not capture design-specific performance variation
This work: tunable monitor for closed-loop AVS
• Can be applied as a generic monitor
• Or tuned to capture design-specific performance
83ISVLSI-2014 invited talk 140710
Design of RO with Tunable Vmin
• Identified two circuit knobs to tune Vmin
  • Series resistance
  • Cell types (INV, NAND, NOR)
• Proposed circuit
  • Different cell types cover different process corners
  • Tune the series resistance of each stage to high or low
[Figure: ring oscillator (RO) stages, each with a 1-bit control pin that selects high or low series resistance]
84ISVLSI-2014 invited talk 140710
Benefit of Resilience Cost Reduction
• Reference flows
  • Pure-margin (PM): conventional methodology w/ only margin insertion
  • Brute-force (BF): insert error-tolerant FFs at timing-critical endpoints
• Proposed method (CO) achieves up to 20% energy reduction compared to reference methods
• Resilience benefits increase with safety margin
[Figure: energy (mJ) for PM, BF, and CO at small, medium, and large margin; panels MUL and EXU; stacked components: energy w/o resilience, energy penalty of additional circuits, energy penalty of throughput degradation]
Small/medium/large margin: safety margin = 5%/10%/15% of clock period
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVS
• AVS (Adaptive Voltage Scaling) allows a lower supply voltage for resilient designs → reduced power
• We trade off the timing-error penalty against reduced power at a lower supply voltage
• Average 18% energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
[Figure: energy (mJ) vs. supply voltage (V) for brute-force, pure-margin, and CombOpt; panels MUL (0.84 V to 1.00 V) and EXU (0.84 V to 1.02 V); the minimum achievable energy is marked]
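The trade-off between timing-error penalty and reduced power can be made concrete with a toy voltage sweep. The model below is an assumption for illustration only and is not the talk's energy model: dynamic energy is taken to scale with Vdd squared, the timing-error rate is taken to rise exponentially as Vdd approaches a hypothetical failure voltage `v_fail`, and each error is charged a hypothetical `replay_cost` in extra operations. The numbers it produces are unrelated to the 18% result reported above.

```python
import math


def energy_per_op(vdd, v_nom=1.0, v_fail=0.80, steepness=80.0, replay_cost=10.0):
    """Toy energy per operation of a resilient design, normalized to 1.0 at v_nom.

    All parameters are illustrative assumptions, not values from the talk.
    """
    dynamic = (vdd / v_nom) ** 2                                   # energy saved by lowering Vdd
    error_rate = min(1.0, math.exp(-steepness * (vdd - v_fail)))   # errors grow as Vdd drops
    return dynamic * (1.0 + replay_cost * error_rate)              # recovery work adds energy


def min_energy_voltage(voltages):
    """Return the supply voltage in the sweep with the lowest toy energy."""
    return min(voltages, key=energy_per_op)


if __name__ == "__main__":
    sweep = [round(0.84 + 0.01 * i, 2) for i in range(17)]  # 0.84 V to 1.00 V
    best_v = min_energy_voltage(sweep)
    saving = 1.0 - energy_per_op(best_v) / energy_per_op(1.0)
    print(f"minimum-energy Vdd = {best_v:.2f} V, toy saving vs. nominal = {saving:.1%}")
```

The sweep shows the shape of the curves in the figure: energy falls as the voltage is lowered until the error-recovery penalty starts to dominate, which defines a minimum achievable energy.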
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
• Iteratively optimize with SEOpt and SkewOpt
[Flowchart]
• Initial placement (all FFs = error-tolerant FFs)
• SEOpt: margin insertion on K paths based on sensitivity function; replace error-tolerant FFs w/ normal FFs
• SkewOpt: activity-aware clock skew optimization
• Energy < min energy? Save current solution
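The control flow of the loop above can be sketched as a skeleton. This is a guess at the structure, not the authors' implementation: `se_opt`, `skew_opt`, and `energy` are hypothetical callables standing in for the SEOpt step, the SkewOpt step, and the energy evaluation, and both the step order and the stop-on-no-improvement rule are assumptions.

```python
def optimize(design, se_opt, skew_opt, energy, max_iters=20):
    """Alternate SEOpt and SkewOpt, keeping the lowest-energy solution seen.

    design: initial placement, with all FFs error-tolerant (opaque to this loop).
    Returns (best design, its energy).
    """
    best, min_energy = design, energy(design)
    current = design
    for _ in range(max_iters):
        current = se_opt(current)     # margin insertion, swap error-tolerant FFs for normal FFs
        current = skew_opt(current)   # activity-aware clock skew optimization
        e = energy(current)
        if e < min_energy:            # "Energy < min energy": save current solution
            best, min_energy = current, e
        else:
            break                     # assumed termination: stop once energy stops improving
    return best, min_energy


if __name__ == "__main__":
    # Toy stand-ins: the design is just a count of error-tolerant FFs.
    result = optimize(
        design=10,
        se_opt=lambda n: max(n - 2, 0),      # replace two error-tolerant FFs per pass
        skew_opt=lambda n: n,                # no-op placeholder
        energy=lambda n: 30 + (n - 4) ** 2,  # made-up energy with a minimum at n = 4
    )
    print(result)
```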
85ISVLSI-2014 invited talk 140710
Increased Benefit of Resilience With AVSbull AVS (Adaptive Voltage Scaling) allows lower supply voltage for
resilient designs reduced powerbull We trade off between timing-error penalty vs reduced power at a
lower supply voltagebull Average 18 energy reduction compared to pure-margin designs
Resilience benefits increase in AVS context
084 088 092 096 10030
36
42
48
54
60brute-forcepure-marginCombOpt
Supply voltage (V)
En
erg
y (
mJ
)
084 086 088 09 092 094 096 098 1 10225
29
33
37
41
45brute-force
pure-margin
CombOpt
Supply voltage (V)
En
erg
y (
mJ
)
MUL EXU
Minimum achievable energy
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
bull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity-aware clock skew optimization
SkewOpt
86ISVLSI-2014 invited talk 140710
Overall Optimization Flow
bull Iteratively optimize with SEOpt and SkewOpt
Initial placement (all FFs = error-tolerant FFs)
Energy lt min energy
Save current solution
Margin insertion on K paths based on sensitivity function
Replace error-tolerant FFs w normal FFs
SEOpt
Activity-aware clock skew optimization
SkewOpt