Variation-Aware Design for Nanoscale VLSI
Sachin S. Sapatnekar
University of Minnesota
CAS-FEST 2010
Circuits and Systems Forum
on Emerging and Special Topics
1
The proliferation of computing
• Tablets
• Entertainment
• Smart grid
• Healthcare, automotive, security, …
More computing everywhere…
3
The incredibly shrinking transistor
4
Electronics, Vol. 38, No. 8, Apr 19, 1965
Cost of going to a new technology
[GLOBALFOUNDRIES]
5
So why bother?
• Because it makes economic sense…
  – R&D costs are rising, but so is revenue
[GLOBALFOUNDRIES]
6
7
New technologies: 3D ICs
[Figure: 3D IC stack built from SOI wafers with bulk substrate removed. Generalized view: Layers 1 to 5 and bulk substrate. Detailed view: bulk wafer, device level 1, metal level of wafer 1, inter-layer bonds, interlayer via; dimension labels 500 µm, 10 µm, 1 µm. Adapted from [Das et al., ISVLSI, 2003] by B. Goplen]
Through-silicon Vias (TSVs)
New technologies: near-threshold computing
[Dreslinski et al.]
8
Types of variations
• Based on the model
  – Systematic
  – Random
  – “Random”
• Based on the source
  – Process
  – Environmental
  – Design Uncertainty
• Based on the time when they are seen
  – One-time variations
  – Run-time variations
9
Process-related variations: examples
• Channel width variation: poly/diff rounding, misalignment
• Gate oxide thickness
• Random dopant fluctuations
[Figure: poly and diffusion shapes. Source: S. Tyagi]
[Plot: mean number of dopant atoms (log axis, 10 to 10000) vs. technology node (nm): 1000, 500, 250, 130, 65, 32; panels labeled Uniform and Non-uniform. Source: S. Borkar]
[Figure source: Pey & Tung]
Many of these variations can be modeled by Gaussians
10
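The dopant-count plot and the Gaussian remark can be illustrated with a small calculation. This is only a sketch under an assumption the slide does not state: the number of dopant atoms in a channel is taken to be Poisson distributed, so its relative spread is 1/sqrt(mean), and for large means the distribution is close to a Gaussian. The mean counts below are hypothetical values chosen to show the trend; they are not read from the plot.

```python
import math

def relative_dopant_spread(mean_atoms: float) -> float:
    """Return sigma/mean for a Poisson-distributed dopant count (assumed model)."""
    if mean_atoms <= 0:
        raise ValueError("mean_atoms must be positive")
    # For a Poisson variable, variance equals the mean, so sigma/mean = 1/sqrt(mean).
    return 1.0 / math.sqrt(mean_atoms)

# Hypothetical mean dopant counts, chosen only to show the trend.
for n in (10000, 1000, 100):
    print(f"mean atoms = {n:>5d}: sigma/mean = {relative_dopant_spread(n):.3f}")
```

Under this assumption, fewer dopant atoms per device means a larger relative fluctuation, which is why the effect grows as the technology node shrinks.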
A taxonomy of variations (contd.)
[Figure: taxonomy of variations. Labels: Within-Die (WID) Variations; Die-to-Die (D2D) Variations; Systematic; Random; Lot-to-Lot, Wafer-to-Wafer, Die-to-Die. [Intel]]
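One common way to express the D2D/WID split in a simulation is to give every die a single shared offset and every gate its own independent offset. The sketch below assumes both components are zero-mean Gaussians and ignores the systematic (spatially correlated) part of WID variation; the function name, the normalized parameter, and all sigma values are hypothetical.

```python
import random
import statistics

def sample_dies(num_dies, gates_per_die, nominal, sigma_d2d, sigma_wid, seed=0):
    """Sample a normalized device parameter for every gate on every die."""
    rng = random.Random(seed)
    dies = []
    for _ in range(num_dies):
        shift = rng.gauss(0.0, sigma_d2d)  # one D2D offset shared by the whole die
        dies.append([nominal + shift + rng.gauss(0.0, sigma_wid)  # per-gate WID offset
                     for _ in range(gates_per_die)])
    return dies

# Hypothetical numbers: 5% D2D and 3% WID spread around a nominal value of 1.0.
dies = sample_dies(num_dies=2000, gates_per_die=20, nominal=1.0,
                   sigma_d2d=0.05, sigma_wid=0.03)
die_means = [statistics.fmean(d) for d in dies]
all_gates = [g for d in dies for g in d]
print("spread of die averages:", round(statistics.pstdev(die_means), 4))
print("spread over all gates: ", round(statistics.pstdev(all_gates), 4))
```

Averaging over a die suppresses the independent WID part but leaves the shared D2D part untouched, so the spread of die averages stays close to sigma_d2d.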
11
The (f)law of averages [The drunk skater problem]
12
http://www.stanford.edu/~savage/flaw/
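The flaw of averages says that evaluating a nonlinear function at average inputs does not give the average output. A circuit-flavored sketch: the delay of a chip is the maximum over its path delays, so the mean delay over many chips exceeds the delay obtained with every path at its mean. The model below assumes independent, identically distributed Gaussian path delays, and the path count, mean, and sigma are hypothetical.

```python
import random
import statistics

def mean_circuit_delay(num_paths, mu, sigma, trials=5000, seed=1):
    """Monte Carlo estimate of E[max of path delays] for i.i.d. Gaussian paths."""
    rng = random.Random(seed)
    maxima = [max(rng.gauss(mu, sigma) for _ in range(num_paths))
              for _ in range(trials)]
    return statistics.fmean(maxima)

nominal = 1.0  # normalized delay when every path sits exactly at its mean
avg = mean_circuit_delay(num_paths=10, mu=nominal, sigma=0.05)
print(f"delay with every path at its mean: {nominal:.3f}")
print(f"mean delay over sampled chips:     {avg:.3f}")
```

Designing to the first number would be optimistic, because the max operation picks out whichever path happens to be slow on each chip.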
Why is this important?
• Because it affects circuit timing…
• … and power
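The power side can be sketched in closed form. Assuming subthreshold leakage is proportional to exp(-Vth / (n * vT)) and that Vth is Gaussian, leakage is lognormal and its mean exceeds the leakage at the mean Vth by a factor exp(sigma^2 / (2 * (n * vT)^2)). The slope factor n, the thermal voltage, and the sigma values below are illustrative assumptions, not numbers from the slides.

```python
import math

def mean_leakage_ratio(sigma_vth, n=1.5, thermal_voltage=0.026):
    """E[I_leak] / I_leak(mean Vth) for Gaussian Vth and exponential leakage.

    sigma_vth and thermal_voltage are in volts; n is an assumed slope factor.
    """
    s = n * thermal_voltage
    # Mean of a lognormal variable relative to the value at the mean exponent.
    return math.exp(sigma_vth ** 2 / (2.0 * s ** 2))

# Hypothetical threshold-voltage spreads.
for sigma in (0.0, 0.015, 0.030):
    print(f"sigma(Vth) = {sigma * 1000:4.0f} mV -> "
          f"mean leakage / nominal = {mean_leakage_ratio(sigma):.3f}")
```

Under these assumptions, variation raises average leakage even when the mean threshold voltage is unchanged.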
13
Aging effects
• Circuit behaviour degrades with time
  – NBTI
  – Electromigration
  – Oxide breakdown
  – Hot carrier injection
14
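[Figure: NBTI at the substrate / gate oxide / poly interface, showing Si-H bonds and released H₂. Reaction: SiH + h⁺ → Si⁺ + ½H₂]
[Plot: failure rate vs. time, with labels 1-40 weeks, Normal lifetime (constant failure rate), and 7-15 years. Based on TDDB, EM, hot-electrons… Sources: [S. Sjøthun], [Suto, Teradyne]]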
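For the constant-failure-rate region of the failure-rate plot, reliability follows an exponential law, R(t) = exp(-λt). The sketch below expresses λ in FIT (failures per 10^9 device-hours); the FIT values are hypothetical, and the early-failure and wear-out regions are not modeled.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days of continuous operation

def survival_probability(fit, years):
    """Probability that a device survives `years` at a constant failure rate."""
    lam = fit / 1e9  # convert FIT to failures per device-hour
    return math.exp(-lam * years * HOURS_PER_YEAR)

# Hypothetical failure rates over a 10-year period.
for fit in (10, 100, 1000):
    print(f"{fit:>4d} FIT -> P(survive 10 years) = {survival_probability(fit, 10):.4f}")
```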
Environmental variations
• Supply voltage
• Soft errors
15
Source: Automotive 7-8, 2004
1
Environmental variations: Temperature
[Figures: “Fried egg” image by Trubador, available at http://www.phys.ncku.edu.tw/~htsu/humor/fry_egg.html; further images from [Chu 1999], [Joshi], [Intel], [IBM]]
Technology trends
[Figures: “Fried egg” image by Trubador, available at http://www.phys.ncku.edu.tw/~htsu/humor/fry_egg.html; images from [Chu 1999], [Joshi]]
[Figure: on-chip temperature map, Temp (°C); labels: Core, Cache, 70 °C, 120 °C. [Intel]]
[Figure: 3D IC stack built from SOI wafers with bulk substrate removed. Generalized view: Tiers 1 to 5 and bulk substrate. Detailed view: bulk wafer, device level 1, metal level of wafer 1, intertier bonds, intertier via; dimension labels 500 µm, 10 µm, 1 µm. Adapted from [Das et al., ISVLSI, 2003]]
Overcoming variations
• Three-pronged strategy
  – Reduce the fundamental sources
  – Don’t allow them to be expressed (design around the effects)
  – Mitigate the effects
• Example: temperature effects
  – Low-power design
  – Design to reduce T
  – Design to mitigate T-driven degradation
• At all levels of abstraction, using all available methods
  – Design
  – CAD
  – Architecture/OS
  – Algorithms
18
Adaptive sensing/mitigation
• Sense/adapt feedback loops
• Guardbanded presilicon fixes vs. adaptive postsilicon fixes
• Monitor cores
• Canary circuits
  – Silicon odometer
• Razor, CRISTA, etc.
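A sense/adapt loop of the kind named on this slide (sensor, LUT, DC-DC converter, FBB generator) can be sketched as a table lookup: the sensor reports a measured slowdown, and the LUT returns the supply voltage and forward body bias to apply. Everything concrete below is a hypothetical illustration: the table entries, the units, and the function name are assumptions, not values from the presentation.

```python
# Hypothetical LUT rows: (max measured slowdown, Vdd in volts, forward body bias in volts)
LUT = [
    (0.02, 1.00, 0.0),   # little or no degradation: nominal settings
    (0.05, 1.05, 0.1),   # moderate degradation: small boost
    (0.10, 1.10, 0.2),   # strong degradation: largest boost in the table
]

def select_settings(slowdown):
    """Return (vdd, fbb) from the first LUT row that covers the sensed slowdown."""
    for upper_bound, vdd, fbb in LUT:
        if slowdown <= upper_bound:
            return vdd, fbb
    # Beyond the last row, saturate at the strongest available compensation.
    return LUT[-1][1], LUT[-1][2]

# One pass of the loop for a few hypothetical sensor readings.
for reading in (0.01, 0.04, 0.50):
    vdd, fbb = select_settings(reading)
    print(f"slowdown {reading:.2f} -> Vdd = {vdd:.2f} V, FBB = {fbb:.1f} V")
```

Compared with guardbanding, which fixes worst-case settings before fabrication, a loop like this applies extra voltage or bias only when the sensor indicates it is needed.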
19
[Figure: adaptive compensation loop. Blocks: sensor, LUT, DC-DC converter (Vdd), FBB generator (vbp, vbn), circuit block. Labels: Guardbanding, Sensor-driven]
[Figure: 4 × 4 array of cores, Core (1,1) through Core (4,4)]
[Figure: silicon odometer with phase comparator; signals A, B, C; PC_OUT (freq = fref - fstress). [Kim, Minnesota]]
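The relation PC_OUT freq = fref - fstress means the beat frequency directly encodes how much the stressed oscillator has slowed down. A minimal sketch of that arithmetic follows; the function names and the example frequencies are hypothetical, and the actual measurement circuit is not modeled.

```python
def stressed_frequency(f_ref_hz, f_beat_hz):
    """Recover fstress from fref and the beat frequency fref - fstress."""
    return f_ref_hz - f_beat_hz

def fractional_degradation(f_ref_hz, f_beat_hz):
    """Return (fref - fstress) / fref, the relative slowdown of the stressed oscillator."""
    if f_ref_hz <= 0:
        raise ValueError("f_ref_hz must be positive")
    return f_beat_hz / f_ref_hz

# Hypothetical reading: 1 GHz reference oscillator, 10 MHz beat frequency.
f_ref, f_beat = 1e9, 1e7
print("fstress (Hz):", stressed_frequency(f_ref, f_beat))
print("relative slowdown:", fractional_degradation(f_ref, f_beat))
```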
CAS-FEST 2010
• A Resilience Roadmap
  – Sani Nassif, IBM
• Mitigating Variability in Near-Threshold Computing
  – Dennis Sylvester, Univ. of Michigan
• Robust System Design to Overcome CMOS Reliability Challenges
  – Subhasish Mitra, Stanford Univ.
• Computer Aided Circuit Design for Reliability in Nanometer CMOS
  – Georges Gielen, Katholieke Univ. – Leuven
• Process Compensated High Speed Ring Oscillators in Sub-Micron CMOS
  – Alyssa B. Apsel, Cornell U.
• Containing the Nanometer Pandora-box: Design Techniques for Variation-Aware Low-Power Systems
  – Abhijit Chatterjee, Georgia Tech.
20